The Wonderful World Of Conversational AI

3 July 2021

written by

Nofar Schnider, Associate at StageOne Ventures

If we could go back in time and tell our ancestors that computers would one day understand us, they would probably think we were nuts. But in 2021, artificial intelligence is enabling computers to actually “understand” us, thanks to Conversational AI.

What is Conversational AI?

In a nutshell, Conversational AI is a set of technologies that allow computers to recognize human language. Using Conversational AI, machines can decipher different languages, comprehend what is being said, determine the right response, and even answer in a way that mimics human conversation, thanks to Natural Language Processing innovations.

Today, Conversational AI is integrated into bots, virtual assistants, and tools that people, companies, and organizations use for various purposes. One can talk or chat with a bot without ever realizing that there isn’t a real person on the other end.

Breaking down NLP

Natural Language Processing combines linguistics, computer science, and artificial intelligence into a wide set of tools that allow computers to understand and interact with humans using natural language, whether spoken or written. First, the computer must understand what is being said by processing voice or text inputs. This can be done using many methods and algorithms: speech recognition, word segmentation, lemmatization, morphological segmentation, part-of-speech tagging, and many others. Each method answers a different need in language processing.
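To make two of these methods concrete, here is a minimal sketch of word segmentation and a very crude form of lemmatization. The suffix rules and function names are hypothetical and chosen only for illustration; production systems typically rely on libraries with dictionaries and part-of-speech information rather than hand-written rules like these.

```python
import re

# Hypothetical, hand-written suffix rules for illustration only;
# real lemmatizers rely on dictionaries and part-of-speech information.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]

def tokenize(text):
    """Word segmentation: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def crude_lemma(token):
    """Apply the first matching suffix rule, keeping a stem of at least three characters."""
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: len(token) - len(suffix)] + replacement
    return token

tokens = tokenize("The assistant answered two queries")
lemmas = [crude_lemma(t) for t in tokens]
print(tokens)
print(lemmas)
```

Rules this simple will mishandle many words, which is one reason each processing step usually gets its own dedicated method.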

Natural Language Understanding (NLU), a subtopic of NLP, focuses on machine reading comprehension. It can be used to identify sentiment, semantics, and other critical language components in a text, much as we did when we studied grammar at school. Many of today’s language-driven products rely on NLU, including searches on Google and Facebook, as well as commands given to virtual assistants.

Take, as an example, asking the virtual assistant on your phone to navigate you to work. First, the assistant converts the voice input to text using Automatic Speech Recognition (ASR). Next, it uses NLU to parse the converted text, understand the intent, and mark any important words (or entities). If the text was understood, the request is answered or executed, and the Dialog Manager delivers a response to the user. If something is missing, however, the Dialog Manager either requests clarification before executing the request or “announces” that the request cannot be executed due to a lack of contextual understanding.

The final part of converting the response to voice is done by Text-to-Speech (TTS).

Source: https://nlp.stanford.edu/~wcmac/papers/20140716-UNLU.pdf
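The ASR, NLU, Dialog Manager, and TTS flow described above can be sketched as a chain of functions. Everything in this sketch is a stand-in written for illustration: the function names are hypothetical, the "audio" is a dictionary that already carries a transcript, the NLU step is a pair of keyword patterns, and the TTS step simply returns bytes. It is meant to show how the stages hand data to one another, not how any real assistant is implemented.

```python
import re

def speech_to_text(audio):
    """ASR stand-in: the 'audio' here is a dict that already carries a transcript."""
    return audio["transcript"]

def understand(text):
    """NLU stand-in: detect a navigation intent and a destination entity."""
    lowered = text.lower()
    result = {"intent": None, "entities": {}}
    if re.search(r"\b(navigate|drive|directions)\b", lowered):
        result["intent"] = "navigate"
    match = re.search(r"\bto (work|home)\b", lowered)
    if match:
        result["entities"]["destination"] = match.group(1)
    return result

def manage_dialog(parsed):
    """Dialog Manager stand-in: execute, ask for clarification, or decline."""
    if parsed["intent"] != "navigate":
        return "Sorry, I cannot help with that request."
    destination = parsed["entities"].get("destination")
    if destination is None:
        return "Where would you like to go?"
    return f"Starting navigation to {destination}."

def text_to_speech(text):
    """TTS stand-in: return bytes in place of synthesized audio."""
    return text.encode("utf-8")

def handle_request(audio):
    """Run one request through all four stages in order."""
    text = speech_to_text(audio)
    parsed = understand(text)
    reply = manage_dialog(parsed)
    return reply, text_to_speech(reply)

reply, _ = handle_request({"transcript": "Navigate me to work"})
print(reply)
```

Passing a transcript without a destination, such as "Navigate please", takes the clarification branch of the Dialog Manager stand-in instead of executing the request.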

Conversational AI platforms and tools

In recent years, powerful tools have emerged to help R&D teams add Conversational AI capabilities to their products. These tools are based on supervised machine learning, so R&D teams must train them by feeding them examples of intents and entities. Once training is complete, the trained models can be used as part of the team’s regular product offering.
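As a rough illustration of what "training with examples of intents" means, the sketch below fits a tiny naive Bayes classifier on a handful of labeled utterances. The training data, intent names, and function names are all hypothetical, entity extraction is left out, and the platforms listed below use far more capable models; the sketch only shows the supervised pattern of learning from labeled examples and then predicting on new input.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled examples: (utterance, intent).
TRAINING_EXAMPLES = [
    ("navigate me to work", "navigate"),
    ("give me directions to the office", "navigate"),
    ("drive me home", "navigate"),
    ("what is the weather today", "weather"),
    ("will it rain tomorrow", "weather"),
    ("is it sunny outside", "weather"),
]

def train(examples):
    """Count how often each word appears under each intent."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    for utterance, intent in examples:
        intent_counts[intent] += 1
        word_counts[intent].update(utterance.split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, intent_counts, vocabulary

def predict(model, utterance):
    """Return the intent with the highest naive Bayes log-score."""
    word_counts, intent_counts, vocabulary = model
    total_examples = sum(intent_counts.values())
    best_intent, best_score = None, -math.inf
    for intent, n_examples in intent_counts.items():
        score = math.log(n_examples / total_examples)
        total_words = sum(word_counts[intent].values())
        for word in utterance.split():
            # Laplace smoothing keeps unseen words from zeroing the score.
            count = word_counts[intent][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

model = train(TRAINING_EXAMPLES)
print(predict(model, "directions to work"))
print(predict(model, "will it rain today"))
```

With only six examples the classifier is easy to fool, which hints at why access to enough high-quality training data matters so much in practice.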

Here are a few tools worth mentioning:

Rasa: A Conversational AI platform for building text- and voice-based assistants. The company also offers an open-source framework that supplies the building blocks required to create virtual assistants.
wit.ai: A Natural Language interface for creating and training bots and applications (acquired by Facebook in 2015).
Dialogflow (formerly api.ai): A Natural Language Understanding platform that makes it easy to design and integrate a conversational user interface into mobile applications, web applications, devices, bots, interactive voice response systems, etc. (acquired by Google in 2016).
Microsoft LUIS (Language Understanding): A machine learning-based service for building Natural Language capabilities into applications, bots, and IoT devices.
Amazon Lex: A service for building conversational interfaces into any application, using voice and text. It provides deep learning-based ASR for converting speech to text and uses NLU to recognize the intent of any text, so that you can build applications with highly engaging user experiences and lifelike conversational interactions.

There are many other tools and libraries, in various programming languages, that provide the building blocks for creating Conversational AI-powered services and products. In Python, for example, you can find libraries and tools such as NLTK, spaCy, and PyTorch-NLP.

One deep learning-based model worth mentioning is Generative Pre-trained Transformer 3 (GPT-3), created by OpenAI. It is an autoregressive language model that can produce human-like text. The generated text is often of such high quality that it is hard to tell it was not written by a human being.
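"Autoregressive" means the model produces text one token at a time, feeding each predicted token back in as input for the next prediction. The toy sketch below shows only that loop, using simple word-pair counts over a tiny hypothetical corpus. It is not how GPT-3 works internally: GPT-3 uses a large transformer network conditioned on a long context, and the names here are invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny hypothetical corpus; a real model is trained on vastly more text.
CORPUS = "the assistant answers the question and the assistant answers again".split()

def build_bigram_table(tokens):
    """Count which word follows which in the corpus."""
    table = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        table[current][following] += 1
    return table

def generate(table, start, max_tokens):
    """Greedy autoregressive loop: feed each predicted word back as input."""
    output = [start]
    while len(output) < max_tokens and output[-1] in table:
        next_word = table[output[-1]].most_common(1)[0][0]
        output.append(next_word)
    return output

table = build_bigram_table(CORPUS)
generated = generate(table, "the", 3)
print(" ".join(generated))
```

The loop stops either when it reaches the requested length or when the last word has no known continuation in the table.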

Putting Conversational AI to good use

In recent years, we have witnessed the appearance of several start-ups that leverage the power of Conversational AI and NLP to solve major pain points for companies and organizations. Conversational AI helps these organizations reduce costs, offer better service, and spend more time on important tasks rather than waste employees’ attention on simple ones.

At StageOne Ventures, we have spent the past few years investing in an impressive number of Artificial Intelligence-based applications and infrastructure companies, especially in Conversational AI:

Our first investment, Apprente, was founded in 2017 and acquired by McDonald’s towards the end of 2019 to help the company take orders faster, more simply, and more accurately.

Apprente was looking to upend the customer care industry, traditionally bogged down by high labor costs, inadequate service, poor innovation, and anemic automation. Apprente introduced Conversational AI that drove the customer experience to new levels of human communication while slashing expensive overhead for businesses. The company’s technology solution is based on deep machine learning capabilities that exercise lifelong and rapid learning methodologies through direct communication with users. The result was intelligent conversational agents that continuously learn and adapt through simple-to-complex interactions.

Another company we invested in is Second Nature. The company was founded in April 2019 by Ariel Hitron and Alon Shalita to leverage advanced artificial intelligence to create a “virtual customer” that runs simulations with B2B sales reps, helping them prepare for live sales conversations with prospects. Second Nature’s AI sales coaching software gives reps real-time feedback to help them improve, and coaches and certifies them so that they can reach their best performance levels.

One of our most recent investments is a portfolio company called Staircase that develops next-generation technology for B2B Customer Success and Service teams. The company’s mission is to empower Customer Success teams with AI and advanced analytics so that they can monitor, develop, and scale better human relationships, driving retention and growth.

Traditional Customer Success platforms were built a decade ago and resemble CRM/ERP solutions in their approach, emphasizing workflow and rule-based automation for low-touch customers.

Our latest investment in this field is Sedric, a company that was founded in May 2020 by Nir Laznik and Eyal Peleg. The company is building an AI-based corporate risk management and compliance adherence solution for the modern financial services industry.

We asked our portfolio companies to share their take on Conversational AI, its requisite tools, and where they believe it is headed:

Which Conversational AI/NLP/NLU solutions do you use (or have used) as part of your solution?

Ariel Hitron, co-founder and CEO of Second Nature: “We are using a mixture of existing and internally developed technologies. We focus on developing voice-related solutions, that enable us to respond quickly, understand when a person is pausing or has finished talking, and when it is time for us to respond. This problem does not exist when dealing with text inputs. A voice-based conversation is more open, and specifically, with salespeople, there is a story. The conversation is less structured and usually longer. The context is also very important — questions come up with regard to the context of the conversation and it should be kept under consideration.”

Eyal Peleg, co-founder and CTO of Sedric: “At Sedric, we are at the frontier of the AI NLP landscape. Our observations of the current solutions have led us to see that the more basic and general NLP solutions are becoming a commodity, whereas companies with an edge domain/task-specific solutions are becoming more prevalent. In addition, the increase in compute power over GPU opens the door for more complex architectures, which will be easier to train and deploy, thus providing greater performance.”

What are the biggest challenges you are facing today regarding Conversational AI?

Ori Entis, co-founder and CEO of Staircase: “One of the main challenges is access to high quality & quantity of data to train the NLP models. Another challenge is working with state-of-the-art ML models that are released to the public by academic institutions and large corporations (Google, FB, etc.), from time to time. These models are very powerful but require a large investment in data science, training, tweaking of parameters, feature engineering, and customization for specific business use cases. Multi-lingual support is a third challenge since most of the research is done in English.”

Eyal Peleg: “We have developed proprietary solutions that rely on multi-layered classification models with a feature space that includes the signal, the text, and contextual data. These models are based on transformer networks and are customized according to domain-specific language models and customer-specific use cases. One challenge is maintaining data integrity since user labels have an education curve and inherent bias. Another challenge is transferring the learning of meaningful insights from one vertical to the other.”

Where do you think Conversational AI is headed? Where will it be used in the coming years?

Ariel Hitron: “I think we will see more narrative algorithms that enable content creation and control to summarize the data to a relevant context, just like GPT-3. One improvement would be the ability to crossover between different databases, answering according to the data and cross-referencing to create better answers. Another would be to predict according to the context, based on personal information, especially with regard to B2C solutions.

I think Conversational AI will become more common in voice-enabled services, especially if it would be easier to integrate it with chatbots, but everything that is related to voice will work even better.”

Ori Entis: “I believe Conversational AI is going to become more popular, as NLP improves its understanding of language, as well as its ability to generate human-like content. Conversational AI will be used in the analysis of large amounts of text/audio/video for business purposes (sales, service, and support), insurance/finance (analyzing claims and fraud detection) as well as in entertainment (online analysis, translation, and commentary). In the business world, once we achieve satisfactory results for content generation, I believe that the current method of using templates will be replaced/augmented with ML-generated text. This will allow a personalization at scale in the marketing/advertising world that is not possible today.”

Some additional companies that use Conversational AI and are worth mentioning:

Gong.io is a company that enables revenue teams to realize their fullest potential by unveiling customer reality. Its Revenue Intelligence Platform captures and understands every customer interaction to deliver insights at scale, empowering revenue teams to make decisions based on data, instead of on opinions.

One of Gong’s competitors is Chorus.ai, a company that develops an AI conversation intelligence cloud platform for sales teams. The platform can transform conversations into data and insights.

Both companies were founded in 2015. Gong now focuses on driving its clients’ revenue and growth, while Chorus focuses on conversation intelligence for sales teams. They share the same means, but each has a different focus.

Another company in this field is Hyro, a Conversational AI company that uses natural language and computational linguistics to turn complex content into simple dialogue. It allows service providers, mainly those in the healthcare, government, and real estate industries, to better serve their customers, by replacing simple chatbots.

Another company, this one focused on speech-to-text, is Verbit, which recently raised a $157 million Series D round. Verbit combines artificial and human intelligence to provide its transcription and captioning solution, quickly generating detailed speech-to-text files with high accuracy. The company’s AI technology also supports on-demand Communication Access Realtime Translation (open captioning) services for real-time results.

The future of Conversational AI

To summarize, Conversational AI is already being integrated into products, especially solutions for sales teams. Many other fields can benefit from this technology, such as healthcare (digital wellbeing, psychology, etc.), e-commerce, and any sector that is related to “customer satisfaction.”

At StageOne Ventures, we believe this is only the beginning of Conversational AI. We are constantly looking for interesting startups in this field and making sure we are up to date on any and all relevant innovations.

We can’t wait to see how the field continues to develop!