Meena: Google’s New Chatbot

A more human-like and versatile chatbot

Google recently published a paper on its new chatbot, Meena, and has hit all the right chords in its design and approach. While the underlying techniques are not entirely new, Meena seems to be headed in the right direction for building chatbots that are truly versatile and more human-like in their interactions.

The Rise of Chatbots

Chatbots are AI systems that interact with users via text messages or speech. Their applications have grown tremendously of late; in fact, the chatbot market is expected to grow to $9.4B by some estimates.

Chatbots have many use cases, for example customer service, e-commerce, and food delivery. Most of the big companies have their own chatbots, such as Apple's Siri, Google Assistant, and Amazon's Alexa.

There are also many frameworks available to help you build your own chatbot applications, for example the open-source framework Rasa, Google's Dialogflow, IBM's Watson, and Microsoft's Bot Framework.

Notwithstanding this growth, most current solutions face some major challenges:

  1. Closed-Domain / Rule-Based: Most chatbots are closed-domain, meaning they work only within a specific domain. These bots look for certain keywords to figure out the user's intent, and a rule-based system under the hood determines what action to take for that intent. While such systems are useful, they are not generic and require a lot of domain-specific work to get going.
  2. Not Conversational: The holy grail of chatbots is interaction that looks human-like. With most current systems, that is hardly the case.
  3. Not Multi-Turn: Most chatbots fail to take the larger context of a multi-turn exchange (where the user and the chatbot take turns) into account when generating a response. This again makes the interaction feel less human.
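The keyword-plus-rules pattern described in point 1 above can be sketched in a few lines of Python. All the intents, keywords, and canned replies below are invented for illustration:

```python
import re

# Hypothetical closed-domain rules: keywords that signal each intent.
RULES = {
    "order_status": {"order", "tracking", "shipped"},
    "refund": {"refund", "return"},
}

# Canned action per intent; a real bot would call a backend here.
RESPONSES = {
    "order_status": "Let me look up your order.",
    "refund": "I can start a refund for you.",
    "fallback": "Sorry, I didn't understand that.",
}

def detect_intent(message: str) -> str:
    """Return the first intent whose keywords overlap the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for intent, keywords in RULES.items():
        if words & keywords:
            return intent
    return "fallback"

def respond(message: str) -> str:
    return RESPONSES[detect_intent(message)]
```

Anything outside the keyword lists falls through to the fallback reply, which is exactly the brittleness described above: the bot has no generic language understanding, only hand-written rules.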

Meena

Meena is a multi-turn, open-domain chatbot based on a Transformer seq2seq architecture and trained in an end-to-end fashion. It is huge, comprising 2.6B parameters, and was trained on 341 GB of text data. Through its approach and design, Meena tries to address the challenges highlighted in the prior section.

Here are some of the key highlights from the paper:

  1. Training Data: Meena is trained on public-domain social media conversations. Each conversation is turned into a (context, response) pair, where the context is up to 7 messages preceding the response. The full dataset comprises 40B words and 341 GB of text.
  2. Model Architecture: Meena is based on the Evolved Transformer (ET), an architecture found via evolutionary neural architecture search starting from the original Transformer. The model has 1 ET encoder block and 13 ET decoder blocks.
  3. Model Training: The model was trained for a whopping 30 days on a TPU v3 Pod (2,048 TPU cores!).
  4. Taking the existing research forward: Most of the concepts used in Meena (open-domain operation, end-to-end training, the Transformer seq2seq architecture, and so on) have been attempted in some form before. Meena is just a lot bigger (training data, model size) and better (architecture, decoding).
  5. Evaluation Metric: It is hard to evaluate the performance of chatbots because there is no single correct answer. One of the paper's major contributions, per the authors, is the design of an evaluation metric they call Sensibleness and Specificity Average (SSA). It is meant to mimic whether a human would judge a conversation to be good.
  6. Sensibleness and Specificity Average (SSA): For a given response, SSA is the average of two binary indicators: whether a human would deem the response sensible, and whether they would deem it specific (that is, not a generic reply like "I don't know" or "that's good").
  7. Training Objective: Defining a metric for end performance is not enough; one also needs a measure for the model to optimize during training. Another major accomplishment, per the authors, is demonstrating that perplexity, a commonly used metric in NLP, correlates with the SSA metric that mimics human judgement. So optimizing the model to decrease perplexity automatically makes its responses more desirable to humans.
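Points 5 through 7 can be made concrete with a small sketch. The human labels and token probabilities below are made up for illustration; in the paper, the labels come from human raters and the probabilities from the model:

```python
import math

# Hypothetical human labels per response: (sensible, specific), each 0 or 1.
labels = [(1, 1), (1, 0), (0, 0), (1, 1)]

# SSA = average of the sensibleness and specificity rates.
sensibleness = sum(s for s, _ in labels) / len(labels)
specificity = sum(p for _, p in labels) / len(labels)
ssa = (sensibleness + specificity) / 2  # 0.625 for the labels above

# Perplexity = exp of the mean negative log-likelihood the model
# assigns to the tokens it should have produced; lower is better.
token_probs = [0.25, 0.5, 0.1, 0.4]  # hypothetical per-token probabilities
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(nll)
```

The paper's finding is that these two numbers move together across models: pushing perplexity down during training tends to push SSA, and hence perceived conversation quality, up.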

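The (context, response) construction described under Training Data can be sketched as a sliding window over a message thread. The toy thread below is invented for illustration:

```python
def build_pairs(messages, max_context=7):
    """Turn a linear message thread into (context, response) training
    pairs, where the context holds up to `max_context` messages that
    precede the response."""
    pairs = []
    for i in range(1, len(messages)):
        context = messages[max(0, i - max_context):i]
        pairs.append((context, messages[i]))
    return pairs

# Toy conversation, 4 turns; yields 3 training pairs.
thread = ["hi", "hey, how are you?", "good, you?", "great!"]
pairs = build_pairs(thread)
```

Each turn after the first becomes a training target exactly once, conditioned on everything said before it, capped at 7 prior messages as in the paper.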
Results

Meena vs. humans and other chatbots. Source: Google Blog

Meena scores 79% on the authors' SSA metric, below the human level of 86% but still well above the other popular chatbots that were evaluated. Of course, those chatbots were not built with the newly proposed SSA metric in mind, but Meena's approach still deserves credit: it is not handcrafted and is trained in an end-to-end fashion.

Summary

Google’s approach and results for Meena are exciting. While there is still a lot of work left before chatbots match humans in expertise and versatility, Meena seems to be a step in the right direction.

Towards Data Science

A Medium publication sharing concepts, ideas, and codes.

Written by Moiz Saifee

Senior Principal at Correlation Ventures. Passionate about Artificial Intelligence. Kaggle Master; IIT Kharagpur alum.
