

Improving virtual assistants’ performance using semantic search and Sentence Transformers

Enrique Mora
Towards Data Science
7 min read · Nov 26, 2020


Introduction

The vast majority of the virtual assistants that we can find on the market follow the “intent detection” paradigm. If we think about this name carefully, it makes total sense: our virtual assistant is trying to capture (or detect) the user’s intention from natural language (text or voice). Once the user intention has been captured, our virtual assistant can react and route the conversation to the proper dialog flow.

Sometimes this detection is hard to perform successfully (the intention of a human is not always clear from a sentence), and sometimes the user is talking about things that are outside the knowledge scope of the virtual assistant.

In these cases, it makes sense to have some kind of fallback mechanism that helps the virtual assistant present a plausible and accurate answer to the user, instead of the frustrating “I cannot understand you, can you write it in other words?” reply that most assistants fall back on.

Figure 1: Intent detection chatbot high-level architecture (Image by the author)

Intent detection and NLP

How is our well-designed virtual assistant able to capture (or detect) the user’s intention? Here is where natural language understanding plays its role, performing the typical cognitive task that is (usually) extremely easy for humans and an authentic nightmare for computers.

NLP/NLU techniques allow us to tackle this task in different ways. Within the intent detection paradigm, we need to define the topics that the virtual assistant can talk about. We call each topic an “intention”. Once we have defined these intentions, the next step is to train a machine learning (ML) text classifier where each intention is a class.

The main task, then, is to assign a probability to each class given the user sentence. The class (or intent) with the highest probability is the selected one.

Figure 2: Intent detection classifier training process (Image by the author)
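To make this concrete, here is a minimal sketch of such a classifier in Python. The intents and training utterances below are invented for illustration, and the TF-IDF + logistic regression pipeline is just one simple choice; production intent engines typically use more sophisticated models.

# A minimal intent classifier: TF-IDF features + logistic regression.
# The intents and utterances are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_utterances = [
    "what time do you open",
    "when are you open on weekends",
    "how much does shipping cost",
    "what are the delivery fees",
    "I want to cancel my order",
    "please cancel the purchase I made",
]
intents = [
    "opening_hours", "opening_hours",
    "shipping_cost", "shipping_cost",
    "cancel_order", "cancel_order",
]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(training_utterances, intents)

# One probability per class; the argmax is the detected intent.
probs = classifier.predict_proba(["how expensive is delivery"])[0]
for intent, p in zip(classifier.classes_, probs):
    print(f"{intent}: {p:.2f}")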

After this explanation, the words start to make more sense. Intent detection is “just” an ML text classifier (over a very short text, the user input). When we say that the virtual assistant is learning, or that we need to train it, it is because we have to train an ML classifier. And, of course, we need to measure it.

If we are not able to classify the sentence…

As with any other ML classifier, our intent detection engine has a certain performance and, as with any other supervised ML model, it is not perfect. Therefore, in some cases, our ML model will not be able to properly classify the user input (also called an utterance). What can we do when this happens?

In the not-so-rare case where the intent detection model is not able to detect the intention properly, the user input is classified into the “default” intent/class.
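A common way to implement this routing is a confidence threshold: if the best class scores below it, the input goes to the default intent. Here is a minimal sketch, reusing the classifier from the earlier example; the 0.5 threshold is illustrative.

# Threshold-based fallback: if no intent is confident enough,
# route the utterance to a "default" intent.
import numpy as np

CONFIDENCE_THRESHOLD = 0.5  # illustrative value; tune on real data

def detect_intent(classifier, utterance):
    probs = classifier.predict_proba([utterance])[0]
    best = int(np.argmax(probs))
    if probs[best] < CONFIDENCE_THRESHOLD:
        return "default"
    return classifier.classes_[best]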

If this happens, we can follow several approaches to send a response back to the user:

1. Sending a user-friendly sentence such as “I cannot understand you (human), can you write it in other words?”… Mmm, sure, I love it when chatbots do this!

2. Using the first option plus “and remember that I can only talk about …”

3. Trying to find a possible answer somewhere else

The third option is indeed the right fallback strategy. Usually, we have a “knowledge base” from which we can retrieve information to find the correct answer for the user. Although “knowledge base” is a very nice name, in the end it is a simple (and very useful) FAQ list.

These question-and-answer pairs can help the poor virtual assistant a lot by extending its knowledge and scope. How can this be done? Easy: try to match the user input to any question in the list using, for example, some flavor of fuzzy search. If you find a match, send the best-matching answer back to the user.
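As a minimal sketch of this keyword/fuzzy approach, here is one possible implementation using Python’s standard library; the FAQ entries are invented for illustration.

# Fuzzy FAQ matching with the standard library (difflib).
import difflib

faq = {
    "What are your opening hours?": "We are open from 9am to 6pm, Monday to Friday.",
    "How much does shipping cost?": "Shipping costs 5 EUR for orders under 50 EUR.",
    "How can I cancel my order?": "You can cancel it from the 'My orders' page.",
}

def fuzzy_faq_answer(user_input, cutoff=0.6):
    # get_close_matches scores by character overlap, not by meaning.
    matches = difflib.get_close_matches(user_input, faq.keys(), n=1, cutoff=cutoff)
    return faq[matches[0]] if matches else None

print(fuzzy_faq_answer("How much is shipping?"))          # shares words: likely matches
print(fuzzy_faq_answer("What's the price of delivery?"))  # same meaning, no shared words: likely None

The second query illustrates exactly the weakness discussed next: the question means the same thing, but without shared keywords the fuzzy match fails.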

Figure 3: FAQ fallback strategy (Image by the author)

This simple strategy has proven to be a very good one for mitigating intent detection errors. However, it has some problems. The main one? We are trying to find the best match between the user input and our question list through “keywords” and, of course, in some cases this approach fails miserably.

Let’s see some examples:

Figure 4: Fuzzy search problems for question matching (Image by the author)

Semantic Search and Sentence Transformers

Semantic search is the task of finding similarities between texts by their meaning instead of by keywords alone.

How can we implement this in our virtual assistants? Recall the weakest point of our fallback strategy (using the FAQ): we search for the match using keywords. It would be nice to do the same but find the matching question by meaning!

You have probably heard something about the “new” ML natural language models based on Transformers. In a nutshell, these new models can predict the conditional probability of a word (or sequence of words) given a context. With a model like this, we can tackle the very long list of common NLP tasks shown in multiple papers over the last year.
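As a quick illustration of “predicting a word given a context”, here is a sketch using the Hugging Face transformers library (not used in the rest of this article); the example sentence is invented.

# A masked language model predicts the most probable words for [MASK].
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The virtual assistant could not [MASK] the user's question."):
    print(f"{prediction['token_str']}: {prediction['score']:.3f}")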

Figure 5: Transformer-based encoder (Image by the author)

A very important feature of these models is that, while they have been trained to solve a specific task, once the training is finished we can remove the “head layers” (the part of the model specific to that task) and reuse the rest of the model for other tasks. How does this work?

Generally speaking, the Transformer layers translate the input sequence into a “latent” vector space in a smart way. If this translation of the sequence into vectors is good enough, we can use it as the input of the “specific task model layers”.

However, we can go a step further. If we have a good and smart representation of the sequence (text, in this case) as a vector, we can compute distances between vectors and, therefore, we can find similarities!

Thus, the strategy is clear:

1. Represent all the questions of our FAQ as vectors using a good Transformer-based ML model, then

2. do the same with the input question, and

3. find the question vector most similar to the vector of the user input.

If the vector representation is good enough, we will find the most similar question based on the semantics of the text.
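To make the “most similar vector” step concrete, here is a minimal sketch of cosine similarity over embedding vectors. Random vectors stand in for the real sentence embeddings, and the 384 dimensions are just an illustrative size.

# Cosine similarity between a query vector and a matrix of FAQ question vectors.
import numpy as np

rng = np.random.default_rng(0)
faq_embeddings = rng.normal(size=(10, 384))   # 10 questions, 384-dim embeddings
query_embedding = rng.normal(size=384)

# Normalize so that the dot product equals the cosine similarity.
faq_norm = faq_embeddings / np.linalg.norm(faq_embeddings, axis=1, keepdims=True)
query_norm = query_embedding / np.linalg.norm(query_embedding)

similarities = faq_norm @ query_norm
best_match = int(np.argmax(similarities))
print(f"Most similar FAQ question: index {best_match}, score {similarities[best_match]:.3f}")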

Figure 6: How does semantic search work? (Image by the author)

In theory, this makes total sense. However, as always, the devil is in the implementation details. Our incredible and never sufficiently lauded Transformers have a caveat for this task: they are sequence-to-sequence models, so the output of the model is a sequence of vectors, one for each token (or word). We therefore need a way of summarizing all those vectors into a single one that represents the complete sequence (sentence).
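The simplest summary is mean pooling: averaging the per-token vectors into one sentence vector (this is, in fact, one of the pooling strategies used in Sentence-BERT). A minimal sketch, with random vectors standing in for a real encoder’s output:

# Mean pooling: average the per-token vectors into one sentence vector.
# In practice, padding tokens should be masked out before averaging.
import numpy as np

rng = np.random.default_rng(1)
token_vectors = rng.normal(size=(12, 384))  # 12 tokens, 384-dim each

sentence_vector = token_vectors.mean(axis=0)  # one vector for the whole sentence
print(sentence_vector.shape)  # (384,)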

For example, I recommend you take a look at the Sentence-BERT paper, Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks [1]. The idea is to train a model that learns to produce the best summarization of the sentences using Siamese networks.

Figure 7: Siamese networks in Sentence-BERT [1]

You can find an implementation of Sentence-BERT in the nice Sentence Transformers library (visit the GitHub repository and the library documentation).

See how easy it is to use the library, with a few lines of code:

Source: https://www.sbert.net/examples/applications/semantic-search/README.html
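The embedded snippet did not survive here; the following sketch, modeled on the library’s semantic search example at the link above, shows the idea. The model name and FAQ entries are illustrative.

# Semantic search over a FAQ with Sentence Transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

faq_questions = [
    "What are your opening hours?",
    "How much does shipping cost?",
    "How can I cancel my order?",
]

# Encode the FAQ once, up front; encode each user query at runtime.
faq_embeddings = model.encode(faq_questions, convert_to_tensor=True)
query_embedding = model.encode("What's the price of delivery?", convert_to_tensor=True)

# util.semantic_search returns, for each query, the top_k most similar corpus entries.
hits = util.semantic_search(query_embedding, faq_embeddings, top_k=1)[0]
best = hits[0]
print(faq_questions[best["corpus_id"]], f"(score: {best['score']:.3f})")

Note that, unlike the fuzzy search example earlier, this query shares almost no keywords with the matched question; the match is made on meaning.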

As you can see, with this kind of library it is easy to find the most similar question to the user input in our FAQ.

Conclusion

Having a good fallback strategy is very important to enhance the virtual assistant’s performance. In this case, I suggest implementing a very easy one: when the user utterance doesn’t fall into any of the intent detection engine’s classes, search a FAQ list for a similar “question”. If there is one, we can send the associated “answer” back to the user.

The new sentence embeddings obtained from Transformer-based models can help you do this search semantically instead of just using keywords, and thus boost the matching performance.

References

[1] Reimers, Nils, and Gurevych, Iryna, Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (2019), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing


Artificial Intelligence Sr Solution Architect @Nestlé. Nestlé AI Global Strategy Program member. Machine Learning professor at Barcelona Tech School.