
Modular DREAM Socialbot in Alexa Prize

During the Alexa Prize we learned a lot about running a socialbot in production, and we are ready to share our knowledge!

An introduction and walkthrough of an open source dialogue-system project

In the spring of 2019, a team of students from the Moscow Institute of Physics and Technology (MIPT), led by Mikhail Burtsev, was selected to participate in Amazon's Alexa Prize Challenge 3. That marked the official beginning of the DREAM socialbot, which is now alive and already two years old. Our journey in Alexa Prize Challenge 3 ended in May 2020 after the Semifinals, as unfortunately we were not selected for the Finals. Still, we managed to create the first version of the DREAM socialbot using the open-source DeepPavlov Agent framework. After the Semifinals we spent four months adding support for Knowledge Graphs (KGs), with the goal of eventually open-sourcing the entire bot in the second half of 2020. However, in late September Amazon announced Alexa Prize Challenge 4, and our application was proudly selected for participation again. The competition was great, and although we reached the Semifinals, we sadly did not pass to the Finals. Nevertheless, we have learned a lot from these two years of running a socialbot in production, and we are ready to share that knowledge with interested readers, so we are starting a series of articles about the DREAM socialbot.

Dream Team from Alexa Prize Challenge 4.

There are two main classifications of dialogue systems:

  1. task-oriented and chit-chat,
  2. open domain and closed domain.

Task-oriented systems conduct a conversation with the aim of completing some task, for example, booking airplane tickets or reserving a restaurant table. Chit-chat systems aim to chat with no special purpose for the dialogue. The domain of a dialogue system determines topic restrictions for the conversation: closed-domain systems cover only one or several topics, while open-domain systems can conduct dialogues on any topic.

So, as socialbots participating in the Alexa Prize Challenge should support conversation on a wide variety of popular topics, they are open-domain chit-chat dialogue systems by definition.

Another critical characteristic of a dialogue system is its architecture, which is basically divided into monolithic and modular. Monolithic (or end-to-end) dialogue systems are a single model that processes the input and returns the final response relevant to the dialogue context. Such dialogue systems have low interpretability and are almost uncontrollable. Given that the size of the model is limited by available resources and the maximum response time, such a model cannot both cover open-domain dialogue well and have a rich vocabulary at the same time. Therefore, most dialogue systems nowadays have a modular architecture containing many different rule-based, Machine Learning, and Deep Learning models.

DREAM Socialbot is a modular dialogue system implemented with the open-source DeepPavlov Agent framework. An overall structure of the original DREAM socialbot is shown in the image below.

DREAM Socialbot Architecture at the end of Alexa Prize Challenge 3. Image from the Technical Report

There are several main parts corresponding to the DeepPavlov Agent architecture: Annotators, Skill Selector, Skills, Response Selector, and Dialogue State. First, the user's input utterance is processed by a number of Natural Language Understanding (NLU) models, also called Annotators, including different correction, classification, and tagging components. The Skill Selector, taking into account the Dialogue State and the current utterance's annotations, picks a list of Skills that will try to generate responses for the current context. There are more than two dozen retrieval and rule-based Skills.
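To make the flow concrete, here is a toy sketch of this pipeline in Python. All function names, intent labels, and skill names are invented for the illustration; this is not the actual DeepPavlov Agent API.

```python
# Toy sketch of the Annotators -> Skill Selector -> Skills -> Response Selector
# pipeline. Names and labels are illustrative, not the real DeepPavlov Agent API.

def annotate(utterance: str) -> dict:
    """Annotators: correction, classification, and tagging components."""
    text = utterance.lower().strip()
    return {
        "text": text,
        "topics": ["movies"] if "movie" in text else [],
    }

def select_skills(annotations: dict, dialogue_state: dict) -> list:
    """Skill Selector: pick the Skills relevant to the current annotations."""
    skills = ["dummy_skill"]  # a fallback skill always runs
    if "movies" in annotations["topics"]:
        skills.append("movie_skill")
    return skills

def run_skills(skills: list, annotations: dict) -> list:
    """Each Skill returns zero or more (text, confidence) candidates."""
    responses = {
        "dummy_skill": [("Sounds interesting! What else do you like?", 0.3)],
        "movie_skill": [("I love movies! What did you watch recently?", 0.9)],
    }
    return [cand for s in skills for cand in responses.get(s, [])]

def pick_response(candidates: list) -> str:
    """Response Selector: here, simply the highest-confidence candidate."""
    return max(candidates, key=lambda c: c[1])[0]

ann = annotate("I watched a great movie yesterday")
reply = pick_response(run_skills(select_skills(ann, {}), ann))
print(reply)  # the movie_skill candidate wins
```

In the real system each stage is a separate service, but the data flow between the stages follows this shape.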

It is interesting to note that we tried to integrate a generative sequence-to-sequence neural model, but it was never released to users due to its unpredictability and limited controllability.

Each selected Skill returns zero, one, or several candidate responses, which are then annotated with the information necessary to make the final choice. The Response Selector filters out inappropriate candidates, applies hand-written heuristics, and uses an empirical formula to choose the final response. After that, the response can be extended with the user's name, if known, and with special linking questions to further develop the conversation. All components have access to the Dialogue State, which stores the dialogue history with annotations of every utterance and even candidate responses, as well as structured information about the socialbot's and the user's personalities. The modular architecture allowed different developers to work on components separately, which accelerated the process and involved all team members.
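The filter-then-score pattern of a Response Selector can be sketched as follows. The weights and the scoring formula below are made up for the example; the socialbot's actual empirical formula is more involved and is not reproduced here.

```python
# Illustrative candidate filtering and scoring in a Response Selector.
# Field names, thresholds, and weights are invented for this sketch.

def score_candidate(cand: dict) -> float:
    # Empirical-style combination of skill confidence and annotator signals.
    score = cand["confidence"]
    score += 0.2 if cand.get("continues_topic") else 0.0
    score -= 0.5 * cand.get("toxicity", 0.0)
    return score

def select_response(candidates: list) -> dict:
    # Heuristic filters first: drop toxic or empty candidates outright.
    viable = [c for c in candidates
              if c.get("toxicity", 0.0) < 0.5 and c["text"]]
    return max(viable, key=score_candidate)

candidates = [
    {"text": "That's rude!", "confidence": 0.9, "toxicity": 0.8},
    {"text": "Tell me more about your trip.", "confidence": 0.6,
     "continues_topic": True, "toxicity": 0.0},
]
best = select_response(candidates)
print(best["text"])  # the toxic candidate is filtered despite higher confidence
```

The key point is that hard filters run before any scoring, so a high-confidence but inappropriate candidate can never win.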

The DREAM team used the DeepPavlov Agent framework, while all other teams built their socialbots on top of Amazon's private CoBot framework. In general, a lot of effort went into DeepPavlov Agent adaptation, deployment, analytical tools, and basic content within the retrieval and rule-based skills.

After Alexa Prize Challenge 3, our team intended to open-source the DREAM socialbot. We spent the entire summer of 2020 jumpstarting this process, focusing on building replacements for the CoBot services, and especially on an open-source mechanism for answering factoid questions using our fresh new KBQA component shipped in the DeepPavlov Library back in May 2020. While we succeeded in building a new version of the socialbot with only open-sourced components, we only managed to make it available as a demo chatbot on our demo.deeppavlov.ai website by September 5, 2020. We had bigger plans, but Amazon unexpectedly announced Alexa Prize Challenge 4, which had a shorter preparation period due to the pandemic postponements of the previous Challenge. So, not having enough time to carefully refactor the socialbot, we decided to postpone its release until the end of Challenge 4. This plan gave us the opportunity to focus more on the contents and guts of the socialbot and to open-source a much better version, albeit a year later than originally planned.

The DREAM Socialbot in Alexa Prize Challenge 4 is thus based on the final version of the original DREAM Socialbot with the improvements we made last summer. While some of these improvements, like the replacement of basic CoBot classifiers, were temporarily cut as we got access to the updated services during the Challenge, our work on Knowledge Graphs (KGs) became the major contribution from last summer to the new DREAM Socialbot. This was easy to do thanks to the modular architecture: the KGs became part of the Annotators. Some Skills from the previous year's version used remote APIs to get useful information; we incorporated this knowledge-collection process as one of the Annotators as well. This gave us the opportunity to share the structured knowledge between all Skills.

One of our main goals was to expand the socialbot's content. Moreover, analyzing the previous years' technical reports, we also concluded that there is still no better dialogue coherency control than script-based dialogues. Therefore, one of the main achievements of our participation in Alexa Prize Challenge 4 is the development and release of the Dialogue Flow Framework (DFF), an open-source framework for building scripted dialogue systems. DFF allowed us to get rid of the topic-specific retrieval skills and move to scripted topic-specific skills, which give the impression of a coherent dialogue for at least several turns. The modular architecture allowed the skills to utilize all the available information from Annotators, from user utterance classification to extracted entities and retrieved knowledge.
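The scripted-dialogue idea behind such skills is essentially a state machine: nodes hold responses, and condition-guarded transitions move the dialogue forward. Below is a toy sketch of that idea; the node names and dictionary layout are invented for the illustration and do not reproduce the actual DFF API.

```python
# Toy state-machine sketch of a scripted topic-specific skill.
# Node names and structure are illustrative, not the real DFF API.

SCRIPT = {
    "start": {
        "response": "Do you like pets?",
        "transitions": [(lambda u: "yes" in u, "ask_pet")],
    },
    "ask_pet": {
        "response": "Great! Cats or dogs?",
        "transitions": [(lambda u: "cat" in u, "cats"),
                        (lambda u: "dog" in u, "dogs")],
    },
    "cats": {"response": "Cats are adorable!", "transitions": []},
    "dogs": {"response": "Dogs are loyal friends!", "transitions": []},
    "fallback": {"response": "Okay, let's talk about something else.",
                 "transitions": []},
}

def step(node: str, user_utterance: str) -> str:
    """Move to the first node whose condition matches the user's utterance."""
    for condition, target in SCRIPT[node]["transitions"]:
        if condition(user_utterance.lower()):
            return target
    return "fallback"

node = "start"
print(SCRIPT[node]["response"])    # "Do you like pets?"
node = step(node, "Yes, I do!")    # -> "ask_pet"
print(SCRIPT[node]["response"])    # "Great! Cats or dogs?"
node = step(node, "I have a cat")  # -> "cats"
print(SCRIPT[node]["response"])    # "Cats are adorable!"
```

Because the script fixes which transitions are reachable from each node, the skill stays on topic for several turns by construction, which is exactly the coherency property that retrieval skills lacked.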

DREAM Socialbot Architecture at the end of Alexa Prize Challenge 4. Image from the Technical Report

To summarize, the modular architecture of dialogue systems implemented within the DeepPavlov Agent framework allows us to:

  1. develop different components separately, by different developers,
  2. utilize all available information from Annotators in all components,
  3. combine independent skills of different origins (structure, framework),
  4. run components of the same level in parallel.

Independence, separate development, and parallel execution of the AI Assistant modules are all very important for the competition, allowing us to use all available human and computational resources efficiently.
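Point 4 above, parallel execution of same-level components, is easy to illustrate with a minimal asyncio sketch. The component names and latencies below are invented; in the real system each component is a separate service reached over HTTP.

```python
# Minimal asyncio sketch: same-level components (e.g. several annotators)
# run concurrently, so total latency is roughly that of the slowest one.
import asyncio
import time

async def call_component(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for an HTTP call to the service
    return f"{name}: done"

async def annotate_in_parallel() -> list:
    # All annotators start at once; gather preserves the argument order.
    return await asyncio.gather(
        call_component("spell-checker", 0.10),
        call_component("intent-classifier", 0.15),
        call_component("entity-linker", 0.12),
    )

start = time.perf_counter()
results = asyncio.run(annotate_in_parallel())
elapsed = time.perf_counter() - start
print(results)
print(f"wall time ~{elapsed:.2f}s")  # close to 0.15s, not the 0.37s sum
```

Running components sequentially would add their latencies together, which would quickly exceed the strict response-time budget of a voice assistant.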


By the way, the DeepPavlov Agent architecture is very efficient and withstands the high loads necessary for dialogue systems in production. The next article will be dedicated to the asynchronous pipeline of the DeepPavlov Agent, its features, advantages, and disadvantages. Later we will also cover the DREAM socialbot's structure and components in detail, the Dialogue Flow Framework, our development process, and a lot of insights and tips. Stay tuned!

You can read more about the DeepPavlov ecosystem on our official blog. Also, feel free to test our BERT-based models by using our demo. Please star ⭐️ us on the Github page. And don’t forget that DeepPavlov has a dedicated forum, where any questions concerning the framework and the models are welcome.
