
Linguistic Features of Human Speech in Dialogues with a Virtual Assistant

Do people talk differently to chatbots than they do to other people?

Virtual assistants are usually built with machine learning approaches and, of course, rule-based approaches. Both (machine learning especially) rely on input data, which usually consists of human-human dialogues. This ignores the fact that users of dialogue systems do not communicate with them in the same way as with real people.

In this article, I will describe our experiment on the statistical analysis of the text of human-human and human-chatbot dialogues. The dialogues covered frequently asked questions (FAQ) about COVID-19 and were conducted in Russian. The results were processed with quantitative text analysis methods (groups of metrics such as readability, syntactic complexity, and lexical diversity). Statistical analysis showed significant differences in these metrics: respondents used shorter words and sentences and simpler syntax in dialogues with the chatbot, while lexical diversity and readability indices were higher in human-human dialogues. All of this suggests that people use simpler language when communicating with virtual assistants, so developers should pay special attention to how the source data for such applications is prepared.

Introduction

It is no secret that virtual assistants (chatbots in particular) have already become an active part of our lives. They meet us in banks, on government websites, and even when we just want to have fun. Even though I think Gartner’s prediction that 70% of white-collar workers would be talking to chatbots every day by 2022 has not completely come true, the trend of their adoption is more than visible. In my opinion, technology like virtual assistants is neutral: depending on whose hands it is in, it can be more useful or more hostile. In this article, I propose to focus on the useful side of virtual assistants, namely the ability to get an answer to a particular question quickly and accurately.

As an example, consider an FAQ chatbot answering user questions about COVID-19. Dialogue data were collected from March 2021 to July 2021, i.e. during one of the peak periods of the pandemic. The data collection process was designed so that the two types of dialogues (human-human and human-chatbot) could be compared under nearly "all other things being equal" conditions. More details are given in the next section.

Creating a chatbot

First of all, it was necessary to collect a knowledge base in question-answer format. The data for the knowledge base were taken from open and official sources of information, such as the site stopkoronavirus.rf, and were manually structured so that an appropriate classification model could be trained. As a technical solution, the Google Dialogflow platform and its intent classification module were used. The Telegram messenger served as the chatbot’s interface. The figure below shows the conceptual architecture of the chatbot.

Conceptual architecture of the FAQ Chatbot (Image by Author)
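
To make the pipeline more concrete, here is a minimal sketch of the Dialogflow side of such a setup, using Google’s google-cloud-dialogflow Python client. The project ID, session ID, and sample question are placeholders, and the actual bot from the experiment may be wired differently.

```python
# Minimal sketch: forwarding a user's message to a Dialogflow agent and getting
# back the answer attached to the matched FAQ intent. Requires the
# google-cloud-dialogflow package and a configured agent; IDs are placeholders.
from google.cloud import dialogflow


def detect_intent_text(project_id: str, session_id: str, text: str,
                       language_code: str = "ru") -> str:
    session_client = dialogflow.SessionsClient()
    session = session_client.session_path(project_id, session_id)

    # Wrap the raw user message into a Dialogflow query
    text_input = dialogflow.TextInput(text=text, language_code=language_code)
    query_input = dialogflow.QueryInput(text=text_input)

    response = session_client.detect_intent(
        request={"session": session, "query_input": query_input}
    )
    # The fulfillment text is the answer stored for the matched intent
    return response.query_result.fulfillment_text


# A Telegram message handler would simply pass the incoming text here
# and send the returned answer back to the chat.
answer = detect_intent_text("my-faq-project", "user-123", "Как записаться на вакцинацию?")
print(answer)
```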

Planning the experiment

For the experiment, we divided respondents into Group 1 (human-human dialogues) and Group 2 (human-chatbot dialogues). Let’s start with the second group (Group 2), because everything is simple there: respondents were invited to have a dialogue directly with the chatbot, after being given an introduction to the topics of the dialogue and possible questions. In the first case (Group 1), everything is similar, except that respondents talked to a human expert without knowing that the expert was using the same chatbot to answer their questions. In other words, the expert acted as a proxy between the respondent and the chatbot. This type of experiment is called an inverted Wizard of Oz; a description of the original Wizard of Oz approach is available here. The figure below shows a schematic of the experiment from our original article.

Experimental Approach (Image by Author)

This experiment was conducted from March 2021 to July 2021. As a result, we obtained 35 dialogues for Group 1 (human-human) and 68 dialogues for Group 2 (human-chatbot). The resulting data were anonymized and used in the further analysis. The source code and data are available in the GitHub repository. In the next section, I will give detailed statistics for the measured metrics.

Results of text dialogue analysis

A number of metrics in the following groups were chosen to analyze the textual dialogues: general descriptive metrics (e.g. average sentence length, word length, and syllables per word), syntactic complexity, lexical diversity, and readability.

The metrics were calculated using the Python library LinguaF. The description and calculation techniques for the metrics can be found in our original article. Below is a table with the calculated average values of the metrics, as well as t-test values that allow us to determine whether the differences in a given metric between the two types of dialogues (Group 1 and Group 2) are statistically significant (significance level α = 0.01).

Table with calculated values of metrics showing the linguistic difference between texts (Image by Author)
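
To give an idea of how such a comparison works in practice, here is a rough sketch (not the exact pipeline from the repository) that computes one of the simpler metrics per dialogue and runs a two-sample t-test between the groups; the dialogue lists below are placeholders, whereas the real analysis uses the anonymized transcripts.

```python
# Rough sketch: average number of words per sentence per dialogue,
# compared between groups with a two-sample t-test.
import re
from scipy import stats


def avg_words_per_sentence(text: str) -> float:
    """Average number of words per sentence in one dialogue."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return sum(len(s.split()) for s in sentences) / len(sentences)


# Placeholder data; in practice, load the transcripts from the repository.
group1_dialogs = ["Добрый день! Подскажите, пожалуйста, как записаться на вакцинацию?",
                  "Здравствуйте. Нужно ли носить маску после прививки в помещении?"]
group2_dialogs = ["как записаться на прививку",
                  "маска нужна?"]

g1 = [avg_words_per_sentence(d) for d in group1_dialogs]
g2 = [avg_words_per_sentence(d) for d in group2_dialogs]

# The difference is statistically significant at the 0.01 level if p < 0.01
t_stat, p_value = stats.ttest_ind(g1, g2, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```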

For most metrics, we have a statistically significant difference. Let’s start with the simple ones. For metric #1, the average number of words per sentence in Group 1 dialogues is almost two times greater than in Group 2. The same is true for metric #2, average sentence length. The average word length and the number of syllables per word are also slightly higher for Group 1.

The syntactic complexity metric (Mean Dependency Distance), which shows the average distance between dependent words in a sentence, is much greater for human-human dialogues. This indicates a more complex sentence structure in Group 1.
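
For readers unfamiliar with this metric, here is a rough illustration of how Mean Dependency Distance can be computed from a dependency parse. The sketch uses spaCy with an English model purely for brevity; the study analyzed Russian text, for which a Russian model (e.g. ru_core_news_sm) would be needed, and the original implementation may differ.

```python
# Rough illustration of Mean Dependency Distance (MDD) using spaCy's parser.
# Requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")


def mean_dependency_distance(sentence: str) -> float:
    """Average linear distance between each token and its syntactic head."""
    doc = nlp(sentence)
    distances = [abs(token.i - token.head.i)
                 for token in doc
                 if token.head != token and not token.is_punct]  # skip root and punctuation
    return sum(distances) / len(distances)


print(mean_dependency_distance("The vaccine protects people against severe illness."))
```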

The differences in the values of metrics #6 and #9 are not statistically significant. However, the remaining lexical diversity metrics clearly show a lag in the Group 2 dialogues. All readability metrics showed a statistically significant difference, which illustrates the more complex structure of the Group 1 texts.
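
As an illustration of what a readability index actually measures, here is the classic (English) Flesch Reading Ease formula: longer sentences and words with more syllables push the score down, i.e. the text is considered harder to read. Russian texts are typically scored with adapted coefficients, so the numbers below are purely illustrative.

```python
# Illustrative only: classic Flesch Reading Ease. Higher score = easier text.
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# A hypothetical text with 120 words, 8 sentences and 180 syllables:
print(flesch_reading_ease(words=120, sentences=8, syllables=180))
```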

Findings and Conclusion

In this work, we have numerically confirmed that people tend to use simplified linguistic constructions in dialogue with a chatbot. This concerns both individual words and the structure of sentences as a whole. This finding is an impetus for more careful collection of the raw data used to create chatbots.

As noted in the introduction, virtual assistants are usually built with machine learning and rule-based approaches, both of which (machine learning especially) rely on raw data that usually consists of human-human dialogues. This ignores the fact that users of dialogue-based systems do not communicate with them the same way they would with real people. We recommend that all developers keep this factor in mind and adjust the data and algorithms underlying their systems accordingly.

To cite this work, use: https://icaiit.org/proceedings/10th_ICAIIT_1/2_2_Perevalov.pdf

How often do you use chatbots in your everyday life? Please write in the comments!

