Context Theory I: Conversation Structure

Duygu ALTINOK
Towards Data Science
7 min read · Jun 10, 2019


"Must a name mean something?" Alice asked doubtfully. 
"Of course it must," Humpty Dumpty said with a short laugh; "my name means the shape I am - and a good handsome shape it is, too. With a name like yours, you might be any shape, almost."
Through the Looking Glass, Lewis Carroll

In the previous article, we explored basic linguistic concepts related to context theory in search of a meaningful unit. In this article, we will focus on conversation structure: why some utterances can follow others without sounding awkward, how a repair is organized… in short, what makes a dialogue meaningful.

A dialogue is definitely more than an exchange of words; according to Martin Buber, “a dialogue is a reciprocal conversation between two or more entities… it is an effective means of on-going communication rather than as a purposive attempt to reach some conclusion or to express some viewpoint.”

Before exploring the meaning of individual utterances, let’s look at the meaning of a dialogue as a whole. Remember, I told you that context is greater than the sum of its components (though we are not yet sure what exactly those components are).

Certainly the meaning of a dialogue depends on

  • who the participants are
  • where they are
  • why they are there
  • context of the language use.

Linguistic choices, then, are not made arbitrarily but are systematically motivated by contextual factors. The famous linguist Dell Hymes framed the issue neatly in his SPEAKING model:

Setting and Scene refers to the physical and psychological setting in which the dialogue takes place: at the university building, at grandma’s house or at the hospital; in a funny, serious, professional or romantic mood; the cultural ambiance, in a sense.

Participants is the information about the participants and their cultural and social backgrounds. Obviously, the way a teacher speaks to a pupil is different from the way auntie Clara speaks to her grandson Joe.

Ends are the purposes and outcomes of the dialogue. Harry wanted to confess his love to his colleague Sally at her birthday party; however, instead of saying “I love you, Sally” while presenting his gift, he only managed “Nice to meet you,” followed by her awkward silence. The initial goal and the outcome are, unfortunately, very different.

Act sequence is the sequence of utterances.

Key refers to the spirit of the speech, e.g. happy, sad or enthusiastic.

Instrumentalities are the channels of communication: writing, speaking, signaling, sometimes a gaze, sometimes a WhatsApp message or an SMS. Online channels are relatively new, but very common.

Norms are the social norms. For instance, French, Turkish, German and many more languages have a respectful second-person plural: vous, siz, Sie.

Genre is the kind of dialogue: small talk, flirting or a family anecdote. Just like music, speech has several genres. Unlike music, some speech genres can be difficult to define.
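The SPEAKING components can also double as a context object in a dialogue system design. Here is a minimal sketch; all field names and default values are my own illustrative choices, not part of Hymes’s model:

```python
from dataclasses import dataclass, field

@dataclass
class SpeakingContext:
    """Hymes's SPEAKING components as a dialogue-context object (a sketch)."""
    setting: str                       # physical/psychological setting
    participants: list                 # who is talking, with their roles
    ends: str                          # purpose of the dialogue
    act_sequence: list = field(default_factory=list)  # utterances so far
    key: str = "neutral"               # tone: happy, sad, enthusiastic...
    instrumentalities: str = "speech"  # channel: speech, SMS, WhatsApp...
    norms: str = "informal"            # e.g. T/V distinction: "du" vs "Sie"
    genre: str = "small talk"

# A taxi-ordering dialogue as a context object:
ctx = SpeakingContext(
    setting="taxi app, on the go",
    participants=["customer", "bot"],
    ends="order a taxi",
    instrumentalities="chat",
)
ctx.act_sequence.append("I ordered a taxi 5 mins ago, where is it now?")
```

A dialogue manager could then condition its responses on any of these fields, e.g. switching to the respectful second-person plural when `norms` demands it.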

I told you that language may not be that self-contained; there is definitely an environmental effect. Moreover, each of our brains treats its owner as the center of the world. While we speak about relative times or geographical positions, the reference point is always now and here. These are common phrases of everyday language:

Spatial                   Temporal              Personal
My home                   Tomorrow              Me
My office                 Last week             Your boyfriend
Here                      After my birthday     My mom
Over there                Recently
Across the street         Soon

Where, when and to whom do these phrases refer, exactly? Better yet, imagine finding a message in a bottle saying, “I ran out of food; most probably I’ll be dead in 2 days. Please save me, I’m on an island near the old lighthouse,” with no date or map attached. Would you go looking for the owner of the message? Without a reference point, how could you even know whether he is still alive?

The most egocentric term of this article is deixis. A word is deictic if its semantics is fixed but its denotational meaning varies depending on time and/or place: the reference time and point.

Center of our semantic worlds: ourselves (image source: Wikipedia)
Personal    Spatial    Temporal
I           this       today
you         that       tomorrow
we          here       now
            there

These are typical deictic words; notice the presence of the pronouns. Pronouns are usually underestimated in NLU tasks: either they are discarded entirely by the stopword-filtering module, or they simply don’t attract much attention. It is true that they don’t carry much information in general, but in short texts they can shift the meaning a lot.
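To see the damage, here is a toy stopword filter applied to two short queries that differ only in a pronoun. The stopword set below is illustrative, not any real library’s list, but most shipped lists do include the pronouns:

```python
# A toy stopword list; real NLP libraries ship much larger ones,
# and most of them include the personal pronouns.
STOPWORDS = {"i", "you", "me", "my", "your", "it", "the", "a", "to", "please"}

def filter_stopwords(text):
    """Drop stopword tokens from a whitespace-tokenized, lowercased text."""
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]

# Two very different requests...
q1 = "navigate to my place"
q2 = "navigate to your place"

# ...collapse into the same representation once the pronouns are gone.
print(filter_stopwords(q1))  # ['navigate', 'place']
print(filter_stopwords(q2))  # ['navigate', 'place']
```

In a long document this loss is negligible; in a four-word navigation query it erases the destination.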

Maps apps, navigation devices, driving assistants like our Chris, taxi apps and ridesharing apps have to resolve spatial deixis in almost every query. Consider the following dialogue segment:

Customer: I ordered a taxi 5 mins ago, where is it now?
Bot: It's on the way, arrives in 4 mins.
Customer: Can he come opposite side of the park?

Here, obviously, “opposite side of the park” means the park near the customer’s location (in my case Tiergarten 😁). Our Chris faces this sort of query every day and successfully resolves it:

Hey Chris, navigate home
Navigiere ins Büro (German: “Navigate to the office”)
Navigate to work
Navigate to my place

Temporal deixis is more common in customer complaints (yesterday, 5 days ago, tomorrow, 1 week ago), while spatial deixis is more common in helping customers find their way. Spatial deixis is much more difficult: temporal deixis can indeed be parsed by CFGs, whereas spatial deixis involves geographical semantics. (We don’t tell anyone how we do it for Chris; it’s pure black magic 😉.)
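For the temporal side, even a handful of patterns gets you surprisingly far. Below is a regex-based sketch that resolves a few deictic time expressions against a reference date; the patterns and their coverage are illustrative, far from a full grammar:

```python
import re
from datetime import date, timedelta

def resolve_temporal(expr, reference):
    """Resolve a deictic time expression against a reference date (a sketch)."""
    expr = expr.lower().strip()
    if expr == "today":
        return reference
    if expr == "yesterday":
        return reference - timedelta(days=1)
    if expr == "tomorrow":
        return reference + timedelta(days=1)
    m = re.fullmatch(r"(\d+) days? ago", expr)
    if m:
        return reference - timedelta(days=int(m.group(1)))
    m = re.fullmatch(r"in (\d+) days?", expr)
    if m:
        return reference + timedelta(days=int(m.group(1)))
    raise ValueError(f"cannot resolve: {expr}")

# Same words, different reference point, different denotation:
print(resolve_temporal("5 days ago", date(2019, 6, 10)))  # 2019-06-05
print(resolve_temporal("5 days ago", date(2019, 6, 20)))  # 2019-06-15
```

The key deictic point: the function cannot return anything without the `reference` argument, exactly as “5 days ago” means nothing without a “now”.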

As one sees, a dialogue as a speech act has many ingredients, but why do we come together and speak in the first place? In order to have a meaningful dialogue, the participants should meet on the same ground. First of all, both parties should be willing to exchange information and share some common ground; there should be mutual knowledge and mutual assumptions. Moreover, during the dialogue, each should signal to the other that they hear, and what they hear; the hearer must ground the speaker’s utterances. Misunderstanding repair is part of this process and, at the same time, a huge task in Dialogue Management. Consider the following dialogue segment, with grounding patterns added to the design for a more human-like conversational experience:

Customer: I want to buy a ticket for tomorrow.
Bot: So you fly tomorrow, where are you going? (confirm what the bot heard)
Customer: To Munich.
Bot: OK, great. What time exactly? (confirm that the bot heard the previous utterance)
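In a slot-filling design, grounding can be as simple as echoing each newly heard value back inside the next prompt. A minimal sketch; the slot names and acknowledgement templates are my own illustration, not any particular framework’s API:

```python
def ground(slots, new_slot, new_value):
    """Fill a slot and produce a grounding response that echoes what was heard."""
    slots[new_slot] = new_value
    # Each template repeats the freshly heard value back to the user,
    # signaling "I heard you" while asking for the next slot.
    acknowledgements = {
        "date": "So you fly {value}, where are you going?",
        "destination": "OK, great, {value}. What time exactly?",
        "time": "Booking your {value} flight now.",
    }
    return acknowledgements[new_slot].format(value=new_value)

slots = {}
print(ground(slots, "date", "tomorrow"))
# So you fly tomorrow, where are you going?
print(ground(slots, "destination", "Munich"))
# OK, great, Munich. What time exactly?
```

If the bot mishears, the echoed value gives the user an immediate chance to initiate a repair, which is exactly what grounding is for.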

Human beings “know” who should speak in a dialogue, when, and for how long; turn-taking happens via reading signals of syntax, prosody, intonation, pauses, gestures, discourse markers and maybe a gaze. Notice that some of these signals are rather about “being human”: humans can read other humans beyond words or spoken language. A slight pause or a look in the eye is enough to understand that the other person has finished. Human beings have been reading each other for thousands of years, and evolution has kept passing that capability along. What about your chatbot design? While speaking on the phone with a customer, can it tell when it is its turn to speak? (Kudos to Google Duplex here, which definitely learned where to join the conversation.)

The last and maybe the most important concept of this article is sequential organization. When we want to assign meaning to an utterance, where that utterance is positioned in the interaction is as important as its words. Surely the meaning of a sentence can vary depending on whether

  • the utterance opens a dialogue,
  • it responds to some prior utterance,
  • it repairs some prior utterance,
  • or it opens a new context.

See it with your own eyes if you don’t believe me:

Customer: Hello!                       (greeting)
Bot: Hello, how can I help you?
Bot: Would you like to order a taxi now?
Customer: Helllooooo, I already ordered a taxi 5 mins ago! (ah no)
Bot: Would you like to place an order right now?
Customer: Yes, please.
Bot: OK, then I'm reserving your seat for Atlanta 9:00AM flight tomorrow.
Customer: Yes, but I'm not going to Atlanta, I go to Athens.

Different sorts of adjacency pairs derive different meanings. In the second example, a question-answer adjacency pair assigns Yes its literal meaning, while the second adjacency pair is an error-recovery pair; in that context Yes does not carry much information at all, and is rather used as a context continuer.
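The position-dependence of Yes can be made concrete with a tiny dispatch on the first pair part. The pair-type labels below are my own simplification of conversation-analytic categories:

```python
def interpret_yes(first_pair_part):
    """Interpret the token 'Yes' relative to the utterance it responds to (a sketch)."""
    meanings = {
        # question-answer pair: 'Yes' carries its literal, affirmative meaning
        "question": "affirmation",
        # error-recovery pair: 'Yes' mostly signals 'go on', a continuer
        "repair": "continuer",
    }
    return meanings.get(first_pair_part, "unknown")

print(interpret_yes("question"))  # affirmation
print(interpret_yes("repair"))    # continuer
```

The word is identical in both turns; only its sequential position changes, and with it the dialogue act.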

Another concept relates to context recovery: repair organization is indeed a crucial topic in Conversational AI. We have all had our “Sorry, I didn’t get that” moments in our designs (accidents happen 😶); the question is whether one can recover from the trouble and return to the previous context as soon as possible. When trouble occurs, either the user initiates a correction or the chatbot asks to clarify the situation. See the following dialogue segments:

Customer: Please play Crazy from Britney.
Bot: I'll put on Seal right away.
Customer: No, Britney Spears.
Bot: Alright, You Drive Me Crazy from Britney comes.
Customer: Put on some good vibes, please.
Bot: Sorry I didn't quite get that, can you maybe rephrase your sentence?
Customer: Play some music please.
Bot: Sure, what would you like to listen?

A typical repair trajectory consists of the following sequence of acts: the trouble, the initiation of the correction, and the correction itself. The most important thing is not to lose the prior context by getting lost in the repair trajectory:

- I want to order 2 cheeseburgers please.
- Ketchup or mayo?
- One with ketchup, other with mayo please.
- Sorry I didn't quite get that. (here the order is not comprehended)
- Which part didn't you understand?
- Sorry I didn't quite understand, can you maybe rephrase?
(this is the point of losing it, bot forgets about the order and focuses on recovering the very last utterance)
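One way to avoid this trap is a stack of contexts: push a repair context when trouble occurs, pop it once the repair resolves, and the prior task resurfaces untouched. A minimal sketch (my own illustration, certainly not the actual Chris implementation):

```python
class DialogueManager:
    """Keeps a stack of contexts so a repair never erases the prior task (a sketch)."""

    def __init__(self):
        self.stack = []

    def push(self, context):
        """Enter a new context (a task, or a repair of the current one)."""
        self.stack.append(context)

    def current(self):
        """The context the bot should be attending to right now."""
        return self.stack[-1] if self.stack else None

    def resolve_current(self):
        """Pop the finished (e.g. repair) context; the prior task resurfaces."""
        return self.stack.pop()

dm = DialogueManager()
dm.push({"task": "order", "items": ["cheeseburger", "cheeseburger"]})
dm.push({"task": "repair", "trouble": "condiment choice not understood"})
print(dm.current()["task"])   # repair
dm.resolve_current()
print(dm.current()["task"])   # order -- the burgers were never forgotten
```

In the burger dialogue above, the bot that lost the order was effectively running with a one-slot context; a stack keeps the order alive underneath the repair.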

Dear readers, we have now covered both the linguistic and the structural basics; yet we crave more. In the upcoming episodes of this series, we will discover semantic frames, “types” of context, and how and exactly what to attend to… more computational concerns. At the end of the day, we all enjoy a bit of PyTorch code, right? 😉

For all this and more, join us at https://chris.com for the finest Chris engineering. We are bringing a revolution to Conversational AI and creating a unique product. You can also always visit me at https://duygua.github.io. Meanwhile, stay happy and tuned!

References

  • Buber, Martin. 1958. I and Thou. New York: Scribner.
  • Hymes, D. 1974. Foundations of sociolinguistics: An ethnographic approach. Philadelphia: University of Pennsylvania Press.

