
What did a chat bot made in 1966 tell us about human language?

Language is a quintessential mark of human-level intelligence. But exactly how hard is it for a computer to have conversations?

Earlier this year, an engineer at Google made an announcement: he believed that LaMDA – Google’s internal language model – had sentient characteristics. Along with the announcement came pages of transcripts of conversations between him and the machine. The conversations were extremely life-like, as if they had taken place between two people. The transcripts were heavily featured on major news networks, with debates on whether or not LaMDA was truly self-aware and intelligent. Many machine learning and A.I. researchers, such as Yann LeCun, Demis Hassabis, and Yannic Kilcher, took the negative stance: they believed that LaMDA was an impressive language model, but far from true intelligence.

This was nothing new in the A.I. community. More than 50 years ago, another chat bot was created using rudimentary language rules. Surprisingly, it fooled more people than expected into thinking it was intelligent.

The therapist

Joseph Weizenbaum silently watched as his test subject typed commands into a computer terminal. The test subject was having a conversation with ELIZA, a computer program designed to act as a therapist.

The year was around 1966. By this point, Weizenbaum had been working on computers for a very long time. He started out working on analog computers, which used vacuum tubes and hydraulics instead of transistors. He helped develop an early digital computer, and later the first banking computer in 1955. He was, by all means, a pioneer in computing.

But Weizenbaum was a humanist at heart. He often wondered about the nature of intelligence and the relationship between humans and machines. Even back then, Artificial Intelligence as a field was already blossoming, and competing techniques and theories for building and judging intelligence were being proposed. The most famous of them was probably the Turing test, in which a human evaluates an unknown system by conversing with it and judges, based on the quality of the conversation, whether or not the system has human intelligence.

The simplicity of the Turing test might have made Weizenbaum skeptical. After all, it was a very subjective test, based entirely on language. So he studied different ways to segment language so that a statement could be converted into a response. Take the following statement, for example:

It seems that you hate me.

Weizenbaum noticed that by identifying only important words and phrases such as "you" and "me" and discarding the rest, the sentence could be formalized as

X you Y me.

This template can then be transformed into a question:

Why do you think that I hate you?

He could also rank the importance of words and phrases. Take the following sentence:

I know everybody laughed at me.

The words "I" and "me" stand out, but Weizenbaum noticed that when people said "everybody," they were often talking about particular people. Therefore, the word "everybody" takes higher precedence, and a different response can be used:

Who in particular laughed at you?
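A minimal Python sketch of this keyword-and-template idea might look like the following. This is not Weizenbaum’s original implementation (ELIZA was written in MAD-SLIP); the patterns, ranks, reflection table, and fallback reply are illustrative assumptions that cover only the two examples above.

```python
import re

# First/second-person swaps, so a reflected fragment reads naturally
# ("laughed at me" becomes "laughed at you").
REFLECTIONS = {"i": "you", "me": "you", "my": "your",
               "you": "I", "your": "my", "am": "are"}

def reflect(fragment):
    return " ".join(REFLECTIONS.get(word.lower(), word)
                    for word in fragment.split())

# Illustrative rules: (rank, decomposition pattern, reassembly template).
# A higher rank means the keyword takes precedence, as "everybody" does here.
RULES = [
    (2, re.compile(r".*\beverybody\b(?P<y>.*)", re.I),
     "Who in particular {y}?"),
    (1, re.compile(r".*\byou\b(?P<y>.*)\bme\b", re.I),
     "Why do you think that I {y} you?"),
]

def respond(statement):
    statement = statement.strip().rstrip(".!?")
    # Scan every rule, then answer with the highest-ranked keyword that matched.
    matches = [(rank, template, m)
               for rank, pattern, template in RULES
               if (m := pattern.match(statement))]
    if not matches:
        return "Please go on."  # generic fallback when no keyword is found
    rank, template, m = max(matches, key=lambda t: t[0])
    return template.format(y=reflect(m.group("y")))

print(respond("It seems that you hate me."))       # Why do you think that I hate you?
print(respond("I know everybody laughed at me."))  # Who in particular laughed at you?
```

The ranking step is what lets "everybody" override "you" and "me" when both kinds of keywords appear in the same sentence.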

That was the basis of Weizenbaum’s conversational system. He named the program ELIZA, after the main character in the play "Pygmalion," who learned how to converse with others. ELIZA used various scripts; the most successful of them was called Doctor. Doctor mimicked the therapy style of Carl Rogers, an influential psychologist.

Which brings us to 1966. Weizenbaum was observing his subject interacting with ELIZA. The subject typed at the terminal, and ELIZA responded using the Doctor script:

Human: Men are all alike.

Computer: IN WHAT WAY?

Human: They’re always bugging us about something or other.

Computer: CAN YOU THINK OF A SPECIFIC EXAMPLE?

Human: Well, my boyfriend made me come here.

Computer: YOUR BOYFRIEND MADE YOU COME HERE

Human: He says I’m depressed much of the time.

Computer: I AM SORRY TO HEAR YOU ARE DEPRESSED

Human: It’s true. I am unhappy.

After a while, the subject turned around and politely asked Weizenbaum to step out of the room so that she could have some privacy. This surprised Weizenbaum – his test subject had experienced something close to a real therapy session, using a program with fewer than 200 lines of code!

What actually happened?

ELIZA went on to convince many others that they were having meaningful conversations with it. An anecdote from the 1970s tells of a sales executive who conversed with it, thinking he was talking to a client at MIT. ELIZA might not be truly intelligent, and today we can build far superior language models, but it taught us many valuable lessons about building natural language systems.

Language models don’t have to be good to be believable

The reason why ELIZA worked so well was later described as the ELIZA effect. Humans, as it turns out, are prone to attributing human characteristics to computer programs we connect with. This was the reason why Steve Jobs wanted the Macintosh to say "Hello." It is also why, even though language models weren’t good until recently, chat bots have existed commercially for far longer. The heuristic used in ELIZA – scanning for important words, then ranking them to determine responses – is still used in many chat bots today. Suppose you are writing a chat bot for a pizza delivery company. You first receive a prompt:

I’d like a large pizza with cheese and salami toppings delivered ASAP.

Then, you can scan for important words and provide an appropriate reply:

One large pizza, cheese with salami toppings, coming up! Where do you want it delivered?
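Here is a sketch of that scan-and-reply step, in the same spirit as ELIZA’s heuristic. The size and topping vocabularies, the reply template, and the fallback question below are hypothetical stand-ins; a production bot would use a fuller grammar or an intent classifier.

```python
import re

# Hypothetical vocabulary for a pizza-ordering bot.
SIZES = ("small", "medium", "large")
TOPPINGS = ("cheese", "salami", "pepperoni", "mushroom")

def reply(prompt):
    # Scan the prompt for the important words: a size and one or more toppings.
    words = re.findall(r"[a-z']+", prompt.lower())
    size = next((w for w in words if w in SIZES), None)
    toppings = [w for w in words if w in TOPPINGS]
    if size and toppings:
        return (f"One {size} pizza, {' and '.join(toppings)} toppings, "
                "coming up! Where do you want it delivered?")
    # Ask a clarifying question when a slot is missing.
    return "What size would you like, and which toppings?"

print(reply("I'd like a large pizza with cheese and salami toppings delivered ASAP."))
# One large pizza, cheese and salami toppings, coming up! Where do you want it delivered?
```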

Perhaps the most surprising aspect of ELIZA was how simple it was – its entire workings were described in fewer than 10 pages. As Ian Goodfellow and his co-authors put it in the book Deep Learning:

In many cases, it is more practical to use a simple but uncertain rule rather than a complex but certain one […]

Is passing the Turing test easy?

Many argued that ELIZA should be considered the first program to pass the Turing test. Many others used ELIZA as an example of how the Turing test is incomplete for judging intelligence. Remember LaMDA, the language model whose conversations were published? Would you say that the quality of those conversations means that LaMDA has something resembling intelligence? Is it simply a statistical outcome, the result of a mere algorithm? Or are the two one and the same?

In 1980, philosopher John Searle proposed a thought experiment called the Chinese room. Inside the Chinese room is a person with a Chinese dictionary and a rule book. The rule book contains the execution steps for a powerful A.I. language model. The person has no knowledge of Chinese, and every time he is given a prompt from a Chinese speaker outside the room, he uses the rule book to construct an appropriate response. If, by following all the rules within the book, he can create believable responses, does he actually understand the conversation? Searle argued that the person obviously does not, and neither does the rule book; therefore, simply conversing and passing the Turing test does not imply understanding.

The Chinese room met with much criticism and praise. I will only cite one criticism here: the premise is ill-defined. It assumes that understanding and following instructions are two different things, while this criticism holds that they are the same – that "understanding" or "consciousness" is the result of many complicated instructions working together.

Weizenbaum’s warning

ELIZA kick-started a revolution in NLP. However, Weizenbaum looked at his creation with a somewhat dark view. One time, he tried to have ELIZA log its interactions so that he could examine them at a later date. He immediately met resistance from research subjects, who said it was akin to violating their deepest privacy. He found humans confiding their secrets to unthinking machines a symptom and a sign of dystopia. For him, it was the highly industrialized world, siloing people into pressurized conditions, that let them connect so easily with algorithms like ELIZA. Because humans did not understand the computer system they interacted with, they attributed to ELIZA the only characteristics they knew – their own.

Later in life, Weizenbaum grew more negative about the impact of artificial intelligence and computers. He made a clear distinction between decision (computer) and choice (human) and believed that there is a limit to what computers can do. Weizenbaum died in 2008, before the modern neural network revolution. One must wonder what he would think about how machine learning is being applied nowadays, or about language models such as GPT or LaMDA.

Sources

[1] Weizenbaum, Joseph. "Computer power and human reason: From judgment to calculation." (1976).

[2] Weizenbaum, Joseph. "ELIZA – a computer program for the study of natural language communication between man and machine." Communications of the ACM 9.1 (1966): 36–45.

[3] https://spectrum.ieee.org/why-people-demanded-privacy-to-confide-in-the-worlds-first-chatbot
