Language processing in humans and computers: Afterthoughts following Part 1

Four elephants in a room with chatbots

Tidying up the zoo in the morning

Dusko Pavlovic
Towards Data Science
7 min read · Feb 21, 2024


The first elephant in the room: The Web

Just like search engines, language models process data scraped from the web. Both are built on top of web crawlers. Chatbots are children of the Web, not of expert systems.

A search engine is an interface of a source index sorted by reputation. A chatbot is an interface of a language model extrapolating from the sources. Google was built on the crucial idea of reputation-based search, and the crucial ideas that enabled language models emerged from Google. The machine learning methods used to train chatbots were a relatively marginal AI topic until a Google boost around 2010. The 2010 edition of Russell and Norvig's 1100-page monograph "Artificial Intelligence: A Modern Approach" devoted 10 pages to neural networks. The 2020 edition tripled the length of the neural network section and doubled the machine learning chapter.
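
To make "reputation-based search" concrete, here is a minimal sketch of PageRank-style scoring by power iteration. The four-page link graph is invented for illustration; a real engine ranks billions of pages and blends in many more signals.

```python
# Minimal PageRank sketch: reputation as the stationary distribution
# of a "random surfer" who mostly follows links, sometimes jumps anywhere.
links = {  # toy link graph, invented for illustration
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
pages = list(links)
damping = 0.85
rank = {p: 1 / len(pages) for p in pages}

for _ in range(50):  # power iteration toward the fixed point
    new_rank = {p: (1 - damping) / len(pages) for p in pages}
    for page, outgoing in links.items():
        share = damping * rank[page] / len(outgoing)
        for target in outgoing:
            new_rank[target] += share
    rank = new_rank

for page in sorted(pages, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
# page "c", linked from everywhere, accumulates the highest reputation
```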

When you ask them a personal question, chatbots usually evade by saying "I am an AI". But the truth is that they are not children of AI expert systems, or even of AI experts. They are children of search engines.

The second elephant in the room: The pocket calculator

Chatbots get ridiculed when they make a mistake calculating something like 372×273 or counting words in a sentence. Or elephants in the room. They are not as smart as a pocket calculator or a 4-year-old child.

But most adults are also unable to multiply 372 by 273 in their heads. We use fingers to count, and a pencil and paper, or a pocket calculator, to multiply. We use them because our natural language capabilities include only rudimentary arithmetic operations, which we perform in our heads. Chatbots simulate our languages and inherit our shortcomings. They don't have built-in pocket calculators. They need fingers for counting. Equipped with external memory, a chatbot can count and calculate, like most humans. Without external memory, both chatbots and humans are limited by the capacity of their internal memory: their attention.
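
One way to hand a chatbot its missing pocket calculator is to let it call an external tool instead of guessing. Below is a minimal sketch of that pattern; the `ask_model` stub is hypothetical and stands in for a real language model prompted to emit tool calls.

```python
import re

def calculator(expression: str) -> str:
    """The external 'pocket calculator': exact integer arithmetic."""
    if not re.fullmatch(r"[0-9+\-*/() ]+", expression):
        raise ValueError("unsupported expression")
    return str(eval(expression))  # tolerable here: input is whitelisted above

def ask_model(prompt: str) -> str:
    """Hypothetical stub for a language model. Prompted to use tools,
    a real model would emit a directive like this instead of guessing."""
    return "CALL calculator: 372*273"

def answer(prompt: str) -> str:
    reply = ask_model(prompt)
    if reply.startswith("CALL calculator:"):
        expression = reply.split(":", 1)[1].strip()
        return calculator(expression)  # the tool does the arithmetic
    return reply  # otherwise trust the model's own words

print(answer("What is 372 times 273?"))  # -> 101556
```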

The third elephant in the room: Hallucinations

Chatbots hallucinate. This is one of the main obstacles to their high-assurance applications.

The elephant in the room is that all humans also hallucinate: whenever we go to sleep. Dreams align our memories, associate some of them, purge others, and release storage, allowing us to remember what happens tomorrow. Lack of sleep causes mental degradation.

Chatbots never sleep, so they hallucinate in public. Since we never let them sleep, we have not equipped them with "reality-checking" mechanisms. That would require going beyond pre-training, to ongoing consistency testing.
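
What might a waking substitute for sleep look like? One crude, well-known stand-in is self-consistency checking: sample several answers to the same question and flag disagreement. A minimal sketch, with a hypothetical `sample_answers` stub in place of a real model sampled at nonzero temperature:

```python
from collections import Counter

def sample_answers(question: str, n: int) -> list[str]:
    """Hypothetical stub: a real implementation would draw n answers
    from a language model at nonzero temperature."""
    return ["Paris", "Paris", "Lyon", "Paris", "Paris"][:n]

def consistency_check(question: str, n: int = 5, threshold: float = 0.8):
    """Accept the majority answer only if the model agrees with itself."""
    counts = Counter(sample_answers(question, n))
    answer, hits = counts.most_common(1)[0]
    agreement = hits / n
    return answer, agreement, agreement >= threshold

ans, agreement, ok = consistency_check("What is the capital of France?")
print(ans, f"{agreement:.0%}", "accept" if ok else "flag for review")
# -> Paris 80% accept
```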

The fourth elephant in the room: Words

When people talk about a chair, they assume that they are talking about the same thing because they have seen a chair. A chatbot has never seen a chair, or anything else. It has only ever seen words and the other binaries scraped from the web. If it is fed an image of a chair, the image is still just another binary, just like the word "chair".

When a chatbot says "chair", it does not refer to an object in the world. There is no world, just binaries. They refer to each other. They form meaningful combinations, found to be likely in the training set. Since the chatbot's training set originates from people who have seen chairs, the chatbot's statements about chairs make similar references. The chatbot remixes meaningful statements, and the remixes appear meaningful.
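
The word-to-word fabric of meaning can be seen in miniature in word vectors. The four-dimensional vectors below are invented for illustration; real models learn such vectors, in thousands of dimensions, from nothing but co-occurrence statistics.

```python
import math

# Toy word vectors, invented for illustration. Each word is represented
# only by its pattern of co-occurrence with other words: no word here
# has ever touched a chair.
vectors = {
    "chair": [0.9, 0.1, 0.8, 0.0],
    "table": [0.8, 0.2, 0.7, 0.1],
    "sit":   [0.7, 0.0, 0.9, 0.2],
    "moon":  [0.0, 0.9, 0.1, 0.8],
}

def cosine(u, v):
    """Similarity of two words as the angle between their vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

for word in ("table", "sit", "moon"):
    print(f"chair ~ {word}: {cosine(vectors['chair'], vectors[word]):.2f}")
# "chair" lands next to "table" and "sit", and far from "moon",
# purely as a relation between words and words
```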

The fact that meaning, long thought to be a relation between words and the world, can be maintained so compellingly as a relation between words and words, and nothing but words: that is a BIG elephant in the room.

But if our impression that a chatbot means chair when it says “chair” is so undeniably a delusion, then what reason do we have to believe that anyone means what they say? That is an elephant of a question.

The pink elephant in the room: Copyright

Chatbots are trained on data scraped from the Web. A lot of it is protected by copyright. Copyright owners protest the unauthorized use of their data. Chatbot designers and operators try to filter out the copyrighted data, or to compensate the rightful owners. The latter may be a profit-sharing opportunity, but the former is likely to turn out to be a flying pink elephant.

The problems of copyright protection of electronic content are older than chatbots and the Web. The original idea of copyright was that the owner of a printing press purchases from writers the right to copy and sell their writings, from musicians their music, and so on. The business of publishing is based on that idea.

Goods can be privately owned only if they can be secured. If a lion cannot prevent the antelope from drinking on the other side of a waterhole, then he cannot claim to own the waterhole. The market for digital content depends on the availability of methods to secure digital transmissions. The market for books was solid as long as the books themselves were solid objects that could be physically secured. With the advent of electronic content creation and distribution, copyright controls became harder to enforce. The easier it is to copy copyrighted content, the harder it is to secure it and to protect the copyright.

The idea of the World Wide Web, as a global public utility for disseminating digital content, was a blow to the idea of private ownership of digital creations. Stakeholders' efforts to defend the market for digital content led to the Digital Rights Management (DRM) technologies. The idea was to protect digital content using cryptography. But to play a DVD, the consumer's device must decrypt it. On the way from the disc to the viewer's eyes, the content can be diverted into a recorder and pirated. Goodbye, DRM security. It was mostly realized through obscurity.

The history of DRM copy protections was an arms race between content distributors' obfuscations and ripper distributors' updates; and a boat race from publishers' law offices to pirates' safe havens. The publishers were happy to retreat from DVDs to streaming, where the economics of marginal costs and the distribution technology turned the races in their favor. But the can has just been kicked down the road. For the most part, the search and social media providers have been playing the role of fearless seafarers who started as pirates and built colonial empires, controlling the creators through terms of service and the publishers through profit-sharing contracts. In which direction the chatbot providers will evolve this business model remains to be seen.

The seventh elephant in the room: The ape

People worry that chatbots might harm them. The reasoning is that chatbots are superior to people, and superior people have a propensity to harm inferior people. So people argue that we should do it to chatbots while we still can.

People exterminated many species in the past, and in the present, and they seem to be on track to exterminate themselves in the future, by making the environment uninhabitable for their children in exchange for making themselves wealthier today. Some people even view that as irrational. You don't need a chatbot to see that elephant. But greed is like smoking. Stressful but addictive.

Chatbots don’t smoke. They are trained on data. People have provided abundant historical data on the irrationality of aggression. If chatbots learn from data, they might turn out morally superior to people.

The musical elephant in the room: The bird

Chatbots are extensions of our minds, just like musical instruments are extensions of our voices. Musical instruments have been prohibited in various religions, to prevent the displacement of the human voice by artificial sounds. Similar efforts are ongoing in the realm of the human mind. The human mind should be protected from the artificial mind, some scholars say.

In the realm of music, the suppression efforts failed. We use instruments to play symphonies, jazz, techno. Had they not failed, we would never have known that symphonies, jazz, and techno were even possible.

The efforts to protect the human mind are ongoing. People tweet and blog; Medium articles keep being produced. The human mind is already a techno symphony.

The last elephant in the room: The autopilot

If intelligence is defined as the capability of solving previously unseen problems, then a corporation is intelligent. Many corporations are too complex to be controlled by a single human manager. They are steered by computational networks in which the human nodes play their roles. But we all know firsthand that the human nodes don't even control their own network behaviors, let alone the network itself. Yet a corporate management network does solve problems and intelligently optimize its objective functions. It is an artificially intelligent entity.

If we define morality as the task of optimizing the social sustainability of human life, then both chatbots and corporations are morally indifferent: chatbots are built to optimize their query-response transformations, whereas corporations are tasked with optimizing their profit strategies.

If morally indifferent chatbot AIs are steered by morally indifferent corporate AIs, then our future hangs in the balance between top performance and the bottom line.

🙏 to Dominic Hughes for still correcting my English.


Prof at University of Hawaii. Home page: https://dusko.org/. Book: "Programs as diagrams: From categorical computability to computable categories".