In Praise of Artificial Stupidity

Why today’s A.I. is not “really” A.I., but it may not matter that much to you.

Jacopo Tagliabue
Towards Data Science

--

Make Artificial Intelligence great again

“Artificial Intelligence is the new electricity.” A. Ng

“Those who do not learn history are doomed to repeat it.” G. Santayana

Not a week goes by in sunny Silicon Valley without a new Artificial Intelligence company making the headlines, with shiny new promises and/or mind-blowing funding rounds. The frenzy is not limited to the Valley: a cloud of somewhat overlapping concepts — big data, data science, machine learning, Artificial Intelligence, deep learning — has become mainstream in recent years; serious business people went so far as to claim that data scientist is “the Sexiest Job of the 21st Century”, partially vindicating my failed attempt to become an NBA superstar (for some clear photographic evidence, compare this life with the one below).

Still, not as sexy as the life of a data scientist.

To many old(ish) practitioners, it may seem like the world finally caught up with what we knew all along: building smart machines is pretty f*%$ing cool! On the other hand, there seems to be terrible confusion about what A.I. means, what A.I. promises and what A.I. can actually deliver.

As an old(ish) practitioner and founder of “yet another San Francisco startup with an AI domain”, I find myself fully immersed in an ecosystem where optimism is often only matched by genuine ignorance of basic facts in A.I. history: I cannot help but wonder, are we at the beginning of an era, or instead at the end of one?

In what follows, I submit my deeply subjective overview of A.I. — yesterday, today, tomorrow — and shall commit the ultimate (nerd) sin: decoupling the scienc-y hype from the business value.

Disclaimer: in what follows, we won’t make particularly original claims from a scholarly perspective; even just considering non-academic stuff, some of our considerations are shared here, here, here and here. The aim of this post is not so much to academically discuss some strong argument for/against A.I., but rather to give an “insider” perspective on the current hype and put it into a wider context, especially for educated readers who are new to the field. Without any presumption of completeness or neutrality, you will find a list of references, fairly impressive folks and further comments at the very end.

Doing A.I. before A.I. was cool

“You will see, our art consists of some mathematics and a lot of imagination.” F. Durrenmatt

To remind myself how old I am and recover some perspective on today’s hype, let’s jump back to the Nineties we all know and love. I reproduced below at pedantic length some remarks made by a founding A.I. figure, John McCarthy, in a review of Penrose’s book (you didn’t expect Penrose in a 2018 A.I. blog, did you?):

Progress in AI is made by:

  1. Representing more kinds of general facts about the world by logical formulas or in other suitable ways.
  2. Identifying intellectual mechanisms, e.g. those beyond logical deduction involved in commonsense reasoning.
  3. Representing the approximate concepts used by people in commonsense reasoning.
  4. Devising better algorithms for searching the space of possibilities, e.g. better ways of making computers do logical deduction.

Like other sciences, AI gives rise to mathematical problems and suggests new mathematics. The most substantial and paradigmatic of these so far is the formalization of non-monotonic reasoning.

Wow: we just went through a full paragraph on A.I. without encountering “deep learning”. Penrose aside (sorry, I have a soft spot for logicians messing with the guy), the references in the passage are so removed from the current debate that it is awkwardly fascinating.

If your passion for A.I. started in the Nineties with classical readings (such as Minsky, 1988, Pearl, 2000, Hofstadter, 1995, etc.), McCarthy’s ideas on A.I. progress will resonate with you like a song from a distant, happy past. However, if you just arrived at the data science party, chances are those words will sound empty: while today’s interviews at “A.I. companies” would likely cover the ins and outs of, say, ordinary least squares, most candidates would probably have little to no idea of what non-monotonic reasoning even means. “What the hell happened?” you may ask at this point (“Should we hire them anyway?” you may also ask, but that’s a tough one).

In a slogan, the unreasonable effectiveness of data happened (as a concept, not just the seminal Halevy, Norvig, and Pereira, 2009, which is, unsurprisingly, a very iconic representation of that concept).

To see what I mean, it’s time to take a deeper dive into the glorious days of the first A.I.: hold your breath.

(A.I.) Winter was coming

“You could attach prices to thoughts. Some cost a lot, some a little. And how does one pay for thoughts? The answer, I think, is: with courage.” L. Wittgenstein

As is well known, the discipline was “born” at the Dartmouth workshop: John McCarthy — yes, that McCarthy — coined the phrase “Artificial Intelligence” to mark the computational learning of concepts and tasks that were previously reserved solely for human intelligence. If I had to sum up those glorious old days — sometimes called Symbolic A.I. — in a single sentence, it would roughly look like the following:

the key to general (“commonsense”) intelligent behavior is representing concepts in a way that allows them to be explicitly manipulated and combined: our best guess for this “representation” is formal logic, and our best guess for “manipulation” is logical deduction.

Notwithstanding the optimism of the founding fathers, it turned out that it was “fairly easy” to replicate Bertrand Russell’s theorems from the Principia, but “basically impossible” to have autonomous agents walk through a room like a two-year-old child would. Commercial applications proved unsuccessful at scale and “Artificial Intelligence” became something of a dirty word: winter was indeed coming, and it came for us all, almost “burying” the field — at least in the eyes of mainstream culture and large-scale commercial endeavors.

While I’m not exactly sure about the root cause(s) of the A.I. renaissance, in my mind a big part was played by the massive growth of Google, which proved that a company built on hard-core tech and algorithms (which today we would call “A.I.”) could indeed be the “next big thing”. What was different about how Google approached products that could accomplish “tasks that were previously reserved solely for human intelligence”?

Let’s start with a very familiar example: Gmail’s awesome spam filter, with an example taken directly from my own inbox.

A message flagged as “spam” by my Gmail account.

For humans, there’s a lot of commonsense reasoning that goes into deciding if a message is legit vs spam, e.g.:

  • serious businesses do not write to A.I. startuppers about petroleum partnerships;
  • directors of “petroleum sales department” (whatever that is) won’t have an Outlook domain (who has an Outlook domain anyway?).

In other words, your reasoning would be:

a) explicit — i.e. if asked, you could use concepts like the ones above to state why you think the message is spam (e.g. you have “representations” and those are easily sharable through language);

b) heavily based on a lot of non-trivial knowledge, starting from fully understanding the English language — i.e. the same message in, say, Russian would not have been analyzed in the same way.

If you are thinking that teaching all of this to a computer sounds like a nightmare, well, you just grasped why the good old methods of A.I. were quickly deemed impractical even for simple “cognitive tasks”. So, what was Google’s way out of this?

As it turns out, deep knowledge of English or commonsense beliefs is not necessary at all to reliably classify a message as spam. We can sidestep the problem of language meaning by transforming the challenge into a simple statistical question: how many times do the words “partnership” and “petroleum” appear in spam vs legit messages? If you ask that question for all the words in the email, you get a “global probability” that the email itself is spam vs legit: the more emails containing those words you’ve processed, the more accurate your filtering will be.
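To make the trick concrete, here is a minimal sketch of the word-counting idea: a naive Bayes classifier (the same family of models mentioned in the notes at the end) trained on a toy four-message corpus that we made up purely for illustration.

```python
from collections import Counter
import math

# Toy corpus: (label, text) pairs standing in for a real inbox.
corpus = [
    ("spam",  "exclusive partnership in petroleum sales huge profit"),
    ("spam",  "petroleum partnership offer reply urgently"),
    ("legit", "notes from the meeting on the search partnership"),
    ("legit", "your flight to chicago is confirmed"),
]

# Count how often each word shows up in spam vs legit messages.
word_counts = {"spam": Counter(), "legit": Counter()}
label_counts = Counter()
for label, text in corpus:
    label_counts[label] += 1
    word_counts[label].update(text.split())

def spam_score(message):
    """Log-odds of spam vs legit for a message, naive Bayes style."""
    score = math.log(label_counts["spam"] / label_counts["legit"])
    for word in message.lower().split():
        for label, sign in (("spam", 1), ("legit", -1)):
            total = sum(word_counts[label].values())
            # Laplace smoothing, so unseen words don't zero out the probability.
            p = (word_counts[label][word] + 1) / (total + len(word_counts[label]) + 1)
            score += sign * math.log(p)
    return score  # > 0 means "more likely spam than legit"

print(spam_score("petroleum partnership for your company"))   # positive: spammy
print(spam_score("meeting notes for the flight to chicago"))  # negative: legit
```

No grammar, no commonsense, no understanding of “petroleum partnerships”: just word counts and a bit of probability.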

If you think about it, this is an incredible engineering trick: we started with a problem that requires non-trivial intelligence to solve, we acknowledged that we cannot fully understand/replicate that intelligence, and we realized that the very same goal can be achieved by substituting a trivial algorithm and lots of data for non-trivial intelligence: as it turns out, it’s easier to solve what is essentially a meaning problem (is this person using words to trick me into doing something?) by ignoring meaning altogether and leveraging patterns of co-occurrence.

With the rise of Google and the Big Data age, a perfect storm was set in motion to make “intelligent machines” cool again: increased computational power, an unprecedented amount of available digital information and many clever algorithms (like the spam classifier above) started powering all sorts of commercially successful products. While details differ, most of those successes had one move in common: replacing the long and hard work of representing complex knowledge with an optimization problem of some sort; since we could not scale out intelligence, we scaled data harvesting instead.

In a sense, most of what we’ve learned after the A.I. winter is not really how to build smarter machines: to me, the biggest take-away is that a lot of practical problems which we thought would require intelligence can actually be solved by a stupid algorithm and tons of data points.

The brainchild of “intelligence as curve fitting” is today’s deep learning hype.

Deep learning and shallow ideas

“It is not worth an intelligent man’s time to be in the majority. By definition, there are already enough people to do that.” G. H. Hardy

There’s no doubt that deep learning has achieved ground-breaking, measurable improvements in all sorts of “A.I. tasks” and unlocked incredible potential for the practical use of A.I. technology. If it’s possible at all to talk to our smartphones and get something more than frustration out of it, it’s largely due to deep-learning-related improvements in speech recognition; if it’s reasonable to use Google Translate as a first approximation to understand a song in a foreign language, it’s largely due to deep-learning-related improvements in machine translation — and the list could go on and on.

Downplaying the scientific and engineering achievements of neural networks is downright pointless and utterly silly.

There is a but, though. The very idea of neuron-like structures that learn by “mimicking the human brain” (yes, people actually say that out loud) is indeed pretty old: what changed recently is that hardware and algorithmic improvements made it possible (well, simplifying things quite a bit) to train bigger nets on more data, and drastically improve performance on all sorts of tasks. The general idea of “intelligence as curve fitting” is still the same: given enough data and clever computational techniques, we make, again and again, the trade from “expensive” and “elusive” knowledge to “cheap” and “measurable” optimization.
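To make the slogan tangible, here is a toy example of curve fitting (ours, nothing to do with any real product or network): a flexible family of functions can track noisy data remarkably well without encoding any knowledge about the process that generated it.

```python
import numpy as np

rng = np.random.default_rng(42)

# Pretend this is "the world": a process we never try to model explicitly.
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)  # noisy observations

# "Intelligence as curve fitting": pick a flexible family (a degree-7
# polynomial here) and optimize its parameters on the observed data.
coeffs = np.polyfit(x, y, deg=7)
y_hat = np.polyval(coeffs, x)

print("mean absolute error vs the true curve:", np.mean(np.abs(y_hat - np.sin(x))))
# The error is small, yet the fitted polynomial "knows" nothing about sine
# waves: it simply found parameters that track the points it was given.
```

Swap the polynomial for a deep net and the handful of points for millions of labeled examples, and you get the skeleton of many recent success stories.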

Real 2018 slide from a real A.I. unicorn: gotta love the human brain metaphor.

Funnily enough, the difference between humans and machines has become evident not so much in how many answers they get right/wrong, but more in how they are wrong. In other words, even if humans and machines agree on the caption for this picture:

they would agree based on very different “thought processes”. As proof of that, the following machine-generated caption (“a dinosaur on top of a surfboard”) is not just wrong: it’s actually so far from an “intelligent guess” that it raises suspicion about the entire game — how is it possible for a physical system that understands images to be so wrong?

Well, in fact it’s not possible: there is no understanding when the system “happens to be” right/wrong, as the general lesson is still the same — in order to be, say, 90% accurate on many seemingly intelligent tasks, there is indeed no need to be intelligent at all.

Things usually get worse when the task at hand has a deep (pun not intended) structure that somewhat resists “curve fitting”: I’ve had a life-long interest in languages, so it’s natural to ask how much deep learning has really revealed about human language. Take for example a very cool project, DeepMoji, that uses deep learning to produce a model that “has learned to understand emotions and sarcasm”. Even if the original paper is pretty interesting, it’s also fairly easy to realize that “understanding sarcasm” may prove to be slightly more elusive than advertised. Consider the pair of sentences below:

Testing DeepMoji with negation (original video here).
  • My flight is delayed.. amazing.
  • My flight is not delayed.. amazing.

While in the first case DeepMoji detects sarcasm and suggests appropriate “angry” emojis (awesome!), the second sentence, which differs by only three letters (N-O-T), is completely misunderstood: there’s no sarcasm at all there. As before with images, this shows that we did not really “understand” sarcasm in the first place; we were “just” able to train neural networks to pick up statistical features that, under certain conditions, proved to be precise enough.
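To see why three letters can be nearly invisible to this kind of model, here is a small sketch (our own, not DeepMoji’s actual pipeline) comparing the bag-of-words representations of the two sentences:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "My flight is delayed.. amazing.",
    "My flight is not delayed.. amazing.",
]

# Bag-of-words: each sentence becomes a vector of word counts, with no notion
# of word order and no special treatment of how "not" flips the meaning.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences)

print(vectorizer.get_feature_names_out())
print(X.toarray())
print("cosine similarity:", cosine_similarity(X[0], X[1])[0, 0])
# The two vectors differ by a single count (the token "not"), so any model
# leaning mostly on such surface statistics will have a hard time treating
# the sentences as (almost) opposite in meaning.
```

Sequence-aware models like DeepMoji see more than raw counts, of course, but the training signal is still statistical association, which is exactly what the negation example exposes.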

Let us stress just two final points before wrapping up:

  • explainability: it’s often noted that deep learning models are “black boxes”, as it is hard for humans to understand why they do what they do. While most observers stress the obvious ethical and practical consequences of this feature, our little tour through A.I. history (certainly doesn’t prove, but) suggests that some degree of explainability may be a key part of “intelligence”: having representations implies some sort of “modular” structure, which lends itself more naturally to answering “why questions” than a simple matrix of weights;
  • data: a lot of statistical learning (not just deep learning) requires tons of data to work. While “tons of data” is not an exact quantity, it’s important to notice that, however you define it, it’s likely much more than other, more efficient physical systems would require: which systems? Chances are you have one close to you right now: they are called “children”, and they learn complex concepts sometimes even from a single example. While we can certainly imagine a super-intelligent alien race that takes ages to master new tasks, it’s tempting to measure how far we are from true intelligence by considering how slow and expensive machine learning is.

In the end, as impressive as “curve fitting” has become, a growing number of practitioners believe that A.I. is in dire need of new ideas, not more data/GPUs (Geoffrey Hinton, a true pioneer of deep learning, recently stated that we should “throw it all away and start again”). We will mention some pioneering developments we like at the very end, since we now have to ask: what should we do with all this newly found Artificial Stupidity?

Prediction machines

“Sometimes it seems as though each new step towards AI, rather than producing something which everyone agrees is real intelligence, merely reveals what real intelligence is not.” D. R. Hofstadter

Agrawal, Gans, and Goldfarb, 2018 is a recent book by three economists that analyzes the A.I. renaissance through the lens of economic theory. Their core argument is fairly simple: A.I. means a lower cost of prediction, and since predictions have great and widespread business value, a lot of economic processes will change in the near future as A.I. permeates every aspect of our personal and corporate lives.

At this point, it should be clear that the “A.I.” they have in mind is clearly not what McCarthy and friends set out to build during that summer at Dartmouth; what they have in mind is a plethora of small, task-specific, finely optimized pieces of software that will solve narrow business problems better than existing systems — a spam filter, a carousel of recommended books on an e-commerce site, a notification on when to buy a ticket to Chicago that optimizes for weather, airline price fluctuations, etc.

This, and “just” this, is the current A.I. revolution, which is taking consumer and enterprise markets by storm simply by being better at “stupid” tasks: while these prediction machines won’t achieve any level of intelligence or understanding, combined together they lower the costs and increase the efficiency of many processes.

In this perspective, while tech is not the main actor anymore, it’s obviously the enabler of this new business ecosystem: without the boom of open source engineering tools and libraries, educational resources, computational power and sheer data, none of this swarm of “A.I. for X” companies would have been possible, and a lot of situations would still be fairly un-optimized. As prediction gets abstracted and encapsulated in ever better libraries and services, we can see a near future in which A.I. components will be as common in a code base as database calls are today. While exchanging the big dream of understanding intelligence for aptly optimized curve-fitting tools is certainly disheartening to some, taking A.I. from university labs into the wild business world was a necessary step to renew interest, increase funding and attract talent. From the business point of view, there are multi-billion dollar opportunities out there which require just a bit of science and the right amount of data (McKinsey estimates A.I. may create more than $3.5T in value annually): for “prediction-based A.I.”, spring, certainly not winter, is coming.
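Here is a sketch of what that near future could look like in application code; everything in the second half below (the RecommendationClient class, its endpoint, its recommend method) is made up for illustration and does not refer to any real library or service.

```python
import sqlite3

# Today: pulling data out of a database is a routine, unremarkable line of code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, product_id TEXT)")
conn.execute("INSERT INTO orders VALUES (42, 'book-123')")
recent = conn.execute(
    "SELECT product_id FROM orders WHERE user_id = ?", (42,)
).fetchall()

# Tomorrow (sketch): a prediction component used with the same casualness.
class RecommendationClient:
    """Hypothetical stand-in for whatever library/service wraps the model."""

    def __init__(self, endpoint):
        self.endpoint = endpoint  # e.g. an internal prediction service

    def recommend(self, user_id, seen_products, k=3):
        # A real client would call the service; this stub just echoes inputs.
        return [f"recommended-for-user-{user_id}-{i}" for i in range(k)]

recommender = RecommendationClient(endpoint="https://predictions.example/books")
print(recommender.recommend(42, [row[0] for row in recent]))
```

The point is not the specific interface, but the ergonomics: prediction becomes one more dependency to call, not a research project.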

Obviously enough, all this “dumbed-down” A.I. will be worthless in comparison to what (even a fraction of) “true” A.I. could achieve; unfortunately, the one thing A.I.-as-a-field has done very consistently throughout the years is fail spectacularly to match its own expectations. In this respect, the advent of “prediction machines”, while less exciting and game-changing than the future advent of “thinking machines”, marks an unprecedented milestone in the history of the discipline — i.e. business value above a symbolic “point of no return”.

While hype and optimism will continue to come and go, “curve fitting” finally gave A.I. a seat at the big table: it’s indeed true that sometimes wise beats smart.

A future far far awA.I.

“Hofstadter’s Law: It always takes longer than you expect, even when you take into account Hofstadter’s Law.” D. R. Hofstadter

In the last sixty years we went from “deduction machines” to “prediction machines” — how far until “thinking machines”? Well, this is a very hard prediction to make: nobody knows how far the target is and — even worse — nobody knows what path will lead us there. As hinted above, it seems highly unlikely that “bottom-up approaches” alone or “symbolic reasoning” alone will suffice: how to effectively combine the two in full generality, however, is still a mystery. On the more “symbolic” part of the spectrum, a recent wave of incredible ideas, papers and tools in probabilistic programming promises to marry logic and probability in an unprecedented way: if you are curious about our own take on it, we recently published a long, opinionated and cognition-friendly post with runnable code samples.
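For readers who have never seen the idea, here is a toy sketch of the probabilistic-programming recipe (ours, not taken from the post linked above): write the structured, “logical” part as an ordinary generative program, then get probabilistic answers out of it by inference, here the crudest possible kind, rejection sampling.

```python
import random

def flight_model():
    """A tiny generative 'story' about why a flight might be delayed."""
    storm = random.random() < 0.10        # prior: storms are rare
    crew_issue = random.random() < 0.05   # prior: crew issues rarer still
    delayed = storm or crew_issue or random.random() < 0.02  # other causes
    return {"storm": storm, "crew_issue": crew_issue, "delayed": delayed}

def infer(condition, query, samples=100_000):
    """Estimate P(query | condition) by rejection sampling."""
    kept = [w for w in (flight_model() for _ in range(samples)) if condition(w)]
    return sum(query(w) for w in kept) / len(kept)

# "Given that my flight is delayed, how likely is a storm to blame?"
print(infer(condition=lambda w: w["delayed"], query=lambda w: w["storm"]))
```

Real probabilistic programming languages replace the rejection loop with far smarter inference engines, but the division of labor (explicit generative structure plus generic inference) is the part that echoes the old symbolic ambitions.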

If A.I. history — which we so partially and subjectively reviewed — teaches us something, I feel confident in putting forward three suggestions:

  • For business people in A.I.: don’t panic. Sure, the A.I. market may slow down soon, deep learning may start plateauing and all this optimism in the press may freeze up a bit. While it’s true that a lot of “.ai startups” just “went with the flow”, many prediction machines built in recent years are here to stay, and many more are yet to be invented. Sure, we may want to revisit the whole “Artificial Intelligence” label at some point, but that does not mean we should throw the baby out with the bathwater.
  • For researchers of A.I.: go back to basics. A.I. was built on an interdisciplinary effort to understand cognition; funnily enough, most of the original ideas of deep learning come from psychology journals (e.g. Hinton 1985, Rosenblatt 1958). While it’s hard to pinpoint a roadmap now and almost impossible to see how the pieces — computer science, psychology, linguistics, neuroscience, logic — fit together, tackling a very hard problem from different angles can’t hurt our chances.
  • For investors in A.I.: lead innovation, don’t just follow it. Money has been flowing consistently into the industry, and particularly into deep learning startups. However, all this money for a narrow view of A.I. is likely to get us stuck in a “local minimum”, while other promising approaches, further removed from the hype, get less attention. And this is not just bad for science: if you’re scouting for the nice outliers in those power laws of returns, as the market for “prediction machines” gets crowded, your best bet for the next big thing may well come from a new approach entirely.

And what about A.I. startup founders (if anyone cares about them at all)? Well, while planning ahead for true machine intelligence, we should probably just continue to pursue more humble and mundane milestones in our everyday quest. Our inspiration and guidance will be no less than Alan Turing himself:

“We can only see a short distance ahead, but we can see plenty there that needs to be done.”

Let’s never forget it’s up to us to get it done.

See you, space cowboys

If you have questions, feedback or comments, please share your A.I. perspective with jacopo.tagliabue@tooso.ai.

Don’t forget to get the latest from Tooso on LinkedIn, Twitter and Instagram.

Acknowledgments

Thanks to the entire Tooso team, Stefano Pacifico and Davide Romano for comments on previous drafts of this post.

Some of these points have been discussed at greater length in our previous Medium story on learning concepts and during our November A.I. talk: we would like to thank organizers and participants of the Future of AI meetup for their enthusiastic contributions and helpful feedback on our perspective.

Miscellaneous notes and more readings

  • Obviously any industrial-scale spam filter would employ much more sophisticated ideas than the naive Bayes classification explained above, but to this day it’s still a very simple and effective baseline algorithm with great pedagogical value. For data science newbies and non-lazy readers, there’s a chapter here that is really good.
  • The Microsoft deep learning Twitter bot can be found here: we learned about it while listening to a fantastic talk by Josh Tenenbaum, which touches on many interesting points re: learning, probabilistic programming and model-based reasoning.
  • We talked at length about Josh Tenenbaum’s recent work in “Bayesian learning” in our recent post on concepts. If you want to start somewhere, Probabilistic Models of Cognition is a truly amazing book.
  • We talked at length about Fluid Concepts specifically and Douglas Hofstadter in general in our recent post on concepts. Some short articles aside, it feels like Hofstadter has mostly disappeared from the scene: we miss you, Douglas!
  • We glossed over tons of academic stuff that is relevant for the topics at hand: without any presumption of completeness, you can get a bunch of classics (e.g. Fodor, Pylyshyn, 1988, Pinker, Prince, 1988) and more recent stuff in the references below (e.g. Darwiche, 2017, Marcus, 2018, Pearl, 2018). Lake, Baroni, 2017 is a recent work in NLP we liked a lot; for some ideas on how to marry inductive biases with deep learning, there’s a nice paper by people at Google Brain, DeepMind, etc. (see ref below). While some folks have recently been using Twitter to debate these and similar topics, a lot of these arguments really go back decades (as we pointed out, some fundamental things have not yet changed).
  • Turing Award winner Judea Pearl has for years been a vocal advocate of “causal models”, i.e. the idea that true intelligence cannot be achieved without the ability to reason about causal processes and make counterfactual judgments (“what would happen if…”). Similar ideas have been proposed by cognitive scientists (see Lake, Ullman, Tenenbaum, and Gershman, 2016 for an overview), particularly stressing the “compositionality” aspect of human learning, i.e. the ability to build complex concepts by re-using simpler ones (Lake, Salakhutdinov, and Tenenbaum, 2015). Model-based learning is the main topic of our post on probabilistic programming, which contains commented code samples and additional references for deeper exploration.
  • The expert reader may have noticed that we used “representations” somewhat freely (sloppily?) in our discussion; in particular, we highlighted “explicit representations” as a key component of human-like intelligence, but we did not comment at length on representations that have been crucial in deep learning empirical successes. The difference between the two types is more evident when language is involved (because 1) humans use language between themselves to communicate and 2) non-trivial compositionality is the essence of language), but it is certainly a genuine difference across all A.I. tasks. For the conceptually inclined reader, Fodor, Pylyshyn, 1988 is a landmark discussion of representational mental states in symbolic vs. neural architectures.
