What can Philosophy teach Machine Learning?

A Journey from Socrates to AI via Cognitive Science

Federico Castellano
Towards Data Science


'Scuola di Atene' by Raffaello. Source: Pixabay

From Socrates to Cognitive Science

When Socrates asked Thrasymachus for a definition of the concept of justice, philosophy posed for the very first time one of its most challenging questions: what is a concept? For many hundreds of years, inquiries concerning the nature and structure of concepts caught the attention of the world’s finest minds; yet it wasn’t until the seventeenth and eighteenth centuries that such inquiries fully flourished at the hands of two rival philosophical traditions: Empiricism and Rationalism.

Empiricists argued that concepts are a sort of picture or image in the mind. According to this view, the concept of dog amounts to a mental picture or image of a prototypical dog. The concept of justice, in turn, amounts to a combination of the mental pictures or images that we typically associate with the things and events we take to be fair.

On the opposite side, rationalists argued that concepts should not be understood as isolated mental pictures. On the contrary, they claimed that concepts are more like interconnected nodes in a massive inferential network.

Alonso de Proaza’s illustration of the Porphyrian tree (sixth-century tree representing Aristotle’s categories) in his work "De logica nova" (1512). It illustrates one of the first attempts to build an inferential network.

The rivalry between empiricists and rationalists derives from a prior and more fundamental disagreement about the very nature of thought and knowledge. For empiricists, having a concept amounts to having the capacity to perceptually recognize and classify objects by virtue of mentally computing the perceptual features that such objects have. For example, having the concept of dog (and, hence, having thoughts and knowledge about dogs) amounts to having the capacity to discriminate dogs from things that are not dogs based on the perceptual features that such objects typically have; hence the tradition’s name (‘empeiría’ means experience in Ancient Greek).

For rationalists, in contrast, having a concept involves a more demanding cognitive capacity, i.e., the capacity to rationally draw all the conclusions that inferentially follow from it. So, for example, having the concept of dog –and, hence, having thoughts and knowledge about them– involves being able to infer that dogs are mammals, that mammals are animals and so are dogs, that animals are different from plants and so are dogs, that plants and animals are living beings and so are dogs, and so on.

A graph representing a small inferential network

Currently, most discussions about concepts are framed within the cognitive science approach. According to this approach, minds are analogous to computers. So, thinking is understood in terms of computations over representational structures in the mind (see Thagard, 2018).

The cognitive science approach inherited many concepts from early modern philosophy. In fact, Empiricism and Rationalism laid the foundations of the cognitive revolution. So, it is not surprising that, for years, many cognitive scientists were involved in a long debate between two rival theoretical frameworks: on the one hand, the so-called concept-empiricism, concept-atomism, or just the representational view of concepts; on the other hand, the so-called conceptual-role semantics, inferentialism, or just the pragmatist view of concepts (see Margolis & Laurence, 1999). The former argues that concepts are either sets of perceptually based semantic features (mental pictures) or some sort of linguistic-like mental words. In both cases, concepts are conceived as isolated mental representations. The latter, in contrast, argues that concepts are not mental representations but sets of inferential capacities. According to this view, the meaning of a concept lies in its inferential relationships to many other concepts.

It goes without saying that the disagreement between the two parties reproduces the older disagreement between Empiricism and Rationalism. For what is really at stake here is a disagreement about the very nature of cognition, i.e., a disagreement between those who think that the entire cognitive architecture is ultimately dependent on computations over isolated sets of features, and those who think that thinking is fundamentally a matter of computing over a massive network of inferentially interconnected nodes.

Robert Fludd’s microcosm diagram of the mind, in his work "Utriusque cosmi maioris scilicet et minoris metaphysica, physica atqve technica historia" (1619).

From Cognitive Science to Machine Learning

You are probably wondering what all this has to do with machine learning and AI. Well, it actually has a lot to do with it. Artificial neural networks are connectionist systems. Connectionism is a framework within cognitive science that aims at modeling mental phenomena entirely in terms of patterns of neural activation. Despite breaking away from early computational models of the mind, for which thinking is just computing over symbolic mental structures, the connectionist model borrowed many ideas from the representational theory of mind, including the empiricist (atomist, or just representational) view of concepts (see Fodor & Pylyshyn, 1988).

So, for connectionists, concepts are feature vector representations. A feature vector representation is a vector that represents a particular object or class in a feature space. For example, the concept of dog is nothing but the vector of feature activities that represents the class ‘dog’.

Illustration of a feature vector representation of the concept of dog
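To make this concrete, here is a minimal Python sketch of the feature-vector picture. The features, values, and prototype vectors below are made up for illustration; the point is only that, on this view, having the concept of dog amounts to having a point in a feature space and a way of measuring how close new inputs come to it.

```python
# A minimal sketch of concepts as feature vectors (all features and values
# are illustrative assumptions, not a real trained model).
import numpy as np

# Hand-picked, illustrative features: [has_fur, barks, has_wings, num_legs/4]
concepts = {
    "dog":    np.array([1.0, 1.0, 0.0, 1.0]),
    "bird":   np.array([0.0, 0.0, 1.0, 0.5]),
    "lizard": np.array([0.0, 0.0, 0.0, 1.0]),
}

def classify(x: np.ndarray) -> str:
    """Return the concept whose prototype vector is most similar to x."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(concepts, key=lambda name: cosine(x, concepts[name]))

# A furry, barking, four-legged input is recognized as a dog.
print(classify(np.array([0.9, 0.8, 0.0, 1.0])))  # -> 'dog'
```

In a real neural network the features would be learned activations rather than hand-picked properties, but the underlying idea is the same: recognition is similarity matching in a feature space.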

Although the connectionist approach to concepts has proven to be very powerful, it also has its limitations. Current machine learning algorithms are really good at performing many cognitive tasks that we typically associate with concepts, such as recognizing stuff, finding correlations, classifying objects, memorizing patterns, encoding and retrieving information, etc. However, we usually feel that, to a large extent, these algorithms fall short of modeling real human cognition. As D’Mello et al. (2006) suggest:

Machine learning often requires large, accurate training sets, shows little awareness of what’s known or not known, integrates new knowledge poorly into old, learns only one task at a time, allows little transfer of learned knowledge to new tasks… In contrast, human learning has solved many of these problems, and is typically continual, quick, efficient, accurate, robust, flexible, and effortless.

I believe that many of the limitations that machine learning algorithms face right now are caused, in part, by the absence of an integrated conception of conceptual cognition. Driven by the empiricist spirit lying at the bottom of the representational theory of mind (which is the default position in cognitive science), connectionist (or neural network) models have devoted too much attention to feature vector activities, leaving the inferential relationships between concepts completely out of the discussion.

Although there have been some serious attempts to account for conceptual knowledge in terms of relational graph representations, very little has been done to implement such structures in neural networks. Fortunately, in the last few years, several studies linking graph theory with neural networks have come out with very interesting results. These studies come in different flavors. Currently, two of the most important projects on the matter are graph networks (see Battaglia et al., 2018) and Graph Convolutional Networks (GCNs) (see Kipf & Welling, 2017). Both lines of research are promising, yet there is still a long way to go.

Illustration of a multi-layer Graph Convolutional Network (GCN). Source: Thomas Kipf’s "Graph Convolutional Networks", URL = <https://tkipf.github.io/graph-convolutional-networks/>
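For readers who want to see what a GCN layer actually computes, here is a minimal NumPy sketch of the propagation rule described by Kipf & Welling (2017). The toy graph, node features, and weights are invented for illustration; a real GCN would learn the weight matrix from data by gradient descent.

```python
# One graph-convolutional layer in the spirit of Kipf & Welling (2017):
# H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). Graph, H, and W are toy values.
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Single propagation step over adjacency A, node features H, weights W."""
    A_hat = A + np.eye(A.shape[0])                      # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy graph: dog -- mammal -- animal (symmetric adjacency matrix).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.randn(3, 4)    # one 4-dimensional feature vector per node
W = np.random.randn(4, 2)    # weights (random here; learned in practice)

print(gcn_layer(A, H, W))    # each node's new features mix in its neighbors'
```

The interesting property is that each node's representation is updated using the representations of its neighbors, so feature vectors and graph structure are processed together rather than in isolation.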

From Machine Learning Back to Philosophy

When faced with two or more rival theories, we usually feel the need to choose one of them. Although many times this is the right thing to do, it is not so when it comes to theorizing about cognition. The alleged rivalry between those who think of concepts as feature vector representations and those who think that concepts are nodes in a relational graph is misleading. In my view, the two theories are not rivals; in fact, they need to work together to reach a richer and more realistic model of human cognition.

In 1781, Immanuel Kant published one of the most remarkable philosophical and scientific books ever written: the "Kritik der reinen Vernunft" (KrV). Among many other things, Kant realized that empiricists and rationalists were both right and wrong at the same time. For, according to Kant, conceptual knowledge is the result of both experiences (or intuitions in Kant’s vocabulary) and rules of inference (or concepts in Kant’s words) working hand in hand. Neither intuitions nor concepts on their own can lead anyone to learn anything about the world. Quoting a very famous line from Kant’s Kritik:

Concepts without intuitions are empty. Intuitions without concepts are blind.

I really do think that Kant’s theory of cognition may shed some fresh light upon current debates in the fields of cognitive science and artificial intelligence. In particular, I believe that the quote cited above may apply perfectly well to the debate between the feature-vector and the inferentialist approaches to concepts. For, inferential networks without feature vector representations are empty, and feature vector representations without inferential networks are blind. Let me expand on this idea a bit further.

Source: <https://medium.com/@rgrydns/kant-how-is-a-synthetic-a-priori-judgment-possible-45af58688600>. Original from "Philosophy for Beginners", by Richard Osborne, illustrated by Ralph Edney (New York: Writers and Readers Publishing, 1992), p. 104.

As stated above, the vast majority of current machine learning algorithms rely solely upon vectors of feature activities. These algorithms have been largely used for recognizing, classifying, and memorizing patterns from what is given as input. However, to do so they need to be trained on large amounts of accurate data and, once they have learned from a training set, they show little capacity to discover and integrate new knowledge from what they have previously learned. It looks as if machines were cognitively blind. They cannot help but reinvent the wheel every time they learn something new. These are serious problems.

Now, suppose that all a machine required for performing complex cognitive tasks were to compute over large relational graphs containing thousands and thousands of inferentially interconnected nodes. It is easy to imagine how such a machine could discover and integrate new knowledge from what it has previously learned: it would just need to compute the appropriate inferential connections that hold between a given concept, say ‘dog’, and many other concepts such as ‘mammal’, ‘animal’, etc. Yet the machine would still have learned nothing about dogs. For, without feature-vector algorithms working in the background, it would be unable to recognize, classify, or memorize anything when presented with real dogs (pictures, words, or whatever) as inputs. It is true that, eventually, the machine would learn that dogs are mammals, that mammals are animals, that animals are living beings, etc.; but at the same time, it would have truly learned nothing about any of those things. Its concepts would be just empty.
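A toy sketch may help make the point. Suppose the machine's entire cognitive repertoire were the hypothetical 'is-a' graph below (the edges are made up for illustration). It can compute everything that follows from 'dog', yet, lacking any entry point from perceptual input, it can never recognize a dog when it actually encounters one.

```python
# A purely "inferential" machine: it can traverse a relational graph, but it
# has no way of connecting perceptual inputs to the nodes of that graph.
from collections import deque

# Illustrative 'is-a' edges; a real knowledge graph would be far larger.
IS_A = {
    "dog": ["mammal"],
    "mammal": ["animal"],
    "animal": ["living being"],
    "plant": ["living being"],
}

def entailments(concept: str) -> list[str]:
    """Breadth-first traversal of the is-a graph: everything that follows."""
    seen, queue, out = {concept}, deque([concept]), []
    while queue:
        for parent in IS_A.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                out.append(parent)
                queue.append(parent)
    return out

print(entailments("dog"))   # -> ['mammal', 'animal', 'living being']
# Given an actual picture of a dog, however, this system has nothing to say:
# there is no entry point from perceptual input into the graph.
```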

Wilfrid Sellars (1974), an eminent American philosopher and prominent defender of a neo-Kantian approach to cognition, distinguished between three different kinds of conceptual responses:

  1. Concept-entry responses: perceptual inputs trigger appropriate discriminatory/classificatory/recognitive conceptual responses.
  2. Intra-conceptual responses: entry conceptual responses trigger patterns of valid inference with respect to other concepts.
  3. Concept-exit responses: intra-conceptual responses trigger novel discriminatory/classificatory/recognitive conceptual responses.

According to this picture, at the entry level inputs are processed, recognized, and classified under concepts. Those conceptual responses trigger, in turn, inferential responses with respect to other concepts, many of which may have not been processed at the entry level. Finally, such inferential transitions may trigger novel recognitive/classificatory responses towards those concepts which have not been processed at the entry level, letting systems learn new things without being fully trained at the entry level.
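Putting the two earlier ingredients together yields a rough, purely illustrative model of this entry-intra-exit picture: a feature-vector classifier handles concept-entry responses, a relational graph handles intra-conceptual responses, and their combination produces concept-exit responses the system was never directly trained to give. All names and values below are assumptions made for the example.

```python
# A toy sketch of Sellars' entry-intra-exit picture (illustrative values only).
import numpy as np

PROTOTYPES = {                       # entry level: feature-vector prototypes
    "dog":  np.array([1.0, 1.0, 0.0, 1.0]),
    "bird": np.array([0.0, 0.0, 1.0, 0.5]),
}
IS_A = {"dog": ["mammal"], "mammal": ["animal"], "bird": ["animal"]}

def concept_entry(x: np.ndarray) -> str:
    """Perceptual input -> classificatory response (most similar prototype)."""
    return max(PROTOTYPES, key=lambda c: float(x @ PROTOTYPES[c]))

def intra_conceptual(concept: str) -> list[str]:
    """Entry response -> inferential responses over the relational graph."""
    out, frontier = [], [concept]
    while frontier:
        nxt = [p for c in frontier for p in IS_A.get(c, []) if p not in out]
        out.extend(nxt)
        frontier = nxt
    return out

def concept_exit(x: np.ndarray) -> list[str]:
    """Novel responses, including labels never trained at the entry level."""
    entry = concept_entry(x)
    return [entry] + intra_conceptual(entry)

# The system was only ever trained to recognize dogs and birds at entry,
# yet it also responds with 'mammal' and 'animal'.
print(concept_exit(np.array([0.9, 0.8, 0.0, 1.0])))  # -> ['dog', 'mammal', 'animal']
```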

Needless to say, this entry-intra-exit picture is an oversimplification of human cognition. Some concept-entry responses may lead to concept-exit responses directly; concept-exit responses may work as inputs for new concept-entry responses; inconsistencies between concept-entry and concept-exit responses may lead the cognitive system to change or adjust patterns of inference between nodes; and so on. What I want to call attention to here is the fact that conceptual cognition is a complex phenomenon which results from a very subtle interaction between different kinds of responses.

Conclusion

So, what can philosophy teach machine learning? Among other things, it can teach it that no real deep learning can be achieved without integrating into a unified picture (a) feature vector representations and (b) inferential networks. Artificial intelligence and machine learning won’t make much progress on modeling human cognition until this is fully acknowledged.

References


Postdoctoral Researcher @ CONICET | Passionate about everything related to Philosophy of Mind, Cognitive Science, and Computer Science | Data Science enthusiast