Philosophy for Data Scientists

Intro to Post-Structuralist French Philosophy for Data Scientists (Part II)

Derrida on data, context, representation, and the nature of Being

Travis Greene
Towards Data Science
20 min read · Dec 9, 2020


Yes, another cute bulldog in front of a wall spray-painted with the colors of the French flag. Photo by Karsten Winegeart on Unsplash

In this post, we’ll continue where we left off in Part I. We’ll now focus on Derrida’s critique of structure (he’s a “post-structuralist”) and his account of context and signification. I’ll also give a brief refresher on some key historical figures (Nietzsche, Freud, and Marx) so that Derrida’s ideas can be placed in a more intelligible context. I’ll intersperse some interludes relating Derrida’s ideas to data and data science throughout.

This post will proceed in a Derridean fashion. Why? For at least a couple of reasons. First, it’s likely the ideas will need time to sink in before they make any sense. The pull of binary logic is strong in Western thought. Second, perhaps by restating his ideas in multiple ways, at least one of them will resonate with you. As Derrida himself would claim, there is no “final” authoritative understanding of Derrida. His thoughts, in different temporal and geographical contexts, will always be open to a multiplicity of meaning.

Why Post-Structuralist? What’s Structure?

What is structuralism? Philosopher Simon Blackburn paints an anthropological view of structuralism as:

“…the belief that phenomena of human life are not intelligible except through their interrelations. These relations constitute a structure, and behind local variations in the surface phenomena there are constant laws of abstract structure.”

Influenced by the structural linguistics of Ferdinand de Saussure, French anthropologist Claude Lévi-Strauss explored the universals of human culture, such as kinship structures (matrilineal or patrilineal? communal or individual?) and their corresponding effects on taboos of incest. A similar interest in structure inspired work on abstract mathematical structures by the Bourbaki group. The notion of structure has had a lasting influence on sociology, too, tracing back to the work of Émile Durkheim and Karl Marx. More recently, Anthony Giddens developed “structuration theory,” analyzing social systems as self-regulating, invisible structures constantly shaping and constraining the social behavior within them.

Generalizing, we can say the following. Structure is what allows patterning, and thus regularity, to be observed against a fixed background, providing a center from which all elements are oriented. At this level, structures permit and define the identities of elements which, taken together, constitute the structure. Indeed, the identity of the elements cannot be determined except by reference to the element’s relations to other elements. Let’s look at these ideas more closely.

Structures: Self-regulating Systems Closed Under Transformation

Jean Piaget gives a nice treatment of structuralism that may resonate with data scientists. Piaget says structures are different from mere aggregates and have three necessary properties:

  1. Wholeness: Although composed of individual elements subject to unique laws, the elements together define the structure as structure. The system does not rely on elements external to it.
  2. Transformation: Elements within the structure can be transformed according to sets of laws which define the nature of the transformations.
  3. Self-regulation: Transformation of elements never yields results external to the system. In other words, all transformations of elements lead only to elements internal to the system (in mathspeak: the structure is closed under transformation). Structures self-preserve themselves, though they may evolve when elements are transformed within them.
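Piaget’s closure property has a simple mathematical illustration (my sketch, not Piaget’s): the integers mod 5 under addition. Every transformation of elements yields another element of the system; nothing escapes the structure.

```python
# A toy structure in Piaget's sense: the integers mod 5 under addition.
# Transforming any two elements always lands back inside the system,
# i.e., the structure is closed under its transformation.

elements = set(range(5))
transformed = {(a + b) % 5 for a in elements for b in elements}
print(transformed == elements)  # True: closed under transformation
```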

So with that background, we can understand what exactly Derrida was reacting against and why people call him a “post-structuralist.” He is rejecting the idea that there is only one structure (Western logic/metaphysics) or only one way of ascribing identity to an event or thing. Remember that Wittgenstein claimed in his Tractatus that logic was the (only possible) description of that very structure. If we go outside the structure, we must remain silent.

Derrida, however, argues that we unnecessarily limit ourselves if we think this way because we, as humans, possess the power to define new structures (and corresponding identities) at will. Let’s now look at some key influencers of Derrida before we jump into his actual writings.

Photo by Evan Dennis on Unsplash

Forebears of Derrida: The Three Masters of Suspicion

When first approaching Derrida it helps to have a basic familiarity with the ideas of those who have influenced him. Nietzsche, Freud, and Marx have been referred to as the “masters of suspicion” for the way in which they have questioned the ideological bases and methods of Western knowledge production. Derrida will combine these ideas in a rather idiosyncratic way to critique the foundations — the axioms, so to speak — of Western logic. One general concern is whether logic is a description or an idealization of thought. If it is a description, then whose description does it correspond to? If it’s an idealization, then whose ideal is it? For more background see Part I here (Hegel & Foucault).

Nietzsche

All three thinkers shared a similar concern with a kind of “false consciousness.” Let me explain. For Nietzsche, our inherited (Judeo-Christian) moral values turn out to be contingent and not necessary truths, based on historical contingencies, such as who defeated whom in battle. Truth and power were two sides of the same coin. Everything we believed to be true was, in reality, an expression of the will to power (in Foucault’s words, “the will to truth”). The powerful use the linguistic guise (or “discourse”) of truth to hide and close off questioning about the origins of their power. The exquisite skeptic, Nietzsche claimed there were no facts, only interpretations. If God was dead, and God’s intentions contained the truth behind the meaning of things and their names, then the truth could now never be known. Truth is a mere rhetorical device clothed in metaphysical garb, given to us in order to preclude an infinite regress of skepticism about its foundations.

Freud

Freud’s contribution is slightly different. Based on a kind of “hydraulic model” of the mind, his work revealed the ego’s battle against the id in his book Civilization and its Discontents. Freud explained that what we thought were our “true” desires and needs were actually symbols of a much deeper, repressed need for something else (typically unresolved sexual urges). There was a vast unconscious network of impulses scurrying around in our minds of which we were, of course, unaware. The neurotic person was one whose ability to repress these animalistic urges was less than perfect. Freud’s ideas of powerful unconscious drives went against the influential ideas of Plato and Aristotle, where man was essentially rational and his virtuous soul was kept in neat and proper order by reason. We are not pre-formed Cartesian subjects capable of achieving perfect self-knowledge, but incomplete fractured entities. Our incompleteness as subjects puts limits on our objects of knowledge.

Marx

Marx implores us to take up the skepticism of Nietzsche and see how those in power have used an ideology of truth to repress and alienate those without power from their true potential as humans. Marx’s focus was on liberation: on freeing persons from their ideological shackles — the institutions, norms, and practices of a society — that blinded them to their true nature. The idea of false consciousness in Marx relates to our mistaken pursuit of our interests, when in fact these “interests” reflect the goals and needs of the powerful ruling classes (note the connection with recommender systems: they claim to recommend what we are “interested in” but actually reveal what the company is interested in selling). We have confused appearance with reality. If we could transcend this state of false consciousness, as the movement of the Hegelian dialectic supposes, we could finally realize our true interests. The ruling classes are those with the means of material production (e.g., economic capital) and this is what grounds ideology through a process of reification, or a “naturalizing” of what is accidental and contingent and treating it as necessary and fixed.

Photo by Dan-Cristian Pădureț on Unsplash

Exposing the Internal Contradictions of Western Metaphysics

The literary technique of Deconstruction is famously associated with Derrida. Roughly put, the idea is that structures contain the seeds of their own destruction. Western thought, obsessed with reason since Plato, has unknowingly relied on a system of rhetorical metaphors masquerading as reality. It has imposed a way of thinking, a structure to thought (logic), and defined it in such a way that anything not following its rules is therefore illogical. Deconstruction broadly seeks to reveal the dominant rhetorical devices implicit in Western philosophical thinking, and encourage us to instead view what was previously fixed and solid as flexible, capable of infinite play through the creative interpretation of signs.

Here’s a metaphor of my own to help make sense of things. Imagine pulling a single thread from an old sweatshirt and the sweatshirt instantly unravels, revealing it to be nothing more than some loosely held-together threads. Pulling the thread is analogous to exposing the internal contradictions of the structure. In this case, structures are systems (“totalities”) of Western thinking which put (Western) logic and (Western) reason at the top of the metaphysical hierarchy. The contradiction is that what they assume to be necessary is in fact contingent. The tunnel vision of Western thinking, and its assumption of the possibility of absolute knowledge, prevents it from ever achieving absolute knowledge!

By systematically excluding other kinds of knowledge from its canon, it is necessarily forced to fail in its goal of attaining complete knowledge.

A Binary Misinterpretation of Derrida: Us vs. Them

Derrida’s work itself is a performance piece against the totalization of Western metaphysics in modern life. Yet today some people use his ideas to promote a totalization of academic thought in the name of social justice. It’s true, though, that his ideas were deeply shaped by his outsider experience as an Algerian-born Sephardic Jew growing up in post-World War II France. You have to remember he came of age during the 1968 Student Movement.

In the modern political sphere, there seems to be an intuitive appeal to grand narratives (filtering historical events to fit a simplified causal arc) and binary thinking. Derrida would have been repulsed by the binary, us-vs.-them logic used to stifle opposing, marginalized views. Derrida might argue that America is experiencing something similar to a Nietzschean transvaluation of values, expressed as a will to power under the banner of social justice. In response to calls for social justice, Derrida would have said something like “Whose Justice? Which Rationality?” (see Alasdair MacIntyre’s 1988 book of the same title). We should be wary of anyone calling to stifle the voices and perspectives of others.

Binary logic is a logic of power: it makes it exceedingly easy to exclude, to round off. But reality is inherently messy and complex. Binary logic is at its most repressive when it forces something that sits perfectly between two poles to move completely to one or the other. Derrida wants to focus our attention on the liminal space between quantized values: on the analog continuity of being rather than digitally discrete Being. We gain from recognizing, not ignoring, the complexity of Being.

Remember Hegel and his dialectic of thesis, antithesis, and synthesis? Well, we might interpret Derrida as doing something similar: by accepting that things may be both A and not-A (perhaps at a later time), we open ourselves to the understanding of an even greater unity, where we see each prior instance as a limiting case of something grander and more abstract.

Derrida was also deeply impressed by cybernetics and information theory. We must at some point round off measurements of real numbers in order to do anything useful with them. We must compress them. Imagine if you had to send someone all the digits of pi! But when this rounding process is repeated over and over, it eventually leads to unpredictability. It is a fact of our existence as finite beings (i.e., we don’t have infinite storage capacity) that “chaos” will eventually present itself as the unpredictability of systems. In short,

Binary logic is great at compression, but bad at truth. Truth is nice, but could take forever to transmit.
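To see how repeated rounding breeds unpredictability, here’s a small sketch (the logistic map and the 4-decimal rounding are my illustrative choices, not Derrida’s example): two copies of the same chaotic system, one iterated at full float precision and one “compressed” by rounding at every step, quickly stop agreeing.

```python
# Iterate the chaotic logistic map twice: once at full float precision,
# once rounding ("compressing") the state to 4 decimal places each step.
# The tiny per-step rounding error compounds until the two trajectories
# no longer resemble each other.

def logistic(x, r=3.9):
    return r * x * (1 - x)

x_full = x_rounded = 0.2
max_gap = 0.0
for _ in range(50):
    x_full = logistic(x_full)
    x_rounded = round(logistic(x_rounded), 4)  # compress the state
    max_gap = max(max_gap, abs(x_full - x_rounded))

print(max_gap)  # the rounded trajectory has drifted far from the exact one
```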

Similar to how Gödel and Turing showed the limits of logic using the tools of logic itself, Derrida sets out to deconstruct the totality of Western metaphysics — using the very concepts of Western metaphysics. If Derrida can reveal the illogic of logic, if he can introduce doubt as to its foundations, then he provides a motive for change.

The Dilemma of Continuous vs. Discrete, Analog vs. Digital

Derrida is really getting at fundamental metaphysical questions about the nature of reality. Derrida and Gilles Deleuze were fascinated by a tension central to information theory: how do we balance our needs for compression with truth?

If we compress too much, we lose essential structure and become susceptible to the effects of noise. We might think redundancy can shield us against the harmful effects of noise, but this shield costs something — namely, transmission time and storage space.

To virtually guarantee you receive my message correctly, I could send you 1000 extra copies. But then you’d have to be able to store all of them and they’d take a while to send. By the time you receive them, they may no longer be useful! (It’s kind of like when you discover you never responded to a friend’s Facebook message from six months earlier and you realize it would be kind of useless to respond to “Hey, you want to hang out this weekend?” half a year later, so you don’t respond).

Likewise, I could send you a single, perfect copy of my message, but if I’m unlucky and something happens to my message, you’re not going to get the exact message I sent. It’s brittle without the shield of redundancy.
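The trade-off can be sketched with a toy repetition code (an illustration of the idea, not any real protocol): each bit of a message is sent n times through a channel that flips bits with probability p, and the receiver takes a majority vote.

```python
import random

# Redundancy buys reliability, paid for in transmission length: more
# copies per bit mean fewer decoding errors, but n times the bits sent.

def transmit(message, n, p, rng):
    received = []
    for bit in message:
        copies = [bit ^ (rng.random() < p) for _ in range(n)]  # noisy channel
        received.append(int(sum(copies) > n / 2))  # majority vote
    return received

rng = random.Random(0)
message = [rng.randint(0, 1) for _ in range(1000)]
error_counts = {}
for n in (1, 3, 11):
    got = transmit(message, n, p=0.2, rng=rng)
    error_counts[n] = sum(a != b for a, b in zip(message, got))
    print(f"{n:2d} copies/bit: {error_counts[n]:3d} errors, {n * len(message)} bits sent")
```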

Logocentrism, Science, and The Metaphysics of Presence

Much like the Frankfurt school Critical Theorists who decried the effects of following instrumental reason to its logical conclusion — leading to the horrors of Auschwitz — Derrida’s task is to protest the dominance of Western metaphysical thinking, passed down since Plato’s theory of the forms. He calls this Western bias towards the use of reason logocentrism.

Logocentrism is the striving inherent to Western metaphysics towards totality, the idea that subject can reach perfect self-identity with object. If we achieve a complete description of the world, then the identity of knower and known dissolves and there is perfect transparency between subject and object. Objects are unmediated by any kind of re-presentation (they are fully present to the subject!). Knower and known are indistinguishable. Logocentrism is this quest for perfect self-transparency between knower and known, subject and object. It is the immediate and unmediated presence to consciousness of eternal and immutable essences — these are called Eidos in Greek.

If mind acts as a mirror of nature, then logocentrism is the striving towards making this mirror as transparent as possible via the logical reduction of a complex reality. If logocentrism were successful, then our internal representations of an external reality would be identical with reality and we would know the mind of God.

Even today we find traces of logocentrism in our most modern scientific theories of the universe. Here is Stephen Hawking in A Brief History of Time:

If we do discover a complete theory, it should in time be understandable in broad principle to everyone, not just a few scientists. Then we shall all, philosophers, scientists, and just ordinary people, be able to take part in the discussion of why it is that we and the universe exist. If we find the answer to that, it would be the ultimate triumph of human reason — for then we would truly know the mind of God.

Appearance and Reality

We have now come back full-circle to the age-old philosophical problem of distinguishing between mere appearance and reality. We now see why Freud, Marx, and Nietzsche are referred to as the Masters of Suspicion and why they had such an impact on Derrida. Each of these thinkers made the startling claim that what we naively assumed to be reality was merely appearance.

Two related questions now arise:

  1. When does a thing under a different representation stop being that thing?
  2. What distinguishes a thing and a representation of that same thing and two separate things?

The Nature of Uncertainty in Probability and Statistics

Derrida’s ideas have even deeper implications for the concept of uncertainty. Now, the subject of uncertainty has a complex past and there are differing views about what it is. Neural network pioneer Bart Kosko, for instance, says we need fuzzy logic on the basis of a distinction he makes between epistemological and metaphysical uncertainty. The crux of the debate is whether uncertainty is located in objects themselves (fuzzy logic), or in observers of objects (subjective Bayes).

Metaphysical uncertainty: The identity of something is indeterminate, but not because of our lack of knowledge about it. The object itself is intrinsically fuzzy. No amount of information we receive can change this! Roughly put, fuzzy logic is a mathematical framework in which elements of fuzzy sets have membership functions expressing their degree of membership in various sets (and these degrees don’t need to sum to 1). Traditional set theory (i.e., “crisp” set theory) is, from this point of view, just a special case of fuzzy set theory in which membership functions can only take on the two values 1 or 0.
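A minimal sketch of the contrast (the “tall” example and its thresholds are my own, purely illustrative): a crisp set forces every height to full membership or none, while a fuzzy membership function assigns intermediate degrees.

```python
# Crisp vs. fuzzy membership in the set "tall people".

def crisp_tall(height_cm):
    # Classical (crisp) set: you are tall or you are not.
    return 1 if height_cm >= 180 else 0

def fuzzy_tall(height_cm):
    # Fuzzy set: degree of tallness ramps from 0 at 160 cm to 1 at 190 cm.
    return min(1.0, max(0.0, (height_cm - 160) / 30))

for h in (150, 175, 185, 195):
    print(h, crisp_tall(h), round(fuzzy_tall(h), 2))
```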

Epistemological uncertainty: The identity of something is determinate, but we lack enough information to settle the question completely. This is broadly the subjective Bayesian view of uncertainty. Bayesian statistician Dennis Lindley argued that fuzzy logic was unnecessary because the only kind of uncertainty that mattered was epistemological uncertainty, and Bayes’ formula gives us all the tools we need to deal with it.
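Here’s a small sketch of the Bayesian picture (the coin-flip numbers are made up for illustration): uncertainty lives in the observer, and each new observation updates it via Bayes’ formula.

```python
# Bayesian updating: belief that a coin is biased (P(heads) = 0.9) versus
# fair (P(heads) = 0.5). Each flip shifts the posterior; uncertainty is
# epistemic and shrinks or grows with evidence.

def posterior(prior, likelihood_h, likelihood_alt):
    evidence = prior * likelihood_h + (1 - prior) * likelihood_alt
    return prior * likelihood_h / evidence

belief = 0.5  # prior probability the coin is biased
for flip in ["H", "H", "H", "T", "H"]:
    if flip == "H":
        belief = posterior(belief, 0.9, 0.5)
    else:
        belief = posterior(belief, 0.1, 0.5)
    print(flip, round(belief, 3))
```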

Here are a couple of geometrical examples to illustrate some of these questions. They have to do with identity and difference, and they challenge the binary logic essential to Western scientific thinking.

Sine & Cosine as 2D Projection of a 3D Unit Helix

Are sine and cosine really just 2D projections of a 3D unit helix? Derrida would say there’s no answer to this question. There are no truth conditions for definitively settling the issue: we choose one representation when doing so suits our particular problem at hand. This is effectively the neo-pragmatist position on truth taken by philosopher Richard Rorty. Derrida would claim this is a case of metaphysical, as opposed to epistemological, uncertainty.

Back to Plato’s Cave: are sine and cosine really just 2D projections of a 3D unit helix? Source: Reddit.
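We can make the helix picture concrete (a sketch of the geometry; the sampling choices are mine): build the unit helix (cos t, sin t, t) and project it onto different planes. Each projection is a perfectly good “identity” for the same object.

```python
import math

# The unit helix (cos t, sin t, t), sampled at 101 points over one turn.
t_values = [i * 2 * math.pi / 100 for i in range(101)]
helix = [(math.cos(t), math.sin(t), t) for t in t_values]

# Three projections, three "identities" for one object:
cosine_view = [(z, x) for x, y, z in helix]  # drop y: the graph of cosine
sine_view = [(z, y) for x, y, z in helix]    # drop x: the graph of sine
circle_view = [(x, y) for x, y, z in helix]  # drop z: the unit circle

# Every point of the circle view lies on the unit circle:
print(all(abs(x * x + y * y - 1) < 1e-9 for x, y in circle_view))  # True
```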

Triangle Inequality

Here’s another geometric example illustrating some problems of perspective and identity. As the left and right interior angles of this triangle approach 0, the length of z approaches x + y and the triangle collapses into a line. At that point, how would we distinguish between a line and a triangle? Is the figure a line, or is it a special case of a triangle with 0-degree angles? Is it neither? Both?

The Triangle Inequality. Source: Wikipedia.
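The collapse can be checked numerically (a sketch using the law of cosines; the side lengths are arbitrary): fix x and y and let the angle between them approach 180 degrees, so the other two interior angles approach 0.

```python
import math

# By the law of cosines, z^2 = x^2 + y^2 - 2xy*cos(theta). As theta
# approaches 180 degrees, z approaches x + y and the triangle degenerates
# into a line segment.

x, y = 3.0, 4.0

def third_side(angle_deg):
    theta = math.radians(angle_deg)
    return math.sqrt(x * x + y * y - 2 * x * y * math.cos(theta))

for angle in (90, 150, 179, 179.999):
    print(f"angle={angle:>8}  z={third_side(angle):.6f}  x+y={x + y}")
```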

Liminality and The Logic of Classification Accuracy

A confusion matrix showing four possible outcomes of binary classification. Note there are two (FP, and FN) ways to make an error and one may be much worse than the other in social or ethical consequences. Source: Wikimedia.

Anyone who has spent time working with highly imbalanced datasets knows accuracy is a bad metric for evaluating a classifier’s performance. The reason, of course, is that when one class is, for example, 10x more common than the other, it’s very easy to achieve high accuracy by simply predicting the most common class in every new case. Assuming my test set is drawn from the same distribution as my training set, if it has 100 observations and I know from the training set that 99% of them are negative, then I can simply predict negative for all test cases and achieve 99% accuracy.

The problem with using accuracy to measure predictive performance is that it assumes all errors (FP/FN) have uniform cost. If we ask our predictive algorithm to maximize accuracy, it will do exactly that. This is the double-edged sword of Adorno & Horkheimer’s instrumental reason at its finest. But the way it does so may not align with what we expect.

So what will happen? The algorithm will trade off bad predictions of the minority class for good predictions of the majority class! After all, overall accuracy is simply a weighted average of the accuracies of the individual classes, weighted by the proportion of training examples in each class. Because minority-class examples occur less often, the algorithm wastes no “effort” on correctly classifying them. In short, it merely does what any instrumentally “rational” agent might do when asked to maximize some objective.
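A minimal sketch of the point in pure Python (no ML library; the data are illustrative): on a 99:1 test set, the degenerate “always predict negative” rule earns 99% accuracy while catching zero positives.

```python
# The majority-class classifier on a highly imbalanced test set.

y_true = [0] * 99 + [1]      # 99 negatives, 1 positive
y_pred = [0] * 100           # always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)
print(accuracy, recall)  # 0.99 0.0
```

High accuracy, yet every positive case (a false negative) is missed entirely.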

Simplicity of Binary Classification & Performance Metrics in ML

Derrida would not like the logical simplicity of many ML classification tasks. Reality moves, grows, changes: there is no black and white — only varying shades of gray.

Digital representations work through compression, by rounding or capping a continuous input signal into a discrete number of bits. We can see why the digitization of human experience is problematic: a digital circuit might read a continuous voltage as a 1 if it exceeds one threshold, and as a 0 if it falls below another. But what happens to all those analog values that, by chance, fall between these two margins?
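Here’s a sketch of such a quantizer (the voltage thresholds are hypothetical, not from any real logic family): everything below one threshold becomes 0, everything above another becomes 1, and the analog values in between have no digital home.

```python
# A 1-bit quantizer with a forbidden band between its two thresholds.

LOW, HIGH = 0.8, 2.0  # hypothetical logic thresholds, in volts

def quantize(voltage):
    if voltage <= LOW:
        return 0
    if voltage >= HIGH:
        return 1
    return None  # indeterminate: stuck between the margins

for v in (0.3, 1.4, 2.7):
    print(v, quantize(v))
```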

Derrida is asking us something similar here. What would happen if we didn’t “round off” our complex social reality to make gender and ethnic identities fit some discrete values? We set classification thresholds, compute confusion matrices and corresponding ROC curves. Derrida would point out that while one particular cutoff may be optimal for a given point on the ROC curve, in a different context another cutoff might serve better. There is no single “best” cutoff: it all depends on how it’s applied.
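A small sketch of this context-dependence (scores, labels, and costs are all made up for illustration): the same classifier scores yield different “best” cutoffs once we change the relative cost of false positives and false negatives.

```python
# The optimal decision threshold moves with the cost context.

scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
labels = [0,   0,   0,    1,   0,    1,   1,   1]

def total_cost(threshold, fp_cost, fn_cost):
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    return fp * fp_cost + fn * fn_cost

candidates = [0.0] + scores + [1.0]
# Context 1: missing a positive (FN) is ten times worse than a false alarm.
costly_fn = min(candidates, key=lambda t: total_cost(t, fp_cost=1, fn_cost=10))
# Context 2: a false alarm (FP) is ten times worse than a miss.
costly_fp = min(candidates, key=lambda t: total_cost(t, fp_cost=10, fn_cost=1))
print(costly_fn, costly_fp)  # 0.4 0.6
```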

Derrida drew on ideas from logic, physics, and information theory to argue that there was no final truth about the meaning of a given text outside of a given frame of reference. Photo by Jason Leung on Unsplash

With that general overview of themes, let’s dig into some actual work of Derrida. The discussion below draws mostly on ideas from his book Limited Inc.

Derrida and the Problem of Context

Derrida is perhaps most famous for his insistence that it is a precondition for the possibility of communication that there be no absolutely determinable contexts of signification or meaning. Derrida sought to jettison the ancient Greek/Platonic ideal of logos, that there exists some pure form of intelligibility in a concept or event — that an event could ever be understood outside of a frame of reference, from God’s point of view. Plato, for example, held that logos is what accounted for the ability of particular cases (tokens) to refer back to general, abstract forms (types). Tokens are understood as tokens because they partake in the general form or Eidos.

Semantic Communication

For Derrida, any kind of semantic communication through the written mark (what he refers to more generally as writing) implies an absence of presencing (meant in the Heideggerian sense of truth as aletheia, as disclosure or uncovering of Being, see my post here for more details) when the mark is received and “decoded.” Yet the original presencing must include the conscious state of the sender of such a message. But once this state is signified via the written mark (once this presencing has been re-presented in its written, symbolic form) it is dead and must now function in the absence of the original intention of its author.

Derrida says “representation supplants presence,” or carries or stands in for what originally was present. The act of writing itself brings a “rupture” between it and the context of its production which are “inscribed” in the written mark. Yet we can nevertheless communicate without knowing who the author of the mark was or what the “collectivity of presences” inscribed in the mark were. Derrida is not trying to eliminate the notion of meaning, but instead point out a problem in an account of semantic communication he traces back to the ancient Greeks.

Photo by Mark König on Unsplash

Detaching Sign from Signifier

Upon the moment of reception of a sign, the context — including the conscious experience of the receiver — has changed. Space and time have passed between the original authorial act of inscription and its reception as text. But Derrida argues that there is no need to posit, as a traditional correspondence theory of truth might, that the referent of the mark (in its productive context) must be present in order for the mark to be made intelligible by its receiver. Indeed, the absence of its original author is a source of creative strength for the text.

In fact, that the signified (referent) and signifier (the written mark) can be detached is what permits communication as such. If the mark could only refer to the original, singular moment of its inscription, then by definition it could never be iterable. But since marks are indeed iterable, it follows that whatever the receiver receives is not quite what was sent.

To illustrate this point, Derrida adduces the citation. The effect of citation is to cut off the “vouloir-dire” of a mark from its “given context,” and permit the “engendering of an infinity of new contexts in a manner which is absolutely illimitable.” In other words, I can cite anyone out of context and thereby produce a new context. I can do this an infinite number of times and thereby generate an infinity of possible meanings for a set of marks.

This phenomenon is what’s behind the decontextualized “sound bite” and also seems to drive the creative power of open-source libraries/packages in various programming languages. The original author of a package created its functions to solve some particular problem, but I can find a use for the same function in solving my own, different problem.

Today I would argue that our digitally recorded behaviors are essentially “behavior bites,” extracted from their originary moments of intention and repurposed and interpreted in a way useful for data collectors.

Photo by Oleksandra Bardash on Unsplash

Iterability of Writing and Chomsky’s Discrete Infinity

This functioning in differing contexts depends on the “iterability” of the sign. The iterability of writing is a precondition for the existence of writing. Iterability is what ties repetition of identity to difference (“alterity”). For it is through replication that identification is even possible. Re-cognition assumes cognition: regularity cannot occur prior to the ascription of identity to events. But it is through a kind of gestalt switch, or re-framing, that suddenly the identity of a repetitive event is cast in a new light. I am reminded of the difference between discrete and continuous numbers, between analog and digital Being.

Derrida’s concepts of citation and iterability have an affinity with Noam Chomsky’s characterization of the “discrete infinity” of human language, summarized in his New Horizons in the Study of Language and Mind:

Human language is based on an elementary property that also seems to be biologically isolated: the property of discrete infinity, which is exhibited in its purest form by the natural numbers 1, 2, 3, . . . Children do not learn this property; unless the mind already possesses the basic principles, no amount of evidence could provide them. Similarly, no child has to learn that there are three and four word sentences, but no three-and-a-half word sentences, and that they go on forever; it is always possible to construct a more complex one, with a definite form and meaning. (pgs. 3–4)

Photo by ev on Unsplash

Behavioral Data as Public Signs

What has Derrida got to do with data science? Derrida’s ideas help explain how our digitally recorded behaviors (our personal data) become detached from conscious, intentional presence (our subjective experience online), thereby becoming digital “public signs” which can be algorithmically manipulated and combined in novel ways by recommender system designers.

Derrida is not saying that the notion of “meaning” can be eliminated. He is claiming that meaning is irreducible to what appears to us in a text because distanciation between author and reader severs us from the original authorial moment of intention. This act of distanciation, however, gives us an opportunity for creative play in advancing new and original interpretations of signs.

Digital distanciation opens the question of whose interpretations count and how these interpretations are put into action and applied to real people

For example, many digital platforms use recommender systems (and increasingly reinforcement learning) to determine the content and format of our online experiences. But on what interpretations do these recommendations rest? The downside of digital distanciation is that we can now never know the original “true” intention in the absence of the original act of the data subject, if such a thing existed. What did you really mean when you clicked the “buy now” button? Did you intend to buy it, or was it a mistake?

As long as data subjects do not possess legal rights to enter into dialogue with data collectors, they cannot negotiate the meaning of their digital behaviors. Data collectors, in an effort to save money and time, choose to re-present and not present the author of the data to account for his behavior. To me, however, it’s not clear why this must be so.

Why should others get to one-sidedly interpret a representation of my action in my absence?

GDPR: Personal Data Rights for Dialogue and Interpretation

If we follow the EU’s General Data Protection Regulation (GDPR), we let the data subjects themselves decide what their digitally recorded behaviors mean. If you’re interested in reading more and learning about the philosophical foundations of the GDPR, you can check out our paper Beyond Our Behavior: The GDPR and Humanistic Personalization.
