Can We Bridge the “Two Cultures” of Data Science?

The role of facts and values in science, Twitter wars, and how data science risks splitting into two cultures over ethics statements

Travis Greene
Towards Data Science

--

Photo by National Cancer Institute on Unsplash

In his 1917 lecture Science as a Vocation, German sociologist Max Weber argues that questions of fact are separate from questions of value. Given certain pre-specified ends, the task of science is to find the best means of achieving them. But asking which ends to achieve is a question of values, answerable only by philosophy or religion. Weber was not alone on this point. Eminent scientists and mathematicians, such as Henri Poincaré, espoused similar views about the incommensurability of science and values. This post examines the (im)plausibility of extending this idea to data science.

The emerging field of data science risks splitting into two opposing factions based on conflicting philosophies of science. One espouses — implicitly or explicitly — the idea that facts are and should be kept separate from values (itself an expression of value!). Let’s call them atomists. The other advances a view — implicitly or explicitly — that facts and values are and should be inextricable from one another. Let’s call them holists.

The problem is this: when thought leaders and established academics cling to an atomist philosophy of data science, data science qua science risks isolating itself both from other disciplines and from society at large. With little experience working on issues relevant to society, the next generation of data scientists will be trained to focus on applications and techniques relevant to business needs (e.g., ad optimization) rather than the social good.

If atomism takes over, data science risks being viewed by the public in a way that nuclear power is often viewed today: potentially very useful, but ultimately too dangerous to be practical. There might even be something like a NIMBY movement for publicly-facing machine learning applications. To combat this, the field of data science might borrow and learn from key principles of qualitative research, which I review at the end.

Data Scientists As Value-neutral Toolmakers

In 1959, at the height of the Cold War, and as the US military-industrial complex established itself, British scientist and writer C.P. Snow characterized a common view among practicing scientists. Asked about the role of the scientist in society, Snow imagined the scientist saying:

We produce the tools. We stop there. It is for you — the rest of the world, the politicians — to say how the tools are used. The tools may be used for purposes which most of us would regard as bad. If so, we are sorry. But as scientists, that is no concern of ours.

We can imagine the data scientist in the year 2021 saying something similar:

We produce the algorithmic tools that maximize click-through. We stop there (after publishing our results in technical journals not designed to be understood by laypersons). It is for you — the rest of the world — to say how algorithmic tools that maximize click-through should be used. Algorithmic tools maximizing click-through may be used for purposes most of us would regard as bad (such as political polarization, radicalization, or social media addiction in young people). If this is the case, we are sorry. But as data scientists, this is no concern of ours.

Splitting Facts and Values: Disciplinary Divides

The idea that facts and values are independent — and should be kept so — has a long pedigree in academia, particularly in the “hard” sciences: physics, engineering, computer science, and math. The view goes roughly like this.

The scientist operating during a period of “normal science” is, in the words of Thomas Kuhn, a mere “puzzle-solver.” The scientist acts as a rational and competent optimizer of a pre-specified objective function defined by her field of study. She works in a paradigm given to her by a broader community of scientists, complete with its own ontology, epistemology, and methodology. But does a scientific paradigm also require an axiology: a system of values justifying why certain methods are better than others, or why a certain taxonomy is better than another? Or are such meta-questions meaningless, falling outside the scope of science itself?

Kuhn imagined the typical scientist as working to put together pieces of a puzzle selected by a pre-established community of scientists. The puzzle is a metaphor for a worldview which informs all methods and interpretation of evidence within the paradigm (the puzzle itself). Photo by Markus Winkler on Unsplash

At least for a brief period in the 1940s and 50s, after witnessing the destructive power of the atom bomb, the fact-value distinction seemed under attack by eminent scientists. J. Robert Oppenheimer, Albert Einstein, and Norbert Wiener famously expressed moral concern about their scientific role in the creation of such a device. But the idea of value-free science, and correspondingly the value-neutral scientist, has persisted to this day, particularly in academic fields where students receive little to no exposure to ideas from the humanities, sociology, and philosophy.

Bureaucratic Science and Compartmentalization of the Soul

The fact-value distinction is a symptom of a much deeper tendency in Western thinking. The views of Weber and of Snow’s imagined scientist reflect a broader trend in modern, liberal democracies toward bureaucratization: a condition in which mental, moral, and physical labor is divided into smaller and smaller fragments in the name of efficiency and technical competency.

In the late 19th century that syphilitic grandfather of postmodernism, Friedrich Nietzsche, famously announced that God was dead. Traditional sources of value, which guided human aims and gave meaning to communal life, were no longer viable. The economic logic of markets supplanted cultural and religious values and removed questions of teleology, value, or goals from the picture. Notions of utility replaced questions of moral goodness. Observable aspects of the cosmos could be explained by nothing more than the random collisions of identical particles following universal laws of nature.

Karl Marx, decrying this process of industrialization and globalization, would claim that it leads to alienation; for sociologist Emile Durkheim, the process leads to anomie. Marx saw the emerging market-ideology of the industrializing West as divorcing humans from their essential nature as social, creative, and expressive beings. Marx, of course, borrowed from Hegel’s teleological view of science and human reason. And even though he insisted on an unbridgeable division between facts and values, Weber lamented this trend towards narrowness and abdication of moral responsibility for technical competence as the “compartmentalization of the soul.” Without course correction, academic data science is poised to continue this trend.

Ralph Waldo Emerson summarizes this compartmentalization process of the academic in his classic 1837 speech, The American Scholar:

…Man is not a farmer, or a professor, or an engineer, but he is all. Man is priest, and scholar, and statesman, and producer, and soldier. In the divided or social state these functions are parceled out to individuals, each of whom aims to do his stint of the joint work, whilst each other performs his….The state of society is one in which the members have suffered amputation from the trunk and strut about so many walking monsters, — a good finger, a neck, a stomach, an elbow, but never a man.

Despite these critiques by Marx, Emerson, Snow, and others, proponents of the value-free philosophy of science might defend its validity by pointing to its practical successes in terms of technological applications. After all, look how far we’ve come! This might have been a reasonable argument had these technologies remained confined to the halls of the research university. But AI/ML are now commercialized technologies affecting our daily lives in countless ways, from online shopping and credit scoring, to algorithmic trading, policing, parole decisions and sentencing, driving, online dating and socializing, and increasingly, military applications.

The time has come to accept responsibility for the applications and effects of one’s creations.

The Naturalistic Fallacy?

Wait, but aren’t you trying to derive an ought from an is? Aren’t you describing how these technologies are used, and then claiming that this description implies or requires a prescription for how these technologies should be built? Yes, I am. Facts indeed inform values, and vice versa: values inform facts.

The number of gun deaths should inform gun policy (age restrictions, safety locks, outright bans); health outcome data should inform smoking and nutrition policy; plane crash data should inform aviation safety policy (better autopilots, pre-flight checklists); deadly fires should inform architectural design (required emergency exits); and vehicle crash data should inform car design (e.g., seatbelts, motorcycle helmets, and smooth dashboards and steering wheels — thanks, Ralph Nader!). Perhaps more controversially, the findings of empirical social science, psychology, and the neuroscience of human well-being should inform public policy, not be stashed away in arcane academic journals accessible only to a select few.

Without access to the facts, we cannot even begin to form our values around what is important to protect. Conversely, our values serve to guide our attention to the relevant facts: if we didn’t value human life, we wouldn’t collect gun death statistics or plane crash information in the first place. Immanuel Kant once said something similar: Thoughts without content are empty, intuitions without concepts are blind.

Photo by 🇨🇭 Claudio Schwarz | @purzlbaum on Unsplash

Culture Wars in Data Science: Twitter & NeurIPS

The philosophical ideas outlined above are worth examining again in light of two recent events in the AI/ML community. The first concerns a Twitter war involving University of Washington professor Pedro Domingos, after he complained on Twitter about the new NeurIPS ethics impact statements required by the conference. The second, of course, was the firing of Timnit Gebru by Google.

I focus on Domingos as his views about science and ethics are likely shared by many in AI/ML. See the open letter signed by several AI/ML researchers.

Domingos’ response on Twitter to NeurIPS’s requirement of ethics impact statements, which started a huge Twitter war.

Eventually, Domingos’ comments led to his being attacked by many in the AI/ML ethics community on Twitter, and the “conversation” quickly devolved.

A snippet of the culture wars in academic data science. https://twitter.com/pmddomingos/status/1337615171166998529

We Need to Talk about Science and Values

I disagree with Domingos on this particular issue — but I do think he makes a valid point that must be addressed, without recourse to ad hominem attacks or emotionally-driven calls for “canceling” him and his contributions to data science. His comment should be considered in detail, because otherwise data science risks splitting into two cultures: one where questions of value are seen as neither answerable nor important (the atomist camp), and one where they are (the holist camp).

To delineate the scope of the arguments below, let me first give a rough working definition of what I mean by a data scientist: someone whose job description includes some responsibility for producing new, systematic, and generalizable knowledge, typically in the form of published research articles in peer-reviewed venues. I would venture to guess that very few people without PhD-level training are able to publish in peer-reviewed conferences such as NeurIPS or journals such as the Journal of Machine Learning Research (JMLR).

Where Did the Modern Idea of “Value-Free” Science Come From?

According to philosopher of science Hugh Lacey, the idea of a value-free science traces back at least to the inductive empiricism of Francis Bacon and Galileo. For his part, Bacon, the sense-based empiricist, warned us against blindly handing over the keys to our innate cognitive biases, or the “idols of the mind,” while Galileo harped on how the facts of nature were “deaf and inexorable to our wishes.”

Experimentation and intervention in nature were the key tools in the empiricist’s toolbox. The accumulation of ever greater stores of sense-data (augmented of course by improvements in instrumentation, such as microscopes and telescopes and so on) became the model of progress in science. We see this tendency for rote data collection and classification in the Enlightenment obsession with taxonomies and encyclopedias as well. Today, ML continues the empiricist tradition by replacing “sense data” with “big data.”

Logical Positivism & the History of the Value-Free Ideal

The empiricism of Bacon and Galileo would be revived in the 20th century under the name of logical empiricism (or logical positivism). The logical positivists, led by philosopher-logician Rudolf Carnap, attempted to systematically build the foundations of science on logic.

Philosophy positioned itself as the queen of the sciences, while mathematics was its king. Philosophers stripped the ordinary-language claims of the “special” sciences (i.e., physics, economics, social science, etc.) down to the unifying language of symbolic logic, whose structure reflected a correspondence with reality.

Anything which did not fit inside this “picture,” such as metaphysics or ethical values, was deemed nonsense — after all, there was nothing for ethical values to “map to” in the real world, independent of human existence. Even granting the existence of such mappings, they could never be verified or confirmed by empirical investigation, the positivists alleged.

Science (from the Latin scientia, meaning knowledge) thus consisted in refining this limited but rigorous picture so that, in the limit and via the methods of empirical science, it would eventually mirror nature itself, making any “representation” of nature superfluous. Once knowledge was laid bare and organized according to its axioms, the operations of symbolic logic could be applied, new testable hypotheses could be put to empirical confirmation, and the indubitable foundations of all scientific knowledge could be assured. This, at least, was the dream of the Vienna Circle.

Philosopher and logician Hilary Putnam explains in his 1983 article Objectivity and the Science/Ethics Distinction how this philosophy of science quickly spread to other fields fond of axiomatic theoretical models (i.e., economics):

The Logical Positivists argued for a sharp fact-value dichotomy in a very simple way: scientific statements (outside of logic and pure mathematics), they said, are “empirically verifiable” and value judgements are “unverifiable.” This argument continues to have wide appeal to economists (not to say laymen), even though it has for some years been looked upon as naïve by philosophers.

Values must guide science, but we must guard against dogmatism through free, fair and transparent debates about the reasons and justifications of such values. Photo by Steve Harvey on Unsplash

Value-guided Science? Beware of Dogmatism

Although overblown, Domingos’ comments are not baseless. There are good reasons to be skeptical of the insertion of ethical and social values as criteria for deciding what counts as scientific research.

Soviet science was notoriously influenced by its interpretation of Marxist philosophy. Anything that did not concur with the Soviet interpretation of dialectical materialism was viewed with suspicion. Similarly, Darwinian evolution was, for a time, eclipsed by Lysenko’s crusade against genetics. And of course, we cannot forget how Galileo was forced to recant his support of Copernicanism under the Inquisition of the Catholic Church. These are all real instances in which blind adherence to dogma set back scientific progress.

But dogmatism goes the other way, too. By asserting its apparently “value-free” nature, “science” has been used to justify colonialism, slavery, genocide, and even world wars. The concept of race itself emerged with the institution and practice of modern science, with its hierarchy of races and classifications of persons into distinct racial groups on the basis of supposed phenotypic and genotypic markers (phrenology, anyone?). The idea of “theory-free” inductive science popularized by Bacon and later by Francis Galton and his student Karl Pearson was used to support racist and eugenicist policies, including forced castration and sterilization of so-called “degenerate stock.”

Without ethics guiding science, Nazi human experimentation, Soviet poison laboratories, Japanese human experimentation on Chinese prisoners, American experiments on the effects of atomic radiation on its soldiers, and the Tuskegee syphilis experiments might still be viewed as acceptable, among countless other violations of the basic human rights we value today. In fact, we have these rights today because of our knowledge of what some are willing to do to others in the ostensible pursuit of value-free “science.”

For those conducting these “scientific” experiments, their scientific judgments were of course not based on anything as wishy-washy as “values,” but were simply plain and obvious facts of nature, self-evident to anyone who would care to look properly. For them, science really was value-free. As Galileo said, facts of nature were “deaf and inexorable to our wishes.” Nature itself simply made different races of humans on a hierarchical scale: they were “natural kinds,” discovered by carving nature at its joints.

Do you really think their science was value-free, or do you think they were simply blinded to their own prejudices and popular beliefs of the time? Do you think, perhaps, that those who thought slavery was morally wrong were seen as “biased,” seeing the world through glasses colored by their moral beliefs and values?

Benefits of Ethics Impact Statements in AI/ML Publications

If Domingos’ “atomist” camp loses, and the trend of including ethics statements on AI/ML research publications continues, how might this affect research and society in general? I outline a few potential effects below.

Ethics impact statements could help to:

  1. Encourage broad and holistic — instead of narrow and atomistic — thinking in ML researchers. This could break down interdisciplinary academic barriers and micro-niches currently isolating researchers from public debates surrounding ML-based technology and applications. If the public’s understanding of AI/ML is solely based on Netflix’s The Social Dilemma, that’s not a good sign.
  2. Reduce the chances of irrelevant, harmful research and wasted mathematical talent. Jeff Hammerbacher, one of Facebook’s first data scientists, once said, “The best minds of my generation are thinking about how to make people click ads…That sucks.” If it’s clear certain technologies are continually being put to harmful ends (eroding personal autonomy and democracy, increasing suicide and depression, etc.), why continue investing research money, time, and academic training in them? What if young data scientists used their knowledge of programming and statistics to make progress in physics, biology, or the design of vaccine trials instead of ad placement optimization? See Cynthia Rudin’s video on the “Implosion of ML” for a slightly different angle.
  3. Stimulate interdisciplinary collaboration. Why not include a philosopher to help imagine your algorithm’s ethical impact? Who knows, this could even lead to new approaches and perspectives by borrowing ideas from other fields. While I don’t expect ML researchers to suddenly turn into Aristotles, Kants and Benthams overnight, society would be better off if they attempted to at least engage with ideas from the humanities and related disciplines.
  4. Increase public stakeholder engagement. Go to various societal stakeholders and explain, in clear and simple language, what your algorithm does, how it works, and ask them how it might affect them. Does your grandma care about which news shows up in her Facebook newsfeed? Recent political violence in the US underscores the importance of understanding how false information is disseminated by algorithms across social networks. By including ethics statements, academic ML research may become more accessible and important to laypersons and concerned citizens.
  5. Inspire new Data Science for Social Good projects and discussions about the social good. Here, ideas from philosophers John Dewey and Jürgen Habermas may provide useful foundations for developing democratic principles of public debate and free speech — founded on individual rights — that can be harnessed to promote the open and fair discussion of which good(s) to pursue and why, without resorting to caustic name-calling.

What Data Science Can Learn from Qualitative Research

Data science can benefit from a more inclusive research scope that takes facts and values equally seriously. Doing so requires explicitly acknowledging the role of human consciousness and subjectivity in scientific knowledge production. The researcher is an instrument, too. If AI/ML research is to truly embrace ethics, the outdated positivist idea of science as presenting “the facts” from a view from nowhere is no longer tenable. Let’s be clearer about whose views, interests, and public policies our work might promote.

Below are some key aspects of qualitative research data scientists might use to think about their own research (adapted from Sarah J. Tracy).

Authentic research: Which improvements and changes to various classification algorithms, recommender systems, explainability or data collection methods would you want applied to yourself, your family and friends, or your community? In the case of persuasive technology, would you yourself submit to such persuasion? Are you willing to construct social policy or legislation based on it? Are you personally willing to recommend, as public policy, the results of your work?

Self-reflexivity: Fish don’t realize they live in water, yet water influences all they experience. What are my biases and motivations for doing the work I do? What are my strengths and weaknesses as a data scientist? Which methods and models work better or worse for this kind of problem? What are the assumptions behind them? Does it make sense to apply the same assumptions used to model the faceless, identical particles of physics to individual, self-conscious persons with hopes, dreams, and desires? Would the person whose behavior is being modeled agree with your assumptions about her behavior?

Participatory design: How might the end users participate in, ask questions about, or give feedback on the end product of your research? Whose personal data were used in the evaluation of your algorithm, and what might they have to say about its intended use and its impact on them? This is already an emerging field in HCI research, but remains a niche area in academic data science.

Ethics: Qualitative research puts ethical concerns at the front and center, and only now is the AI/ML community recognizing a similar need to do so. Abeba Birhane and Fred Cummins have a nice paper on relational ethics for data scientists called Algorithmic Injustices: Towards a Relational Ethics.

Here’s to holism.
