Is Data Science a science?

An argument that asserts the field of Data Science should indeed be considered a fundamental science

Ricky Sethi
Towards Data Science
5 min read · Mar 13, 2021

Photo by Roman Mager on Unsplash

What is Science?

At its core, all fundamental science is about making predictions in the form of experiments: precise, quantifiable, falsifiable predictions. As Richard P. Feynman put it:

“The fundamental principle of science, the definition almost, is this: the sole test of the validity of any idea is experiment.”

So if science is about making predictions, how do its predictions differ from those astrologers make? The core distinction lies in the kinds of predictions each makes. Most horoscopes, for example, give you general predictions, usually along the lines of “you’ll have a great day today.” Scientific predictions, on the other hand, are precise and quantitative: they don’t say you’ll have a nice day; instead, they say you’ll step outside your door at precisely 3:07pm and get struck by a meteor!

Admittedly, most predictions are not quite so dire and are usually more prosaic. But prediction itself, which we can also call experimentation, is at the heart of basic science. We can think of basic science as the study of the most fundamental aspects of the universe, matter, and energy, and examining it closely reveals an intimate relationship between fundamental science and computation.

In fact, we can think of a computation as a process that transforms one representation of the world, our input, into another representation of the world, our output. For example, the input might be a list of the temperatures recorded at 6am by each of the sensors distributed around campus, and the output would be the average of those values. The process, or method, that transforms the input into the output is the calculation of the average itself. Because a finite, well-defined procedure always produces that output from the input, we can say that computing the average temperature of all the sensors at 6am is a computable problem.
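The temperature example can be written as a tiny Python function that makes the input-process-output view concrete. This is only a sketch: the sensor names, the readings, and the use of a dictionary keyed by sensor are illustrative assumptions, not data from any real campus.

```python
from statistics import fmean


def average_temperature(readings: dict[str, float]) -> float:
    """Transform one representation of the world (per-sensor readings at 6am)
    into another (a single campus-wide average)."""
    if not readings:
        raise ValueError("need at least one sensor reading")
    # fmean computes the arithmetic mean of the values as a float
    return fmean(readings.values())


# Hypothetical 6am readings in degrees Celsius; sensor names are made up.
readings_6am = {"library": 12.5, "gym": 11.0, "quad": 13.5, "lab": 14.0}
print(average_temperature(readings_6am))  # 12.75
```

The input (a mapping of sensors to readings), the process (the mean), and the output (one number) line up one-to-one with the description, and because the function terminates on every non-empty input, it is computable in exactly the sense used here.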

As it happens, an important component of fundamental science deals with those problems in the universe that are computable. As David Deutsch says in [Zenil 2012]¹, “the laws of physics refer only to computable functions.” This implies that, in a very real sense, all the laws of physics belong to this set of computable functions, even though computable functions themselves are only a small subset of all possible mathematical functions!

So how does this relate to Data Science?

Photo by Boitumelo Phetla on Unsplash

Before we tackle whether Data Science is a science, a question that doesn’t seem to have a definitive answer, let’s step back and look at the idea of proof. The word is often used loosely, because there are many different kinds of proof: for example, scientific proofs, legal proofs, and mathematical proofs.

In mathematics, a proof is an inferential argument showing that a statement is true, supported by axioms, definitions, previously proven theorems, and postulates. Mathematicians normally use deductive reasoning to show that a statement follows from the premises of the proof. A direct proof derives the statement straight from those premises and is usually written in a symbolic language. In an indirect proof, mathematicians usually employ proof by contradiction: they assume the opposite of the statement is true and eventually reach a contradiction, showing that the assumption is false.
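To make the two kinds of mathematical proof concrete, here are two standard textbook examples, chosen purely for illustration: a direct proof that the sum of two even integers is even, and a proof by contradiction that the square root of 2 is irrational.

```latex
\textbf{Direct proof.} Claim: if $m$ and $n$ are even integers, then $m + n$ is even.
Write $m = 2a$ and $n = 2b$ with $a, b \in \mathbb{Z}$. Then
\[
  m + n = 2a + 2b = 2(a + b),
\]
and since $a + b \in \mathbb{Z}$, $m + n$ is even. \qed

\textbf{Indirect proof (contradiction).} Claim: $\sqrt{2}$ is irrational.
Assume the opposite: $\sqrt{2} = p/q$ with $p, q \in \mathbb{Z}$, $q \neq 0$, and $\gcd(p, q) = 1$. Squaring gives
\[
  p^2 = 2q^2,
\]
so $p^2$ is even, hence $p$ is even; write $p = 2k$. Substituting gives $4k^2 = 2q^2$, i.e.\ $q^2 = 2k^2$, so $q$ is even as well.
Then $p$ and $q$ share the factor 2, contradicting $\gcd(p, q) = 1$, so the assumption is false. \qed
```

In both cases the conclusion follows deductively from definitions and earlier results; no observation of the world is involved, which is what separates mathematical proof from the scientific kind discussed next.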

In science, an inherently inductive enterprise,² we cannot prove any hypothesis to be true, since that would require an infinite number of observations. The best we can hope to do is use inductive reasoning as the basis of our generalization and hold it to be provisionally true. As Lee Loevinger noted in describing Karl Popper’s position, “In this view, which is fairly widely accepted, an hypothesis can be falsified, or disproved, but cannot be verified, or proved.” Once a hypothesis has been validated extensively and consistently, and we deem it sufficiently substantiated, we call it a theory.

In law, legal proof is the process of establishing a fact by using evidence. In science, we might call this the validation of a theory, since validation also usually takes the form of an argument that presents a series of premises in support of some conclusion. As in law, proof in science is usually limited to proof of facts, in the sense of using data to establish the validity of facts. D.H. Kaye discusses this at length in [Kaye 1991]³, showing that quantitative observation-statements provide evidence to prove or, as we’d say in science, to show the validity of, facts. So we could, in some sense, say that legal arguments use evidence to show the validity of a theory, whereas science uses data to falsify a theory.

For example, following [Kaye 1991]³, quantifiable data on the intensity and polarization of radiation at various frequencies, collected by a radio telescope pointed at the Crab Nebula, are the evidence that shows (in law, proves) the fact that something in the direction of the Crab Nebula is a radio source. Such facts can be deduced or induced from statements of observations, the evidence. Thus, a fact rests on some repeatable observation or measurement that is generally agreed to recur with the same value, or in the same way, under the same kinds of circumstances.

These facts are then used to inductively reason about a hypothesis or model of the system being studied. The predictions made by that model are further verified and, when enough predictions are verified independently, the hypothesis, or set of hypotheses, is considered sufficiently validated to be called a theory.

If it looks like a science and sounds like a science…

Photo by Kai Alyssa Bossom on Unsplash

This process, the scientific method, is exactly what we employ when we use statistical and machine learning methods, like hypothesis testing or decision trees, within a Data Science framework and use data to iteratively test and improve our models. Following Feynman’s formulation, I might further argue that as long as you’re using a systematic model to make predictions, testing those predictions with data, and using the results to validate or improve your model iteratively, you’re doing science.
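The predict, test, and refine loop can be sketched in plain Python. Everything here is an illustrative assumption rather than a method from the article: the “world” is a synthetic rule y = 3x plus small noise that the model never sees directly, the model is a single-slope line, and the refinement step is ordinary gradient descent on squared error, swapped in as one simple way to “improve the model.”

```python
import random


def predict(slope: float, xs: list[float]) -> list[float]:
    """The current model makes a precise, quantitative prediction for each input."""
    return [slope * x for x in xs]


def mean_abs_error(preds: list[float], ys: list[float]) -> float:
    """How far the predictions were from what was actually observed."""
    return sum(abs(p - y) for p, y in zip(preds, ys)) / len(ys)


def refine(slope: float, xs: list[float], ys: list[float], lr: float = 0.1) -> float:
    """Improve the model with one gradient step on mean squared error."""
    grad = sum(2 * (slope * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return slope - lr * grad


def observe(n: int, rng: random.Random) -> tuple[list[float], list[float]]:
    """Hypothetical synthetic 'world': y = 3x plus Gaussian noise."""
    xs = [rng.uniform(0, 2) for _ in range(n)]
    ys = [3 * x + rng.gauss(0, 0.1) for x in xs]
    return xs, ys


def scientific_loop(rounds: int = 50, seed: int = 0) -> tuple[float, list[float]]:
    rng = random.Random(seed)
    slope = 0.0  # initial conjecture
    errors = []
    for _ in range(rounds):
        xs, ys = observe(20, rng)                               # fresh data the model has not seen
        errors.append(mean_abs_error(predict(slope, xs), ys))  # test the prediction first
        slope = refine(slope, xs, ys)                          # then use the result to improve the model
    return slope, errors


slope, errors = scientific_loop()
print(f"final slope {slope:.2f}, first error {errors[0]:.2f}, last error {errors[-1]:.2f}")
```

Each round the model commits to predictions before seeing the new observations, and only afterwards is it scored and adjusted; freezing `slope` and skipping `refine` would turn the same code into the “apply a fixed model” pattern described in the next paragraph.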

Applying these scientific models to specific problems, without iteratively changing or further developing them, belongs to engineering and technology rather than science. As such, I might be inclined to categorize Data Analysis as an engineering discipline and Exploratory Data Analysis as a technological application.

Read more about Data Science vs Data Analysis in Ch. 10 of my new book, also publicly viewable on ResearchGate.

[1] H. Zenil, A Computable Universe: Understanding and Exploring Nature As Computation. River Edge, NJ, USA: World Scientific Publishing Co., Inc., 2012.

[2] Inductive at least to the extent that thinkers like Richard Feynman and Karl Popper would find it to be so, Feynman in his exposition on the Key to Science and Popper in his formulation of conjecture and criticism.

[3] D. H. Kaye, “Proof in law and science,” Jurimetrics J., vol. 32, p. 313, 1991.
