Is Data Science A Real Science?

As I hear criticisms about how data science is unscientific, I would like to clarify what science is and to show how data scientists can answer these common criticisms of their field.

Ludovic Benistant
Towards Data Science


What is science?

Science is a quest to reach good explanations about the world. As David Deutsch pointed out in his book, The Beginning of Infinity, a good explanation is clear, precise and hard to vary.

What is data science?

Data science is a new scientific field that strives to extract meaning from data and improve understanding. It represents an evolution of other analytical areas such as statistics, data analysis, and business intelligence (BI).

Data science has emerged thanks to the explosion of data provided by the internet, rising computing power, and the development of computer science and machine learning algorithms.

The term data scientist was first used by Jeff Hammerbacher and DJ Patil to describe a new job title. They searched for individuals with a new set of critical skills such as mathematics, statistics, computer science, machine learning, and business knowledge.

Why might data science never be a “real” science?

Criticism: “Big data analysis is mainly garbage in, garbage out. So how can you certify the quality of your data and results?”
Indeed, a vast amount of data can be filled with mistakes and errors. But this criticism applies equally to small and medium data sets. What matters is how we collect the data and how we treat it. We will make some mistakes along the way, but for the most part data science can deal with data quality issues, like every other analytical field.

Criticism: “Data scientists can find anything in a large amount of data. By subsetting the data and building new features, they can prove anything. As Ronald H. Coase said: ‘If you torture the data long enough, it will confess.’”
This is another big issue, yet a surmountable one. Of course, data scientists have to be careful not to jump to conclusions too fast, just like anyone else who tries to reach the truth. In data science, we have dozens of rigorous methods for treating data. For example, cross-validation and regularization help guard against overfitting one’s predictions. We can also publish our code, so that other scientists can trace exactly what we have done with the data from start to finish.
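To make the "torture the data" worry concrete, here is a minimal sketch in plain Python. It assumes a made-up setting (200 pure-noise features, 40 training rows, 40 held-out rows, all hypothetical numbers) and uses a single holdout split, the simplest relative of cross-validation, rather than full k-fold. The function names `pearson` and `torture_noise` are my own and do not come from any library.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def torture_noise(n_features=200, n_train=40, n_holdout=40, seed=0):
    """Search noise features for the best match to a noise target.

    Returns (index of the selected feature, its correlation with the target
    on the training rows, its correlation on the held-out rows).
    """
    rng = random.Random(seed)
    n = n_train + n_holdout

    # Target and features are independent noise: there is nothing to find.
    target = [rng.gauss(0, 1) for _ in range(n)]
    features = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_features)]

    # "Torture" step: keep whichever feature looks best on the training rows.
    train_corrs = [pearson(f[:n_train], target[:n_train]) for f in features]
    best = max(range(n_features), key=lambda i: abs(train_corrs[i]))

    # Honest step: score the selected feature on rows the search never saw.
    holdout_corr = pearson(features[best][n_train:], target[n_train:])
    return best, train_corrs[best], holdout_corr


if __name__ == "__main__":
    best, train_corr, holdout_corr = torture_noise()
    print(f"feature {best}: train r = {train_corr:+.2f}, holdout r = {holdout_corr:+.2f}")
```

Because the search picks the winner among many candidates, the training correlation is inflated by selection alone. The held-out rows played no part in that choice, so the holdout correlation is the more honest estimate, and for pure noise it will typically sit much closer to zero.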

Criticism: “Data scientists can only build observational studies. They might spot a few correlations, but can’t say anything about the underlying causes.”
Controlled experiments with random samples and control groups are better for assessing causality. But in many cases, they are expensive, unethical or even impossible to conduct. To study different phenomena correctly, science needs both controlled experimentation and observational studies.
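Why randomization helps with causality can be shown with a small simulation. This is only a sketch under invented assumptions: a hidden confounder drives both the exposure and the outcome, while the exposure itself has no causal effect at all. The names `pearson` and `simulate` and every number are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=2000, seed=1):
    """Return (observational correlation, randomized correlation)."""
    rng = random.Random(seed)

    # The outcome depends only on the hidden confounder, never on the exposure.
    confounder = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [c + rng.gauss(0, 1) for c in confounder]

    # Observational data: exposure is also driven by the confounder.
    observed_exposure = [c + rng.gauss(0, 1) for c in confounder]

    # Controlled experiment: exposure is assigned by a coin flip,
    # which makes it independent of the confounder.
    randomized_exposure = [rng.choice([0.0, 1.0]) for _ in range(n)]

    return (
        pearson(observed_exposure, outcome),
        pearson(randomized_exposure, outcome),
    )


if __name__ == "__main__":
    observational, randomized = simulate()
    print(f"observational r = {observational:+.2f}, randomized r = {randomized:+.2f}")
```

In this toy setup the observational correlation is clearly positive even though the exposure does nothing, while the randomized one hovers around zero. When randomization is not available, the observational analyst has to argue for, or adjust for, the absence of such confounders.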

Criticism: “OK for data science… But business data science will never be a real science. Science is impossible in business because companies strive for money, not scientific truth.”
Even if the distinction is relevant, a data scientist working in a business can be as scientific as the many researchers who work in R&D. Indeed, a commitment to reality and truth can bring lasting competitive advantage to businesses, and scientific discoveries can yield significant benefits.

Even if any data scientist who tries to reach knowledge using data needs to take these criticisms into account, they don’t condemn data scientists to be failed scientists. In addition, these criticisms often miss the point, as they come from people who don’t know much about the methods we use to tackle large amounts of data.

Can we judge the whole data science field at once?

The main mistake of the criticisms above is to judge the entire data science field at once and without a real understanding of it. That can’t work for any scientific field:

Some doctors don’t understand statistics at all. Does this mean that medicine is unscientific, or that it is doomed to remain so?

In 2015, psychology was severely hurt by many failed experimental replications. Do we conclude that psychologists aren’t scientific and never will be?

We can’t judge a whole scientific field at once, because every field includes many unscientific people alongside more scientific ones.

Some data scientists are financially motivated computer geeks. Some others work hard to confirm any prior intuitions. Data science has to deal with this and bear a bad reputation in some places.

But the data science field has also had many successes in biology, finance, marketing, and economics. Furthermore, many more discoveries will come from our new methods for making sense of large amounts of data, using machine learning algorithms or computational statistics.

Is a data scientist a real scientist?

“Science is a way of thinking much more than it is a body of knowledge” — Carl Sagan.

To find better explanations about the world, scientists make space for correcting errors in their previous knowledge. They look for clear, rational and testable explanations.

Data scientists need to uphold these scientific principles and also embrace scientific values such as openness to ideas and criticism, and respect for others’ reasoned opinions.

So next time you are wondering if any field is scientific, go down to the individual level and see how much that person respects scientific epistemology, values, and method. For instance, you could ask: What is she trying to achieve? How does she question her knowledge? What are her main sources of belief?
