
On Data and Science

Part 1: The importance of bringing science to the discussion.

Illustration by Héizel Vázquez

We’ve been doing science for a while now. I’m going to put the beginning in 1637, when Descartes published "Discourse on the Method". The main result of that book is the distinction between knowledge and truth: the discourse of the scientist is concerned with knowledge (which we later discover to be always incomplete), not with the search for truth.

That’s a very important point because it gives us a focus: science wants to know things that are not "undisputed truths".

The role of data in science

Illustration by Héizel Vázquez

Data has been close to science for almost all of its history. Sometimes the theory comes first and the data later, but for most of what we are going to discuss here and in upcoming articles, the model is data -> theory.

I want to tell you a little story that will help you understand why data is important, but not the most important part of science.

The story is about a man called Tycho Brahe. He spent almost all his life measuring the positions of the stars, the planets, the Moon and the Sun. For what? He wanted to learn how to predict eclipses. He was also unhappy with the Ptolemaic system, and the Copernican theory wasn’t enough for him either. So he wanted to find the best way to describe the skies and their moving parts.

Sadly, he wasn’t sure how, but he kept on measuring things until his final days. He died in 1601, and Johannes Kepler, a great mathematician who had become his assistant a year before, gained access to almost all his data. With that data, Kepler improved Copernicus’s theory of the Universe and developed three laws that describe the motion of the planets. Kepler’s work served as a basis for Isaac Newton’s later studies on gravity and the motion of bodies.

The story is much longer and more fun than this; if you want to know more about it, please take a look online. But you could be asking yourself at this point: what does this story have to do with data science?

The biggest lesson from that story is that having data, sometimes even a lot of data, is worthless unless you have a good question to answer. This is still true nowadays. The modern love of data began with statistics.

The role of statistics

Illustration by Héizel Vázquez

I’m not going to write a lot about statistics here, but I will point to two specific works that changed the world forever. The first is an article called "The Future of Data Analysis" by John Tukey, published in 1962, and the second is a presentation by Professor Jeff Wu titled "Statistics = Data Science?", given in 1997.

These are pretty old references, I know, but they are very important. Believe me.

In his article, Tukey said this:

For a long time I have thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt. […] All in all, I have come to feel that my central interest is in data analysis…

This is a huge statement for a statistician to make. At that time, the term "data science" did not exist as it does today, but the way Tukey described data analysis is very close to what we now call data science. He even called data analysis a science, because it passes these three tests:

  • Intellectual content.
  • Organization into an understandable form.
  • Reliance upon the test of experience as the ultimate standard of validity.

He also said that this "new science" is defined by a ubiquitous problem rather than by a concrete subject. He then goes on to talk about how to learn and get started with data analysis, how to become a data analyst, and how to teach it. It’s an amazing article that we should all read if we want to understand the beginnings of our field.

In the second piece, 35 years after Tukey’s publication, Jeff Wu posed this question:

Statistics = Data Science?

There he proposed that statistics should be renamed "data science" and statisticians should be called "data scientists". By today’s standards we know that statistics is a part of data science, but why? Because we say that we also need programming, business understanding, machine learning and more. Maybe it’s just that statistics evolved and some statisticians became data scientists. But only some of them.

To understand which portion of statistics and statisticians became data science and data scientists, we need to read the article "Statistical Modeling: The Two Cultures" by Leo Breiman, published in 2001.

There he mentions that some people in the statistical culture are driven by data modeling and others by algorithmic modeling. The first group assumes that a stochastic data model maps the input variables x to the response variables y. The second group considers the mapping process both complex and unknown, and their approach is to find a function f(x) that operates on x to predict the responses y.

He then goes on to discuss why the data modeling culture has been bad for statistics for so long: it has led to irrelevant theories and questionable scientific conclusions, and it has kept statisticians from using more suitable algorithmic models and from working on exciting new problems. He also talks about the wonders of the other end of the spectrum, the algorithmic modeling culture, giving examples from his own work and that of others of how it can solve hard and complex problems.
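To make the two cultures a bit more concrete, here is a minimal sketch in plain Python. It is my own toy illustration, not an example from Breiman’s article. The synthetic data (a noisy sine curve), the helper names (fit_linear, knn_predict, mse, make_data) and the choice of k = 5 are all assumptions made for the demo. The data modeling side assumes y = a + b*x + noise and estimates a and b; the algorithmic side assumes nothing about the mechanism and is judged only by how well it predicts held-out data.

```python
import math
import random


def fit_linear(xs, ys):
    """Data modeling: assume y = a + b*x + noise, estimate a and b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def knn_predict(train_x, train_y, x, k=5):
    """Algorithmic modeling: no assumed mechanism, average the k nearest responses."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(preds, ys):
    """Mean squared error between predictions and observed responses."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def make_data(n, rng):
    """Hypothetical data: y = sin(x) + Gaussian noise, chosen so the linear assumption is wrong."""
    xs = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(100, rng)

# Culture 1: fit the assumed model, then read off its parameters.
a, b = fit_linear(train_x, train_y)
linear_mse = mse([a + b * x for x in test_x], test_y)

# Culture 2: treat the mapping as a black box and score it on unseen data.
knn_mse = mse([knn_predict(train_x, train_y, x) for x in test_x], test_y)

print(f"linear model: y = {a:.2f} + {b:.2f}x, test MSE = {linear_mse:.3f}")
print(f"5-nearest neighbours: test MSE = {knn_mse:.3f}")
```

Keep in mind that the data here was picked on purpose so that the straight-line assumption fails. If the true relationship were linear, the data model would predict just as well and would also give you coefficients you can interpret, so take this as an illustration of the two mindsets and not as proof that one always wins.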

The role of data science

Illustration by Héizel Vázquez

Data science is the main focus of most sciences and studies right now. It needs a lot of things, like AI, programming, statistics, business understanding, effective presentation skills and much more. That’s why it’s not easy to understand or study. But we can do it; we are doing it.

Data science has become the standard problem-solving framework for academia and industry, and it’s going to be like that for a while. But we need to remember where we are coming from, who we are and where we are going.

I’ll be writing more articles on the subject; you can consider this an introduction.

Thanks for reading 🙂


For more information you can follow me here:

Favio Vazquez – CEO – Closter | LinkedIn

