What is data science and what is it not?

Sunit Kakati
Towards Data Science
Sep 8, 2017

Data science, also known as data-driven science, is an interdisciplinary field concerned with scientific methods, processes, and systems for extracting knowledge or insights from data in various forms, structured or unstructured, similar to data mining.

With that definition in hand, we can talk about what data science is. It encompasses some programming skills, some statistical readiness, some visualization techniques and, last but not least, a lot of business sense. The kind of business sense I care about in particular is the ability and willingness, sometimes eagerness, to translate any business question into questions that can be answered with data that is currently, or soon will be, within one's reach. In fact, making a working data scientist takes a special way of connecting the dots in a random world full of data, most of which you may not find immediately useful.

As I currently understand it, a data scientist is the person who connects the dots between the business world and the data world, and data science is the craft a data scientist uses to make this happen.

WHAT IS

  1. The term is something of a misnomer and a buzzword that the media use to describe almost everything. Still, it is worth having this discussion so that we can reach an agreement.
  2. The question is about data science, so I will not talk about data scientists here. See "What is a data scientist?" if you are interested in that.
  3. The biggest error I found in most of the answers was some variant of "data science is when you are dealing with Big Data, large amounts of data." That is not true: data science can be applied to a data set with one thousand rows without any problem.
  4. If we are going to call it a "science", we need to consider the definitions of science and of the scientific method. By those definitions, data science is not only about practical or empirical methods; it needs scientific foundations.
  5. No one talked about the difference between data and information:
       - Data is a raw, unorganized set of things that needs to be processed before it has meaning.
       - Information is data that has been processed, organized, structured, or presented in a given context so as to make it useful.
  6. Based on this distinction, we would have both data science and information science. Right now, people tend to talk about data science as if it included information science.
  7. What we now call data science was clearly already being practiced in many fields over the past years:
       - Statistics/mathematics
       - Business analytics
       - Market intelligence
       - Strategic consulting
       - Many others…
  8. The craziest part is that you see professionals from these areas updating their resumes with something like "I worked with Data Science…"
  9. Put simply, data science was created when two sides that were not fully connected had to merge in the new fast-paced, technological world:
       - Statistics/mathematics: formulate proper models to generate insights.
       - Computer science: bridge the models and the data within a feasible time to produce the result.
  10. Topics and tools that a person needs to understand, or at least have some knowledge of, when working in data science:
       - Linear algebra
       - Non-linear systems
       - Analytical geometry
       - Optimization
       - Calculus
       - Statistics
       - Programming languages (R, Python, SAS)
       - Software: Excel, SPSS by IBM
       - General platforms: Watson Analytics by IBM, Azure Machine Learning, Google Cloud Machine Learning
       - Data visualization: Power BI, Tableau, R/Python using plotly/ggplot
       - Machine learning (supervised, unsupervised, and reinforcement learning)
       - Big Data
       - Big Data frameworks (Hadoop and Spark)
       - Hardware (CPU, GPU, TPU, FPGA, ASIC)
  11. One picture is worth ten thousand words: Drew Conway's Data Science Venn Diagram. Substantive expertise (or domain expertise) is specific knowledge of the area in which you are applying data science. To learn more about the lack of substantive expertise in data science, see "What's Missing in Data Science Talks — As Risky As It Gets".
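To make the distinction between data and information a little more concrete, here is a minimal Python sketch. The sales records, their field names and the `summarize_by_region` helper are all hypothetical, invented for illustration: the raw rows play the role of data, and the per-region counts, totals and averages computed from them play the role of information. Nothing here requires Big Data; the same code runs unchanged on five rows or a thousand.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: unorganized records, as they might arrive from a sales system.
raw_records = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": 100.0},
    {"region": "east", "amount": 50.0},
    {"region": "south", "amount": 40.0},
]


def summarize_by_region(records):
    """Hypothetical helper: group raw records by region and summarize each group.

    The result is the same data, organized and placed in a context (per region)
    that makes it useful for answering a business question.
    """
    amounts_by_region = defaultdict(list)
    for record in records:
        amounts_by_region[record["region"]].append(record["amount"])

    # Build the summary in alphabetical order of region for a stable output.
    return {
        region: {
            "count": len(amounts),
            "total": sum(amounts),
            "average": mean(amounts),
        }
        for region, amounts in sorted(amounts_by_region.items())
    }


if __name__ == "__main__":
    for region, stats in summarize_by_region(raw_records).items():
        print(region, stats)
```

The grouping step is where the "given context" comes from: the same five numbers say little on their own, but organized by region they start to answer a question such as which region sells the most.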

WHAT IS NOT

  1. Machine learning is not a branch of data science. Machine learning originated in artificial intelligence; data science only uses ML as a tool, because ML produces impressive, autonomous results for specific tasks.
  2. It is not the salvation of companies that never measured anything and now want to get insights from their data. "Garbage in, garbage out": data science will only be as good as the data generated in the years to come.
  3. It is not just presenting data in some Excel charts without any insight into that data.
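A small Python sketch of the "garbage in, garbage out" point. The scenario is hypothetical: a company that never validated its records stored every unknown customer age as 0. Any analysis built on that data inherits the error, and cleaning can only partly help, because it cannot recover values that were never measured.

```python
from statistics import mean

# Hypothetical customer ages from a system that stored "unknown" as 0.
recorded_ages = [34, 0, 41, 0, 29, 0, 38, 0]

# Naive analysis: the placeholder zeros drag the average far below reality.
naive_average = mean(recorded_ages)

# Cleaning drops the placeholders, but the missing ages are simply gone:
# the estimate now rests on only half of the customers.
valid_ages = [age for age in recorded_ages if age > 0]
cleaned_average = mean(valid_ages)

print(f"naive average:   {naive_average:.2f}")
print(f"cleaned average: {cleaned_average:.2f} "
      f"(from {len(valid_ages)} of {len(recorded_ages)} records)")
```

The cleaned figure is more plausible, yet it describes only the customers whose ages happened to be recorded; no model downstream can restore the information the company never collected.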

Working as an IT Consultant || IITian || Attending Coursera, Udemy and Stanford Online Lagunita