The world’s leading publication for data science, AI, and ML professionals.

Building the Future of Data Science

If we fail to predict the future they call us a failure, and if we do it too well they call us a sorcerer.

READ ALL THE PARTS HERE: PART 1, PART-1B, PART 2, PART 3.

But as Poincaré said: "It is far better to foresee even without certainty than not to foresee at all."

I’ve been talking about data science for a while. At the top of the article you can see some of my pieces related to this one, but now I want to take a step further. I’ll discuss the future of the field.

Again reframing what Poincaré said about mathematics: if we wish to foresee the future of data science, our proper course is to study the history and present condition of the science. I’m going to do this in several parts (yeah, sorry about that), so stay tuned for more later.

Let’s start with some history. Instead of boring you with paragraphs of text, I built this timeline of data science that will help you understand where we are coming from and where we are going.

Disclaimer: This timeline may not be totally complete, so if you think I’m missing a piece please let me know. Also, I’m combining the history of data science with some developments in computer science, machine learning, deep learning, data analysis and data mining.

Data science is not a new field at all, but it has seen amazing developments in the past decades. I had to cut some parts and moments because of space, but the most important things are hopefully there.

The full list of references I used to create this timeline is at the bottom of the article. Please check them, they’re awesome.

Hopefully this timeline will give you an idea of the history of the science of studying data. We call it data science right now, but that may not be the final term, so always be prepared for a change.

Where are we going?

A while ago I published this chart:

It shows the interest in semantic technologies over the years, and we can easily see that it’s increasing. Semantics in this context means the use of formal semantics to give meaning to the disparate and raw data that surrounds us, and also the relationship between signifiers and what they stand for in reality, their denotation.

When we talk about semantics in data we normally mean a combination of ontologies, linked data, graphs and knowledge graphs, the data fabric and more. You can read about all of that in the links at the beginning of the article.

But why? Why the shift? The thing is that all data modeling statements (along with everything else) in ontological languages for data are incremental, by their very nature. Enhancing or modifying a data model after the fact can be easily accomplished by modifying the concept.
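To make the "incremental" idea concrete, here is a minimal sketch in plain Python. It is not a real ontology language or any specific product's API: it only assumes that a model is a set of (subject, predicate, object) statements, and every name in it (`ex:Customer`, `ex:worksFor`, `add_statements`, `describe`) is made up for illustration.

```python
# A toy statement store: each fact is a (subject, predicate, object) tuple.
# All identifiers below are hypothetical and exist only for this example.
triples = {
    ("ex:Alice", "rdf:type", "ex:Customer"),
    ("ex:Alice", "ex:name", "Alice"),
}


def add_statements(store, new_statements):
    """Return a new store extended with extra statements; existing ones are untouched."""
    return store | set(new_statements)


def describe(store, subject):
    """Collect every (predicate, object) pair known about a subject, sorted."""
    return sorted((p, o) for s, p, o in store if s == subject)


# Later we learn about a new concept (ex:Company) and a new relationship
# (ex:worksFor). We only add statements; there is no table schema to alter.
extended = add_statements(triples, [
    ("ex:Acme", "rdf:type", "ex:Company"),
    ("ex:Alice", "ex:worksFor", "ex:Acme"),
])

print(describe(extended, "ex:Alice"))
```

The point of the sketch is that the original statements are still there, unchanged, after the model grows. In a relational setup the same change would typically mean a new table or column plus a migration.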

With these technologies we normally store data in graphs. Whereas relational databases store highly structured data in tables with predetermined columns and rows, graph databases can map multiple types of relational and complex data. That flexibility is a better fit for the data we have right now.
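As a rough illustration of the difference, the sketch below stores a few relationships as edges and answers a question by walking them. It uses only the Python standard library and does not reflect how any particular graph database is implemented; the data, the relationship names and the `follow` helper are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical example data: each edge is (source, relationship, target).
edges = [
    ("Alice", "WORKS_FOR", "Acme"),
    ("Bob", "WORKS_FOR", "Acme"),
    ("Alice", "KNOWS", "Bob"),
    ("Acme", "LOCATED_IN", "Boston"),
]

# Build an adjacency index: node -> list of (relationship, neighbour).
graph = defaultdict(list)
for source, relation, target in edges:
    graph[source].append((relation, target))


def follow(node, *relations):
    """Follow a chain of relationship types, starting from a single node."""
    frontier = {node}
    for relation in relations:
        frontier = {
            target
            for current in frontier
            for rel, target in graph.get(current, [])
            if rel == relation
        }
    return frontier


# "In which city is Alice's employer located?" is a two-hop walk.
print(follow("Alice", "WORKS_FOR", "LOCATED_IN"))
```

People, companies and cities live in the same structure here, and a new kind of relationship is just another edge. With tables, each of those hops would usually be a join across separately defined tables.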

I’ve been on countless projects by now, and the common thing is that we spend a lot of time trying to make sense of the data we have. One of the reasons may be that we are not storing the data and its relationships in a good format. The promise of the data fabric is just that: to support all the data in the company, including how it’s managed, described, combined and universally accessed.

Remember, data and context come first. This new paradigm integrates and harmonizes all relevant data sources, structured and unstructured alike, using a built-in graph database and semantic data layer. The data fabric conveys the business context and meaning of your data, making it easier for business users to understand and properly utilize.

For me that’s the future of data science. We are moving in a direction where semantic technologies are going to be the standard in every company. But we won’t stop there. All the advances in augmented reality, virtual reality and more will accompany this shift. For example, take a look at this project:

From the creators:

Northstar is an interactive data science platform that rethinks how people interact with data. It empowers users without programming experience, background in statistics or machine learning expertise to explore and mine data through an intuitive user interface, and effortlessly build, analyze, and evaluate machine learning (ML) pipelines.

Now imagine combining that with semantic technologies and being able to talk to your data: you ask questions and the system gives you answers. That’s the other part of our future, automation. We need automation for data storage, data munging, data exploration, data cleansing and all the things we actually spend a lot of time doing. You can say that tools like DataRobot can offer you those things, but in my experience there’s much more to do in this field.

One great example of a platform to do all these things is Anzo, where you have automation everywhere. And it’s easy to add more features on the go, like explainable AI, continuous intelligence and more.

https://www.cambridgesemantics.com/product/

But there are more, and more will come. I actually did an exercise using AnzoGraph, a part of Anzo. You can read it here:

The Data Fabric, Containers, Kubernetes, Knowledge-Graphs, and more

How do I keep up with everything going on?

This is one of the questions I get over and over: how do I know where to look, and how do I stay up to date with advances in the field?

My answer is this:

Be active. Read but reply. See but comment. Study but explain. Ask questions.

Don’t be just a consumer of all these things. Immerse yourself in the field: read articles, watch videos, read books and more, but also create. Reply to conversations, ask questions, create study groups, join a course.

Being an active member of a scientific community is not easy, and because data science draws on many theoretical and applied sciences, it’s particularly hard to follow the advances. But it is possible.

Some places that you should turn into your most visited pages:

On arXiv you’ll find the latest articles in the field. They are mostly pre-prints, but usually close to the final article. There you can search for terms, related terms and specific authors. It’s the home page of most people doing their master’s or PhD. Start here: https://arxiv.org/list/physics.data-an/recent

GitHub is where almost all the code in the world lives. There you will find amazing applications and projects with code and examples. It also has a trending page, so you can see what people are interested in right now. Start here:

Explore GitHub

The mission of Papers With Code is to create a free and open resource with Machine Learning papers, code and evaluation tables. You’ll find countless papers about the insides of data science and computational sciences with the code implemented. It’s all free and very well documented and distributed. Start here:

Papers With Code : the latest in machine learning

If you are a programmer, or you are learning how to code, you know about Stack Overflow. It’s a community where people ask and answer questions related to programming and more. Make sure to visit it regularly to see new replies and advances. Start here:

Posts containing ‘data science’

Reddit is a huge website/forum where you can find almost everything in the world. There are a lot of great subreddits (like groups) where people share valuable information, ask great questions and talk about data science, machine learning, math, science in general and more. Search for: r/learnmachinelearning/, r/deeplearning/, r/datascience/ and r/MachineLearning/.

KDnuggets is one of the biggest platforms to read articles and information about data in general. It is edited by Gregory Piatetsky-Shapiro and Matthew Mayo. You’ll find some of my pieces there. Start here:

Machine Learning, Data Science, Big Data, Analytics, AI


Another amazing place to find incredible articles, blogs and more about data science. Almost all my articles are stored here, along with pieces from amazing people in the field. Created by Ludovic Benistant, and edited by a very talented team. Start here:

Towards Data Science

Data Science – Towards Data Science

What should I learn right now?

Article by Isaac Faber

This is not an easy question, and after reading this and other articles you may feel overwhelmed. My sister Héizel and I created this short explanation of what you should be learning about data science:

And for the future we are building you’ll need to know:

  • Graphs
https://towardsdatascience.com/graph-databases-whats-the-big-deal-ec310b1bc0ed
  • Graph Databases
https://blog.cambridgesemantics.com/why-knowledge-graph-for-financial-services-real-world-use-cases
  • Semantics
https://towardsdatascience.com/the-data-fabric-for-machine-learning-part-2-building-a-knowledge-graph-2fdd1370bb0a
  • Ontological languages
https://towardsdatascience.com/https-towardsdatascience-com-the-data-fabric-containers-kubernetes-309674527d16
  • Be able to use automated tools for Machine Learning
https://www.kdnuggets.com/2017/01/current-state-automated-machine-learning.html
  • Understanding business problems for real. This is what computers can’t do right now, and it will take some time for them to do it. See Matthew Dancho’s amazing courses; they’re the best way to go from zero to powerful and fast.
https://www.business-science.io/

Thanks for reading, and stay tuned for more 🙂

If you want to contact me, follow me on Twitter:

Favio Vázquez

and LinkedIn:

Favio Vazquez – Faculty Member, Professor and SME in Data Science – EMERITUS Institute of…

References:

A Very Short History Of Data Science

Beginner’s Guide to the History of Data Science

A Brief History of Data Science – DATAVERSITY

Big Data – A Visual History | Winshuttle

An architecture for a business and information system

