Location, Location, Location!
You have heard this many times. It is a common mantra in real estate. Does that apply in Data Science as well? How do we embrace the location component in Data Science? Is it only another column in your dataset? Or perhaps spatial is special.
Location data (big data) is ubiquitous: we create zillions of geographic data points every day, from tweets and geotagged images to smartphone tracking services. However, the location component is not deeply integrated into mainstream data science. In this article, I highlight some of these loosely integrated aspects of location data science, as well as the potential for establishing a tightly integrated system.
Time-series vs Location data
All it takes to realise that we treat location as a supplementary column in data science is a look at the Kaggle platform.
To illustrate my point, let us compare two closely related concepts: space and time. I guess you are already familiar with time-series analysis: extracting information by analysing a sequence of observations over time, or forecasting future values.
"Even in empty space, time and space still exist." (Sean M. Carroll)
We all prefer data and facts, so I searched for these three terms on Kaggle: "Time series", "Geographic" and "Geospatial". The results are not surprising at all.
As of 27 October 2019, searching for "Time series" on Kaggle returns 430 datasets and 1149 notebooks.

On the other hand, the "Geographic" search returned almost the same number of datasets, 391, but have a look at the notebook and competition results. Notebooks containing the word "Geographic" amount to only 45, as shown below, compared with 1149 notebooks for time series.

You might say that these results are not generalisable, or a bit narrow; after all, Kaggle is only one platform for data science projects. So I looked at Google Trends, and the results show a similar pattern: less coverage of geographic data analysis and more of time-series analysis, as shown below.

It becomes clearer that location data is treated as just another column in the dataset, and we usually deal with it cautiously or drop it from our analysis altogether. Let us look at the reasons for this detachment.
Disconnected Location data science
Spatial data science is not only about producing beautiful data visualisations, i.e., maps; it also brings a whole spatial-thinking perspective. Spatial analysis provides rich insights for many applications. However, some challenges stand in the way of fully incorporating the location component in data science.
Despite the importance of location data, we see little convergence today between data science and the geospatial component. Why is that?
With some exceptions, the spatial component is mostly neglected in mainstream data science. A good example is how location data is used in deep learning applications without any attention to its spatial aspects. Even most deep learning applications on satellite images lose the spatial part once we feed the images into neural networks.
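To make the satellite image point concrete, here is a minimal sketch of what "losing the spatial part" can mean. It assumes a north-up image whose georeferencing is a simple affine transform; the `GeoTransform` class, the tile values and the coordinates are all hypothetical and only for illustration. A neural network sees only the grid of numbers, so unless we carry the transform alongside it, a pixel position can no longer be tied back to a place on the ground.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GeoTransform:
    """Hypothetical north-up affine transform: pixel (row, col) -> map (x, y)."""
    origin_x: float      # x coordinate of the image's top-left corner
    origin_y: float      # y coordinate of the image's top-left corner
    pixel_width: float   # pixel size along x, in map units
    pixel_height: float  # pixel size along y, in map units

    def pixel_center(self, row: int, col: int) -> Tuple[float, float]:
        """Return the map coordinates of the centre of a pixel."""
        x = self.origin_x + (col + 0.5) * self.pixel_width
        y = self.origin_y - (row + 0.5) * self.pixel_height  # rows grow southwards
        return x, y


# Made-up 3x3 single-band tile: this bare grid is all a model receives.
tile = [
    [0.1, 0.2, 0.1],
    [0.3, 0.9, 0.4],
    [0.1, 0.2, 0.1],
]

# Made-up georeferencing (10 m pixels in some projected coordinate system).
transform = GeoTransform(origin_x=500000.0, origin_y=4200000.0,
                         pixel_width=10.0, pixel_height=10.0)

# Locate the brightest pixel in array space...
row, col = max(
    ((r, c) for r in range(len(tile)) for c in range(len(tile[0]))),
    key=lambda rc: tile[rc[0]][rc[1]],
)
# ...and recover where it is on the ground, which needs the transform.
x, y = transform.pixel_center(row, col)
print((row, col), (x, y))
```

Without `transform`, the answer stops at `(row, col)`: the pattern is found, but the place is not.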
Furthermore, location analysis and data science have a significant overlap but have grown apart for a long time. An early example is John Snow's cholera analysis. During the 1854 cholera outbreak in London, John Snow mapped the geographic locations of deaths and discovered that they clustered around a contaminated water source, the Broad Street pump. Arguably, this is one of the first documented cases of data science and data analysis.
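The kind of reasoning behind Snow's map can be sketched in a few lines. This is only an illustration under loud assumptions: the pump names and all coordinates are invented (they are not Snow's data), and straight-line distance stands in for how people actually walked to fetch water. Each death is assigned to its nearest pump, and the counts per pump are then compared.

```python
from collections import Counter
from math import hypot

# Hypothetical pump locations in arbitrary units (NOT Snow's real data).
pumps = {
    "Pump A": (0.0, 0.0),
    "Pump B": (10.0, 0.0),
    "Pump C": (5.0, 8.0),
}

# Hypothetical locations of recorded deaths, in the same units.
deaths = [
    (0.5, 0.2), (1.0, -0.5), (-0.3, 0.8), (0.9, 1.1),
    (9.5, 0.4), (4.0, 7.0),
]


def nearest_pump(point, pumps):
    """Return the name of the pump closest to `point` (straight-line distance)."""
    px, py = point
    return min(
        pumps,
        key=lambda name: hypot(px - pumps[name][0], py - pumps[name][1]),
    )


# Count how many deaths fall closest to each pump.
deaths_per_pump = Counter(nearest_pump(d, pumps) for d in deaths)
print(deaths_per_pump.most_common())
```

With these made-up points, most deaths fall closest to "Pump A". The insight comes from relating each record to its surroundings, not from the coordinates as two extra columns.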
Luc Anselin, a professor of spatial data science, recently described this disconnection as "Space scepticism". He reflects that many in the big data and machine learning communities remain unconvinced about the importance of tightly integrating the spatial perspective into data science, as in John Snow's cholera analysis.
Another major impediment is a historical artefact. Dealing with geographic data used to be computationally difficult, but that is not the case anymore. Today there is mainstream integration of spatial databases, as well as tools for manipulating big geographic information. Google's BigQuery, for example, includes GIS capabilities.
There are also a few applications and projects that move beyond simple location incorporation towards fully integrated geographic data science. Spatial Hadoop and the recent release of cuSpatial, accelerated on GPUs using the NVIDIA RAPIDS DataFrame library, are just a few of these advances.
Towards Geographic Data science
Geographic data science is the discipline that focuses explicitly on the location (spatial) component of data science. It brings forth theories, concepts and applications that are specific to geographic data in the realm of data science. We are all aware of the importance of where things are and how to get to them, and that affects everything from business logistics and transportation to climate change and any decision that involves a location.
I think it is time to move beyond the disconnection between data science and the location component. We need to realise that visualising data as a map does not equate to spatial data science. As Javier de la Torre put it last week at the Spatial Data Science Conference, it is time to move from visualising data on maps to analysing data using maps.
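As one small illustration of analysing rather than only visualising, the sketch below computes Moran's I, a classic measure of spatial autocorrelation, on a tiny grid. Choosing Moran's I is my own example, not something taken from the talk; the grids are invented, and "neighbours" are assumed to be cells that share an edge (rook contiguity) with equal weights. Values near +1 indicate that similar values cluster together in space, and values near -1 indicate that neighbours tend to differ.

```python
def rook_neighbours(rows, cols):
    """Map each cell (r, c) to the cells that share an edge with it."""
    neighbours = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbours[(r, c)] = [
                (i, j) for i, j in candidates if 0 <= i < rows and 0 <= j < cols
            ]
    return neighbours


def morans_i(grid):
    """Moran's I for a rectangular grid with binary rook-contiguity weights.

    Assumes the grid is not constant (otherwise the variance term is zero).
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = {(r, c): grid[r][c] - mean for r in range(rows) for c in range(cols)}
    neighbours = rook_neighbours(rows, cols)

    # Sum of products of deviations over every (cell, neighbour) pair.
    cross = sum(dev[cell] * dev[nb] for cell, nbs in neighbours.items() for nb in nbs)
    total_weight = sum(len(nbs) for nbs in neighbours.values())
    squared_dev = sum(d * d for d in dev.values())
    return (n / total_weight) * (cross / squared_dev)


# Invented example grids: one spatially clustered, one perfectly alternating.
clustered = [[1, 1, 0, 0] for _ in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print(morans_i(clustered))     # positive: similar values sit next to each other
print(morans_i(checkerboard))  # negative: every neighbour differs
```

Both grids contain exactly the same values, eight ones and eight zeros, so any summary that ignores location cannot tell them apart. Only the spatial arrangement differs, and that is what the statistic captures.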
I am passionate about spatial data science, and you can connect with me at @shakasom on Twitter if you would like to discuss it.