
The Best Earth Observation Data Science Toolkits

Platforms, Tools and Packages for Geospatial/Earth Observation Data Scientists

Photo by USGS on Unsplash

The volume of satellite-based Earth observation data is increasing at a rapid pace, thanks to technological developments in remote sensing platforms and breakthroughs in data collection and storage. Today, we have more than 768 Earth observation satellites in orbit, compared with only 150 in 2018.

As a geospatial or Earth observation data scientist, you have a vast array of tools and resources to choose from. In this article, I highlight the best open-source tools on the market that integrate with the data science ecosystem.

1. Google Earth Engine (GEE)

Your wish is granted. Google Earth Engine (GEE) is by far the most complete all-in-one package for Earth observation data scientists. It offers not only geospatial data processing and analysis capabilities but also ready-to-use datasets, so you can focus on analysing rather than downloading data.

With GEE, you can perform planetary-scale analysis using freely available satellite imagery from NASA/USGS (Landsat, MODIS) and the European Union (Sentinel-1 and Sentinel-2), as well as non-satellite or derived products such as elevation, climate data and land cover.

With full-featured JavaScript and Python APIs, Google Earth Engine is an essential part of any Earth observation data scientist's or analyst's arsenal. It also comes with the Code Editor, a JavaScript development environment where you can run your analysis and visualisation right in the browser.
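Much of the analysis you write with these APIs follows a filter, map and reduce pattern: filter an image collection, apply a per-image function, then reduce the stack to a single composite (in the real Python API this looks like `ee.ImageCollection(...).filterDate(...).map(...).median()`). The toy sketch below mimics that pattern in plain Python so it runs without an Earth Engine account; the tiny "images", their red/near-infrared values and the helper function names are all hypothetical, not part of GEE.

```python
from statistics import median

# Hypothetical toy "collection": each image holds 2x2 red and near-infrared (NIR)
# reflectance arrays, standing in for real Landsat/Sentinel bands.
collection = [
    {"date": "2020-06-01", "red": [[0.10, 0.20], [0.05, 0.30]], "nir": [[0.50, 0.40], [0.45, 0.35]]},
    {"date": "2020-06-17", "red": [[0.12, 0.22], [0.06, 0.28]], "nir": [[0.48, 0.42], [0.50, 0.30]]},
    {"date": "2020-07-03", "red": [[0.30, 0.25], [0.04, 0.31]], "nir": [[0.35, 0.41], [0.52, 0.33]]},
]

def filter_date(images, start, end):
    """Keep images whose ISO date string falls in [start, end)."""
    return [img for img in images if start <= img["date"] < end]

def add_ndvi(img):
    """Add a per-pixel NDVI band: (NIR - red) / (NIR + red)."""
    ndvi = [
        [(n - r) / (n + r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(img["nir"], img["red"])
    ]
    return {**img, "ndvi": ndvi}

def median_composite(images, band):
    """Reduce a stack of images to one image by taking the per-pixel median."""
    rows, cols = len(images[0][band]), len(images[0][band][0])
    return [
        [median(img[band][i][j] for img in images) for j in range(cols)]
        for i in range(rows)
    ]

# Filter -> map -> reduce, the same shape as an Earth Engine script.
summer = [add_ndvi(img) for img in filter_date(collection, "2020-06-01", "2020-08-01")]
composite = median_composite(summer, "ndvi")
for row in composite:
    print([round(v, 3) for v in row])
```

A per-pixel median is a common choice for composites because a single cloudy or anomalous acquisition does not drag the result the way a mean would.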

Furthermore, you can build machine learning models directly in GEE, training them and producing predictions entirely in the browser. The Python module integrates well with other Python packages, and you can run it in Jupyter notebooks or Google Colab.
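Supervised classification in GEE typically follows a sample, train, classify sequence (for example `sampleRegions`, then `ee.Classifier.smileRandomForest(...).train(...)`, then `image.classify(...)`). As a stand-in that runs anywhere, the sketch below swaps the random forest for a much simpler nearest-centroid (minimum-distance) classifier; the training pixels, band values and class names are made up for illustration.

```python
from math import dist

# Hypothetical labelled training pixels: (red, nir) reflectance -> land cover class.
training = [
    ((0.05, 0.45), "vegetation"),
    ((0.08, 0.50), "vegetation"),
    ((0.03, 0.02), "water"),
    ((0.04, 0.03), "water"),
    ((0.25, 0.30), "bare soil"),
    ((0.28, 0.33), "bare soil"),
]

def train_centroids(samples):
    """'Train' by averaging the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def classify(pixel, centroids):
    """Assign the class whose centroid is closest (Euclidean) in feature space."""
    return min(centroids, key=lambda label: dist(pixel, centroids[label]))

centroids = train_centroids(training)

# A hypothetical 2x2 image of (red, nir) pixels to classify.
image = [[(0.06, 0.48), (0.03, 0.025)], [(0.27, 0.31), (0.07, 0.40)]]
land_cover = [[classify(px, centroids) for px in row] for row in image]
print(land_cover)
```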

There you go: complete end-to-end functionality in GEE, with petabytes of satellite imagery, data processing tools and ML algorithms, right in your browser.

It could not have been better!

2. EO-learn

Earth observation pipelines at scale, running on CPUs or GPUs. eo-learn is a Python package that connects the Python data science and machine learning ecosystem with the remote sensing and Earth observation community. With eo-learn, even non-experts can extract and derive valuable information from satellite images.

It also enables Earth observation experts to run state-of-the-art deep learning and computer vision models efficiently.

eo-learn is built on NumPy arrays and Shapely geometries, so geospatial data scientists feel right at home. Using bounding boxes (which can be defined with GeoPandas), eo-learn can download satellite images directly into your environment before you carry out any analysis.
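Large areas of interest are usually split into a grid of smaller bounding boxes (patches) that can each be downloaded and processed independently; the companion sentinelhub package provides splitters such as `BBoxSplitter` for this. The sketch below shows only the underlying idea in plain Python; the `split_bbox` function and the coordinates are hypothetical.

```python
def split_bbox(min_x, min_y, max_x, max_y, n_cols, n_rows):
    """Split a bounding box into an n_cols x n_rows grid of equal, smaller boxes.

    Returns (min_x, min_y, max_x, max_y) tuples, row by row starting from min_y.
    """
    width = (max_x - min_x) / n_cols
    height = (max_y - min_y) / n_rows
    tiles = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = min_x + col * width
            y0 = min_y + row * height
            tiles.append((x0, y0, x0 + width, y0 + height))
    return tiles

# Hypothetical area of interest in a projected CRS (metres): 40 km x 20 km,
# split into eight 10 km x 10 km patches.
tiles = split_bbox(500_000, 4_000_000, 540_000, 4_020_000, n_cols=4, n_rows=2)
print(len(tiles), tiles[0])
```

Working in a projected CRS keeps every patch the same size on the ground, which is convenient when each patch later becomes one unit of work in a pipeline.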

Not only does eo-learn enable you to build ML and deep learning models, but it also has modules for batch processing, masking, I/O, geometric transformations and conversion between vector and raster data.
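Vector-to-raster conversion (rasterization) usually means "burning" a polygon into a pixel grid: a pixel is set if its centre falls inside the polygon. The minimal sketch below uses the classic even-odd (ray casting) point-in-polygon test; the grid size, polygon and function names are illustrative, and real libraries (for example `rasterio.features.rasterize`) also handle CRSs, geotransforms and edge cases for you.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: toggle 'inside' for every polygon edge a rightward ray crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize_polygon(polygon, n_rows, n_cols):
    """Return a 0/1 mask; a pixel is 1 if its centre lies inside the polygon.

    Coordinates are in pixel units: column j spans x in [j, j+1), row i spans y in [i, i+1).
    """
    return [
        [1 if point_in_polygon(j + 0.5, i + 0.5, polygon) else 0 for j in range(n_cols)]
        for i in range(n_rows)
    ]

# A right triangle in pixel coordinates; centres lying exactly on the
# diagonal edge count as outside under this test.
triangle = [(0, 0), (4, 0), (0, 4)]
mask = rasterize_polygon(triangle, n_rows=4, n_cols=4)
for row in mask:
    print(row)
```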

With eo-learn, you can create pipelines that process data in batches and run in parallel on multiple CPUs or GPUs for heavy workloads. For example, one such pipeline runs a sequence of tasks over an area of about 200,000 km² in 12 hours to perform land cover classification for an entire country.
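eo-learn models this with its own workflow classes (`EOTask`, `EOWorkflow` and `EOExecutor`). The plain-Python sketch below reproduces only the shape of the idea: a pipeline is an ordered list of task functions applied to each patch, and independent patches are processed concurrently. The tasks, thresholds and patch data are hypothetical stand-ins; for CPU-heavy work you would likely swap the thread pool for a process pool.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical tasks: each takes a patch dict and returns an updated copy.
def load_patch(patch):
    # Stand-in for downloading imagery for the patch's bounding box:
    # fabricate four integer "digital numbers" per patch.
    return {**patch, "pixels": [patch["id"] * 10 + k for k in range(4)]}

def mask_clouds(patch, threshold=25):
    # Stand-in cloud mask: drop pixels brighter than an arbitrary threshold.
    return {**patch, "pixels": [p for p in patch["pixels"] if p <= threshold]}

def classify_patch(patch):
    # Stand-in classifier: label each patch from its mean, or flag it as fully masked.
    values = patch["pixels"]
    if not values:
        return {**patch, "label": "masked"}
    mean = sum(values) / len(values)
    return {**patch, "label": "bright" if mean > 15 else "dark"}

PIPELINE = [load_patch, mask_clouds, classify_patch]

def run_pipeline(patch, tasks=PIPELINE):
    """Apply each task in order, feeding the output of one into the next."""
    return reduce(lambda acc, task: task(acc), tasks, patch)

# Independent patches run concurrently; pool.map returns results in input order.
patches = [{"id": i} for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_pipeline, patches))

for result in results:
    print(result["id"], result["label"])
```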

3. Radiant MLHub

For Earth observation and machine learning, Radiant MLHub offers ready-to-use, open-source Earth observation training datasets. All the datasets in its catalogue are SpatioTemporal Asset Catalog (STAC) compliant, and the list is already growing, with 14 datasets available so far.
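STAC compliance means each item in a dataset is described by a predictable JSON document: a GeoJSON Feature with an `id`, `geometry`, `bbox`, `properties` (including `datetime`) and an `assets` map pointing at the actual files. The minimal sketch below reads such an item with the standard library; the item is a made-up example rather than a real MLHub record, the URLs are placeholders, and `asset_hrefs` is a hypothetical helper.

```python
import json

# Made-up STAC Item (placeholder id and URLs) with one image asset and one label asset.
item_json = """
{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "example_chip_001",
  "bbox": [36.80, -1.30, 36.82, -1.28],
  "geometry": {"type": "Polygon", "coordinates": [[[36.80, -1.30], [36.82, -1.30], [36.82, -1.28], [36.80, -1.28], [36.80, -1.30]]]},
  "properties": {"datetime": "2018-06-15T00:00:00Z"},
  "links": [],
  "assets": {
    "image": {"href": "https://example.com/chip_001/image.tif", "type": "image/tiff; application=geotiff", "roles": ["data"]},
    "labels": {"href": "https://example.com/chip_001/labels.tif", "type": "image/tiff; application=geotiff", "roles": ["labels"]}
  }
}
"""

def asset_hrefs(stac_item, role=None):
    """Return {asset_key: href}, optionally keeping only assets that carry the given role."""
    return {
        key: asset["href"]
        for key, asset in stac_item["assets"].items()
        if role is None or role in asset.get("roles", [])
    }

item = json.loads(item_json)
print(item["id"], item["properties"]["datetime"])
print(asset_hrefs(item, role="labels"))
```

Because every compliant dataset shares this structure, the same few lines of code can locate images and labels across very different training datasets.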

The training datasets cover different machine learning applications including image classification, segmentation and object detection. Popular ML earth observation training datasets available here include SpaceNet, BigEarthNet and LandCoverNet.

Accessing the data is free and open to anyone via an API. Example Jupyter notebooks showing how to access the different datasets are available in the MLHub Tutorials repository.

4. Open Data Cube

As both a platform and an open-source geospatial data management tool, Open Data Cube provides easy, open access to large volumes of indexed Earth observation data. Its Python API enables Earth observation analysts to query and access data with high performance, allowing them to process stored data at scales ranging from a single country to an entire continent.
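In the real API, a query looks roughly like `datacube.Datacube().load(product=..., x=(min, max), y=(min, max), time=(start, end))` and returns an xarray Dataset. The plain-Python sketch below only illustrates what such a query does conceptually: select indexed datasets of a product whose footprint overlaps the requested extent and whose acquisition date falls in the time range. The in-memory index, product names and `find_datasets` function are hypothetical.

```python
from datetime import date

# Hypothetical in-memory index: product name, footprint bounds (lon/lat) and date.
INDEX = [
    {"product": "ls8_sr", "bounds": (30.0, -2.0, 31.0, -1.0), "date": date(2020, 1, 5)},
    {"product": "ls8_sr", "bounds": (31.0, -2.0, 32.0, -1.0), "date": date(2020, 1, 21)},
    {"product": "ls8_sr", "bounds": (30.0, -2.0, 31.0, -1.0), "date": date(2020, 3, 9)},
    {"product": "s2_l2a", "bounds": (30.0, -2.0, 31.0, -1.0), "date": date(2020, 1, 7)},
]

def intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap (touching edges excluded)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def find_datasets(product, x, y, time):
    """Select datasets of a product overlapping x=(min, max), y=(min, max) within time=(start, end)."""
    query_box = (x[0], y[0], x[1], y[1])
    start, end = time
    return [
        ds for ds in INDEX
        if ds["product"] == product
        and intersects(ds["bounds"], query_box)
        and start <= ds["date"] <= end
    ]

hits = find_datasets("ls8_sr", x=(30.5, 31.5), y=(-1.5, -1.2),
                     time=(date(2020, 1, 1), date(2020, 1, 31)))
print([ds["date"].isoformat() for ds in hits])
```

The real index lives in a database, which is what lets the same query scale from one scene to a whole continent.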

Open Data Cube currently supports Digital Earth Australia and the Africa Regional Data Cube, and the project also provides tutorials, guides and documentation for its users.

The project also offers a generous free ODC Sandbox (16 GB RAM) with preconfigured Jupyter notebooks that run in the cloud, with no configuration or installation required. Thanks to this, many Earth observation analysts find it easy to start analysing petabytes of data without worrying about software setup.


These platforms empower thousands of Earth observation data scientists and simplify their day-to-day tasks. Although some of these toolkits can be used standalone, as complete end-to-end pipelines, I tend to combine them with other geospatial and Earth data science packages and libraries.

So I will conclude this article by listing the packages I use most to process Earth observation datasets, which you will probably find useful too.

  • Rasterio: An essential, lightweight and flexible Python package for remote sensing image reading and writing.
  • WhiteboxTools: an advanced geospatial data analysis tool with an extensive set of image processing tasks, such as image enhancement, filtering operations and clustering algorithms, as well as hydrological and geomorphometric analysis. It handles LiDAR data effectively, enabling you to segment, tile or join LiDAR data and derive outputs from it.
  • EarthPy: a Python package that makes it easier to plot and work with spatial raster and vector data using open-source tools. EarthPy bridges the gap between raster and vector data so you can work effectively across the two data types.
  • GDAL: a lovely tool used by most Earth observation practitioners. It is a translator library for raster and vector geospatial data formats and provides an extensive set of satellite image processing tools.
  • PDAL: the Point Data Abstraction Library. PDAL focuses on LiDAR point clouds but offers other tools as well. It also provides simple Python bindings based on NumPy, which let you work with point cloud data from Python and Jupyter notebooks.
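One concept shared by Rasterio, GDAL and most raster tools is the affine geotransform that maps pixel positions to map coordinates. GDAL stores it as six coefficients, with X_geo = GT[0] + col·GT[1] + row·GT[2] and Y_geo = GT[3] + col·GT[4] + row·GT[5]; Rasterio exposes the same mapping as an `Affine` object through `dataset.transform`. A minimal pure-Python sketch, using a made-up north-up transform with 30 m pixels (the helper names are hypothetical):

```python
def pixel_to_map(gt, col, row):
    """Apply a GDAL-style geotransform to a (col, row) pixel position."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

def map_to_pixel(gt, x, y):
    """Invert a north-up geotransform (no rotation terms) to fractional (col, row)."""
    if gt[2] != 0 or gt[4] != 0:
        raise ValueError("this sketch only handles north-up transforms")
    return (x - gt[0]) / gt[1], (y - gt[3]) / gt[5]

# Hypothetical transform: top-left corner at (500000, 4000000), 30 m pixels.
# GT[5] is negative because row numbers increase southwards.
gt = (500000.0, 30.0, 0.0, 4000000.0, 0.0, -30.0)

print(pixel_to_map(gt, 10, 20))                # (500300.0, 3999400.0): upper-left corner of the pixel
print(pixel_to_map(gt, 10.5, 20.5))            # (500315.0, 3999385.0): centre of the same pixel
print(map_to_pixel(gt, 500315.0, 3999385.0))   # (10.5, 20.5)
```

Integer pixel indices address pixel corners, so adding 0.5 to both indices gives the pixel centre, a frequent source of half-pixel offsets.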

Final thoughts

We live in the golden age of Earth observation. Not only are we witnessing breakthroughs in space and airborne missions, but we also have incredible resources to analyse and study the Earth and its environment. These platforms enable Earth observation data scientists to study climate, weather, agriculture, transportation and infrastructure, as well as natural hazards.

Although the list is not exhaustive, I believe these are the best resources out there. If you think I have left out some of your favourite tools, please let me know.

