Data Exploration + Google Earth Engine as My Undergrad Thesis

A summary of my last project before graduating from Engineering Physics

Isaac Arroyo
Towards Data Science

Photo by NASA on Unsplash

Disclaimer

Google Earth Engine did not sponsor this. All I want is to give a general idea of what I did for my undergrad thesis project and show some data visualizations.

Introduction

I love being curious, finding patterns in different phenomena, and using visuals to share (and gain) knowledge or emphasize relevant information. To me, the field of Data Science covers all of these things, and its impact goes beyond the problem at hand: it also helps uncover new problems to tackle.

My undergrad thesis is titled “Herramientas estadísticas y computacionales en imágenes satelitales de Earth Engine para la exploración de incendios forestales”, which translates from Spanish to English as “Statistical and computational tools in Earth Engine satellite imagery for wildfire exploration”.

Why wildfires?

Wildfires are complex natural phenomena that go beyond “the world is getting hotter.” Understanding them means understanding how the environment behaves, along with cultural, social, environmental, climate, and other types of variables. I focused on the state of Yucatán in Mexico.

Photo by Joanne Francis on Unsplash

What is Google Earth Engine?

Google Earth Engine (GEE) or Earth Engine (EE) is a cloud-based platform with a vast data catalogue. It enables large-scale satellite imagery processing to perform different high-impact analyses on Earth’s surface related to deforestation, changes in land surface temperature, drought, and others. GEE is also designed to help researchers easily share their work with other researchers, non-profit organizations and the general public [1].

Screenshot of the official web page of Google Earth Engine via https://earthengine.google.com

To extract data and use GEE’s functionality, the user accesses the API through one or both client libraries: JavaScript or Python. I used the GEE Python API for the following reasons:

  • I handle this programming language better than any other.
  • An advantage of using the GEE Python API is the easy integration into a Data Science/Analysis workflow with the help of libraries used in this field such as NumPy, Pandas, Matplotlib, Scikit-Learn, GeoPandas, among others; as well as a development and documentation process in Jupyter Notebooks.
  • Some libraries make GEE objects easier to handle. The ones I used were eemont and geemap. Both allowed me to perform various operations, from manipulating GEE objects to converting them into pandas.DataFrame or geopandas.GeoDataFrame objects. They were handy for getting time series of some variables.
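
To give a rough idea of what pulling a time series through the GEE Python API can look like, here is a minimal sketch that uses only the plain `ee` client rather than eemont or geemap. The collection ID `MODIS/061/MOD11A1`, the band `LST_Day_1km`, and all function names are assumptions chosen for illustration, not necessarily what the thesis used. The Earth Engine call needs an authenticated account, so it is wrapped in a function and not executed here.

```python
from datetime import datetime, timezone


def modis_lst_to_celsius(raw_value):
    """Convert a raw MOD11A1 LST value (scale factor 0.02, Kelvin) to Celsius."""
    return raw_value * 0.02 - 273.15


def millis_to_date(millis):
    """Convert epoch milliseconds (as returned by Earth Engine) to YYYY-MM-DD."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).date().isoformat()


def fetch_daytime_lst(lon, lat, start, end):
    """Return [(date, temperature in Celsius), ...] for one point.

    Assumption: requires the earthengine-api package and an authenticated
    account; depending on the setup, ee.Initialize may also need a project.
    """
    import ee  # imported here so the helpers above work without Earth Engine

    ee.Initialize()
    point = ee.Geometry.Point([lon, lat])
    collection = (
        ee.ImageCollection("MODIS/061/MOD11A1")  # assumed collection
        .filterDate(start, end)
        .filterBounds(point)
        .select("LST_Day_1km")
    )

    def to_feature(image):
        # Reduce each image to the mean value at the point of interest
        stats = image.reduceRegion(
            reducer=ee.Reducer.mean(), geometry=point, scale=1000
        )
        return ee.Feature(
            None,
            {"millis": image.date().millis(), "raw": stats.get("LST_Day_1km")},
        )

    # Drop dates where the pixel was masked (for example, by clouds)
    features = collection.map(to_feature).filter(ee.Filter.notNull(["raw"]))
    millis = features.aggregate_array("millis").getInfo()
    raw = features.aggregate_array("raw").getInfo()
    return [(millis_to_date(m), modis_lst_to_celsius(r)) for m, r in zip(millis, raw)]
```

With credentials in place, a call such as `fetch_daytime_lst(-89.62, 20.97, "2017-03-01", "2017-06-01")` (a point near Mérida) would return a list that can be passed straight to `pandas.DataFrame`.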

The workflow

As in many of today’s data-related projects, the work followed the stages shown in the diagram below:

Workflow of the project. Diagram made by the author.

The (main) tools

I used Python throughout the project. The image below shows the libraries I used.

Main tools (Python and R). Figure made by the author.

The data

I used datasets from three different sources:

  • Forest fire records from 2017: provided by CONAFOR (National Forestry Commission)
  • Hot spot records (2001–2020): provided by FIRMS (Fire Information for Resource Management System)
  • Collection of satellite images: provided by GEE (see the figure below)

Image Collections from Google Earth Engine. Figure made by the author.

Exploratory Data Analysis

A picture is worth a thousand words, so I summarize Chapter 6 (named Exploratory Data Analysis) in the following data visualizations.

Hot spot accumulation

Between 2001 and 2020, the highest concentrations of hot spots with a confidence level above 84% (that is, the confidence that the observation is actually a fire) were in the centre and east of the state. (Data provided by FIRMS)

The level of confidence means the confidence that the observation is actually fire. Map made by the author.
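
The filtering and accumulation behind a map like this can be sketched in a few lines. This is only an illustration using the standard library: the sample rows are hypothetical (not real detections), the column names follow the style of a FIRMS MODIS export with a numeric `confidence` column, and the 0.25-degree grid is an arbitrary choice of mine.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the style of a FIRMS MODIS export; not real detections.
SAMPLE_CSV = """latitude,longitude,acq_date,confidence
20.41,-89.12,2017-04-03,91
20.44,-89.18,2017-04-03,60
21.02,-88.03,2017-05-11,88
20.46,-89.14,2018-04-20,97
"""


def accumulate_hot_spots(csv_text, min_confidence=84, cell_size=0.25):
    """Count hot spots with confidence above the threshold per grid cell."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if float(row["confidence"]) <= min_confidence:
            continue  # keep only detections strictly above the threshold
        # Floor division gives the index of the cell containing the point,
        # and it also behaves correctly for negative longitudes.
        cell = (
            int(float(row["latitude"]) // cell_size),
            int(float(row["longitude"]) // cell_size),
        )
        counts[cell] += 1
    return counts


cells = accumulate_hot_spots(SAMPLE_CSV)
for cell, count in cells.most_common():
    print(cell, count)
```

Cells with higher counts correspond to the darker areas of an accumulation map.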

Affected municipalities (counties)

Although Tekax had the most wildfire records during 2017, it did not have the largest affected surface area; that municipality was Tizimín. (Data provided by CONAFOR)

Lollipop plot made by the author.
Lollipop plot made by the author.
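
The two rankings come from aggregating the same records in two ways: counting fires and summing the affected area. A minimal sketch, assuming records of the form (municipality, affected area in hectares). The municipality labels and numbers below are made up for illustration and are not CONAFOR data.

```python
from collections import defaultdict

# Hypothetical records: (municipality, affected area in hectares).
records = [
    ("A", 5.0), ("A", 3.0), ("A", 2.0),
    ("B", 40.0), ("B", 25.0),
    ("C", 1.5),
]


def summarize(records):
    """Return the number of fires and total affected area per municipality."""
    summary = defaultdict(lambda: {"fires": 0, "area_ha": 0.0})
    for municipality, area in records:
        summary[municipality]["fires"] += 1
        summary[municipality]["area_ha"] += area
    return dict(summary)


summary = summarize(records)
most_fires = max(summary, key=lambda m: summary[m]["fires"])
largest_area = max(summary, key=lambda m: summary[m]["area_ha"])
print(most_fires, largest_area)
```

In this toy data the municipality with the most fires is not the one with the largest affected area, which is the same kind of contrast the plots show. In a pandas workflow the same step would be a `groupby` followed by `count` and `sum`.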

Problematic trimester

With the help of CONAFOR and FIRMS datasets, the months with the highest wildfire occurrences were found.

Bar chart made by the author.
Time series made by the author.
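
One simple way to find such a trimester is to count occurrences per calendar month and then look for the three consecutive months with the highest total. The sketch below assumes a list of detection dates, which here are hypothetical, and does not consider windows that wrap around from December to January.

```python
from collections import Counter
from datetime import date

# Hypothetical detection dates, for illustration only.
dates = [
    date(2017, 3, 14), date(2017, 4, 2), date(2017, 4, 18),
    date(2018, 4, 9), date(2018, 5, 21), date(2019, 11, 3),
]


def occurrences_by_month(dates):
    """Return a 12-item list with the number of occurrences per calendar month."""
    counts = Counter(d.month for d in dates)
    return [counts.get(month, 0) for month in range(1, 13)]


def busiest_trimester(monthly):
    """Return the three consecutive months (1-12) with the highest total."""
    totals = {start: sum(monthly[start - 1:start + 2]) for start in range(1, 11)}
    start = max(totals, key=totals.get)
    return (start, start + 1, start + 2)


monthly = occurrences_by_month(dates)
print(monthly)
print(busiest_trimester(monthly))
```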

Time series of environmental variables

Knowing a variable’s distribution is not enough. It’s essential to understand the history of the variable’s behaviour to find patterns or relevant periods.

The figure shows that March, April and May are the hottest months, and that 2017 was one of the driest years (low soil moisture and PDSI).

Heat maps made by the author.
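
A heat map of this kind needs the time series reshaped into a year-by-month matrix. Here is a minimal sketch of that reshaping, assuming observations of the form (year, month, value). The values are hypothetical and the function name is my own.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year, month, value) observations, for illustration only.
observations = [
    (2016, 4, 36.0), (2016, 4, 38.0), (2016, 12, 27.0),
    (2017, 4, 39.0), (2017, 5, 40.0),
]


def year_month_matrix(observations):
    """Average values per (year, month); months without data are None."""
    grouped = defaultdict(list)
    for year, month, value in observations:
        grouped[(year, month)].append(value)
    years = sorted({year for year, _ in grouped})
    matrix = [
        [mean(grouped[(y, m)]) if (y, m) in grouped else None for m in range(1, 13)]
        for y in years
    ]
    return years, matrix


years, matrix = year_month_matrix(observations)
for year, row in zip(years, matrix):
    print(year, row)
```

Each row is a year and each column a month, so hot or dry periods show up as blocks of similar colour once the matrix is drawn, for example with Matplotlib's `imshow` after replacing `None` with NaN.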

Interactive maps

Thanks to folium and geemap, I could visualize the data in an interactive map. The following map shows the surface affected by wildfires between 2001 and 2019 and the land surface temperature over the same period.

Land surface temperature and affected surface by wildfires from 2001–2019. Map created by the author and made with folium and Google Earth Engine data.

Conclusions

The Google Earth Engine platform is rich in information. Its development environment, the Google Earth Engine Code Editor, is friendly to users without advanced programming skills, while its Python API serves users with experience in Data Science and Analysis. The opportunities the platform offers go beyond forest fires: it can cover environmental issues such as drought, water availability, and greenhouse gas monitoring, and even social issues such as human settlements and how they affect the ecosystems in which they develop.

More information

All the processes of the methodology and the exploration of the data are documented in Jupyter Notebooks. These notebooks are located in a GitHub repository. Unfortunately, the repository is in Spanish, but you can contact me if you have any questions.

While I was working on this thesis, I had the opportunity to attend and present a poster at the 9th International Fire Ecology and Management Congress, organized by the Association for Fire Ecology. As a result, I created a GitHub repository with similar information in English.

About me

My name is Isaac Arroyo, and I recently graduated from Engineering Physics. I’m interested in Data Science, Machine Learning (especially unsupervised learning algorithms) and their applications in high-impact projects.

You can contact me via e-mail (isaacarroyov@outlook.com) or social media, like LinkedIn (Isaac Arroyo), Instagram (@unisaacarroyov) or Twitter (also @unisaacarroyov).

References

[1] Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27.

On a journey to be a Data Journalist | I love data visualization, the arts and social impact projects