Notes from Industry

Introducing a Novel Very High-Resolution Dataset of Landfills and Waste Dumps

A novel dataset consisting of very high resolution multi-spectral satellite images of landfills from Germany, Hungary, Serbia, India and Brazil

Anupama Rajkumar

Published in Towards Data Science, Nov 14, 2021 (12 min read)

Menace of illegal waste dumps and landfills

Waste disposal is a serious problem, with most of the world’s waste finding its way into dumps in south-east Asian and African countries. In addition to landfills, waste is illegally disposed of along river banks and beaches. While some of these landfills may be legal and have proper recycling and safe disposal systems in place, it is the large illegal waste dumps that not only threaten the environment by contaminating soil, water and air but are also hazardous to the health of nearby communities.

It has been challenging for the governments and municipalities to keep track of these mushrooming landfills so as to keep the menace in check. While these landfills can be monitored through drones, the possibility of using Earth Observation (EO) through satellite imagery can allow us to monitor a larger area at once resulting in a better waste monitoring and management system.

Possibility of using earth observation (EO) for detecting landfills

Thanks to the many satellite sensor programs that are currently running, the amount of satellite sensor data at various spatial, temporal and spectral resolutions runs into the order of petabytes. The remote sensing sensors in space can be categorized into imaging and non-imaging sensors. Among imaging sensors alone, there are space programs with optical, thermal and radar imaging sensors. A detailed description of each of these sensors is beyond the scope of this article.

The table below specifies some of the most popular and widely used satellite sensors.

Table with technical specifications of various satellite missions. All the satellite sensors except RADARSAT and Sentinel are optical sensors. RADARSAT and Sentinel contain Synthetic Aperture Radar (SAR) sensors. Image by author

Hence, we can see that when it comes to having access to data for EO, we are indeed spoilt for choice, and this makes studying the possibility of using EO for detecting landfills a worthwhile exercise. EO has already been used for several applications, such as detecting ships at sea (remember the Ever Given stuck in the Suez Canal?), assessing the impact of flooding, and monitoring deforestation, illegal mining and railway tracks, to name a few.

While most of the satellite data that is available is open-source and free of cost for research, very high resolution satellite images from some commercial programs, like WorldView by Maxar Technologies, can be purchased for a small fee.

Specification of the novel landfill dataset

In this section, I talk about the motivation and process behind choosing a certain set of satellite images in order to create the dataset. In addition, I discuss the methods used for creating accurate annotations for the images.

Process of creating the dataset

The biggest and most crucial challenge of any machine learning pipeline is choosing the right dataset. The availability, size and quality of the chosen dataset go on to decide the machine learning architecture, as well as the further technicalities and hyper-parameters that would be used to eventually solve the problem at hand.

For the project, in the absence of any publicly available dataset for landfill detection, I had to create my own. While there are several available datasets like UC Merced Land Use Dataset [1], DeepSat [2], Urban Atlas [3] and BigEarthNet [4], these datasets mostly provide extensive information about general land use classification categories like land, field, forest etc. BigEarthNet does provide a class for dumps, but because the dataset is created from low resolution Sentinel satellite images (refer to the previous table), it does not suffice to detect small landfills that might be sub-meter in size. Hence, this project called for a custom dataset that catered specifically to landfills. It is important to note that in the first phase of the project, I did not classify the landfills based on the type of waste. That is to say, the landfill sites picked for the dataset creation are waste agnostic. Classification based on the type of waste can be considered as a future enhancement of this project.

Eventually, I decided to use very high resolution (VHR) optical images from the WorldView-2 (WV-2), WorldView-3 (WV-3) and GeoEye-1 (GE-1) satellite programs, with spatial resolutions of 46 cm, 30 cm and 41 cm respectively. The corresponding spectral resolutions are 8 multi-spectral (MS) bands, 8 MS bands and 4 MS bands respectively, in addition to 1 panchromatic band for each of them. The temporal resolution of these programs ranges between 1 and 5 days. These parameters make the dataset created from these images highly suitable for monitoring and detecting landfills from satellite images.

These VHR images, unlike Sentinel images, are not free of cost. However, the European Space Agency (ESA) provides free access to a limited repository for research purposes. Once I had finalized the satellite programs that I would be working with, I requested image tiles from ESA. To get access to these image tiles, I identified 13 major known landfill locations across Europe, Asia and South America and sent them to ESA, who in turn gave me access to very high resolution images of these landfills.

ESA typically provides the images as a set of GeoTIFF files: 1 black and white panchromatic image with very high spatial resolution (for example, 30 cm for WV-3) but low spectral resolution (1 spectral band), and 1 multi-spectral image with high spectral resolution (for example, 8 spectral bands for WV-3) but lower spatial resolution (1.24 m for WV-3). Pan-sharpening is a technique in which the panchromatic and multi-spectral images are fused together to get images that have both high spectral and high spatial resolution, i.e. a spatial resolution of 30 cm and a spectral resolution of 8 bands for WV-3 images. Hence, pan-sharpening is a way to get the best of both worlds. Although traditional methods like interpolation are popularly used for pan-sharpening, more recently deep learning has been employed to achieve high resolution multi-spectral images. I performed pan-sharpening on the landfill dataset to get a high-resolution multi-spectral dataset, using Orfeo Toolbox [5] in QGIS [6]. As a result, I essentially created 2 landfill datasets: one consisting of only multi-spectral images and the other of pan-sharpened images.
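
To make the idea of fusing the two images concrete, here is a minimal NumPy sketch of one simple pan-sharpening scheme, the Brovey transform. This is not the Orfeo Toolbox method used for the dataset; it is only an illustration. It assumes the multi-spectral array has shape (bands, H, W), that the panchromatic array is exactly an integer factor larger (roughly 4 for WV-3), and that both are already co-registered. The function names are hypothetical.

```python
import numpy as np


def upsample_nearest(ms, factor):
    """Repeat every multi-spectral pixel factor x factor times; ms is (bands, H, W)."""
    return ms.repeat(factor, axis=1).repeat(factor, axis=2)


def brovey_pansharpen(ms, pan, eps=1e-6):
    """Brovey-style fusion: rescale each upsampled band by pan / mean band intensity.

    Assumption: pan.shape == (H * factor, W * factor) for an integer factor.
    """
    factor = pan.shape[0] // ms.shape[1]
    if pan.shape != (ms.shape[1] * factor, ms.shape[2] * factor):
        raise ValueError("pan must be an integer multiple of the multi-spectral size")
    ms_up = upsample_nearest(ms.astype(np.float64), factor)
    intensity = ms_up.mean(axis=0)  # per-pixel mean over all bands
    return ms_up * (pan / (intensity + eps))


# Tiny synthetic example: 8 bands at 2x2 pixels, panchromatic band at 8x8 pixels.
rng = np.random.default_rng(0)
demo_ms = rng.uniform(1.0, 10.0, size=(8, 2, 2))
demo_pan = rng.uniform(1.0, 10.0, size=(8, 8))
demo_sharp = brovey_pansharpen(demo_ms, demo_pan)
```

The result keeps all 8 bands but takes its spatial detail from the panchromatic image: by construction, the per-pixel mean over the sharpened bands matches the panchromatic value.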

Landfill at Vinca, Serbia. a.) Multi-spectral image with low spatial resolution but high spectral resolution. b) Panchromatic image with high spatial resolution but low spectral resolution. c) Pan-sharpened image with high spatial and spectral resolutions. Image by author

The list of these sites and the size of the images provided by ESA, before and after pan-sharpening is provided in the table below:

Some image tiles with the landfill areas marked in black:

Landfill locations marked in black in the image tiles provided by ESA in India and Hungary. It can be seen that in the entire image tile, the number of pixels corresponding to the landfills is much smaller than the pixels that do not contain landfill information. Image by author.

The image tiles provided by ESA are very large, so I divided them into smaller patches of 512x512 pixels. The number of patches obtained for each of the multi-spectral and pan-sharpened datasets is listed in the table below.
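
Splitting a tile into patches can be sketched as follows. This is a minimal illustration, assuming the tile has already been read into a NumPy array of shape (bands, H, W), that patches do not overlap, and that incomplete patches at the right and bottom edges are dropped. The article does not state how edges were handled, so that choice is an assumption, and the function name is hypothetical.

```python
import numpy as np


def tile_image(image, patch=512):
    """Yield (row, col, patch_array) for non-overlapping patches of a (bands, H, W) array.

    Incomplete patches at the right and bottom edges are skipped (an assumption).
    """
    _, height, width = image.shape
    for row in range(0, height - patch + 1, patch):
        for col in range(0, width - patch + 1, patch):
            yield row, col, image[:, row:row + patch, col:col + patch]


# Synthetic 4-band tile of 1100 x 1030 pixels: two full patches fit in each direction.
demo_tile = np.zeros((4, 1100, 1030), dtype=np.uint16)
demo_patches = list(tile_image(demo_tile))
```

Keeping the row and column offsets next to each patch makes it possible to map a prediction back to its position in the original tile.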

Dataset image patch ratio. Image by author

It can be seen that although the total number of resulting patches is considerably high, the number of patches with landfill pixels is quite low, as shown in the plot below, so the dataset is imbalanced. For this reason, only the patches that contain landfill pixels are used to train the models. Even these patches contain more background pixels than landfill pixels, so adding background-only patches would only worsen the imbalance.
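
The selection of patches with landfill pixels can be sketched like this. It assumes each patch comes with a binary mask of the same height and width in which non-zero pixels mark landfill; the helper names are hypothetical.

```python
import numpy as np


def landfill_fraction(mask):
    """Share of pixels labelled as landfill (non-zero) in one mask."""
    return np.count_nonzero(mask) / mask.size


def select_landfill_patches(patches, masks):
    """Keep only the (patch, mask) pairs whose mask has at least one landfill pixel."""
    return [(p, m) for p, m in zip(patches, masks) if np.any(m)]


# Two toy 4x4 patches: the first is pure background, the second has a 2x2 landfill.
background = np.zeros((4, 4), dtype=np.uint8)
landfill = np.zeros((4, 4), dtype=np.uint8)
landfill[1:3, 1:3] = 1
demo_selected = select_landfill_patches(["patch_a", "patch_b"], [background, landfill])
```

Printing `landfill_fraction` for the kept masks is a quick way to see how much imbalance remains inside the selected patches.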

Plots indicating the ratio of the total number of 512x512 patches and patches with landfill pixels for multi-spectral and high resolution pan-sharpened dataset. Image by author

Consequently, the created dataset consists of 55 multi-spectral and 242 pan-sharpened patches of 512x512 pixels each with landfill information that can be further divided into training, testing and validation sets as desired by the machine learning engineer to train their models.

Process of creating highly accurate labels

Once the image patches were created, the next most important task was to label the landfills accurately so that the labels could serve as ground truth data. Since the task at hand was to detect landfills by performing semantic segmentation, I created segmentation masks with the help of QGIS [6].

Accurate labels are as important as the quality of the data, if not more so. In any machine learning pipeline, garbage in is garbage out. So, if your training data or the corresponding labels are incorrect, there is no way the output will be desirable. Even the best architecture cannot compensate for bad data. Hence, creating accurate labels was crucial for this project. For this, I followed a 2 step strategy: visual inspection and calculation of vegetation indices (VIs) of the image patches.

Visual inspection and VIs are traditional methods that have already been employed in the literature for landfill detection. On their own they have proven to be unreliable as well as tedious and time consuming, which is why we are trying to apply machine learning to the problem. They are, however, rather useful tools for creating ground truth labels. The following sub-sections discuss how these methods were used to create the labels:

Visual inspection

This is the simplest method that could be used for labeling. Identifying the area corresponding to landfills by visual inspection served as the first level while creating labels. This method helped in eliminating the regions that did not qualify as landfills because the landfill locations were provided by me and hence I was aware of their exact geo-location.

Through visual inspection, I could identify the landfill in Erd, Hungary as marked in the black rectangle in the image below.

Landfill in Erd, Hungary. Image by author

Vegetation indices (VI)

Although it was possible to identify the landfill in Erd, Hungary through its known geo-location, this would not suffice to create a segmentation label of the entire region. If you look closely at the landfill region marked in Erd in the image above, you can see that some areas within it are covered by vegetation. Likewise, some other landfill locations had buildings or other structures within them which could not be classified as landfill. Marking or labeling them as landfills would confuse the neural network, resulting in incorrect predictions.

To get a better idea of the landscape within the landfill area and to rule out the possibility of labeling an area of vegetation as landfill, I used VIs for an analysis that helped me create accurate labels.

Vegetation indices (VIs) obtained from remote sensing images are an easy way of evaluating the cover, health and growth dynamics of vegetation canopies. These indices combine reflectance in the visible spectrum, especially the green region, with reflectance in the non-visible spectrum to quantify the vegetated surface. A mixed combination of soil, weeds and plants of interest makes the calculation of even a simple VI a difficult task. There are many VIs that provide useful information about the vegetation cover. Some of these are NDVI, SAVI and MSAVI. I calculated these VIs with the help of QGIS [6] to get an insight into the landfill region. A detailed description and calculation of VIs is beyond the scope of this article.
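
For readers who want to reproduce such indices outside QGIS, here is a minimal NumPy sketch using the standard textbook definitions: NDVI = (NIR - Red) / (NIR + Red), SAVI = (1 + L)(NIR - Red) / (NIR + Red + L) with L = 0.5, and the closed-form MSAVI (often called MSAVI2). It assumes `nir` and `red` are reflectance arrays of the same shape. Which band index holds red and near-infrared depends on the sensor and should be checked against the image metadata. The vegetation threshold at the end is purely hypothetical.

```python
import numpy as np


def ndvi(nir, red, eps=1e-9):
    """Normalised Difference Vegetation Index; eps avoids division by zero."""
    nir, red = nir.astype(np.float64), red.astype(np.float64)
    return (nir - red) / (nir + red + eps)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index with soil brightness factor L (0.5 by default)."""
    nir, red = nir.astype(np.float64), red.astype(np.float64)
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def msavi(nir, red):
    """Modified SAVI in its closed form (MSAVI2), which needs no manual soil factor."""
    nir, red = nir.astype(np.float64), red.astype(np.float64)
    return (2 * nir + 1 - np.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2


# Toy reflectances: one vegetated pixel (high NIR, low red) and one bare pixel.
demo_nir = np.array([0.5, 0.2])
demo_red = np.array([0.1, 0.2])
demo_ndvi = ndvi(demo_nir, demo_red)
demo_vegetation = demo_ndvi > 0.3  # hypothetical threshold, tune per scene
```

A boolean vegetation map like `demo_vegetation` can then be used to exclude vegetated pixels when drawing a landfill mask.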

The calculated NDVI of the landfill at Erd, Hungary is shown below. The scale at the far right of the image signifies that the deeper the green (or the larger the NDVI value), the healthier the vegetation in the landscape, and vice versa. As can be seen in the NDVI map of the landfill region, the entire area cannot be marked as landfill, since the regions shown in green represent vegetation within the landfill. Labeling these regions as landfill pixels would lead to wrong output from the machine learning model.

Landfill at Erd, Hungary and the corresponding NDVI values with the scale. Image by author

Following the visual inspection and the calculation and analysis of VIs, I could create accurate segmentation masks. Some of these are shown in the image below, with the masks overlaid on their corresponding landfill patches.

Landfill patches with overlapping segmentation masks used as the ground truth. Image by author

The segmentation masks were created with the help of QGIS, as mentioned earlier. In the dataset folder, each mask is stored as a GeoJSON file consisting of a list of the coordinates of the vertices of the polygon that constitutes the mask. Each landfill patch has a corresponding GeoJSON file. The patch names and their respective GeoJSON mask files are listed in a .csv file, which the data loader can use to load the landfill patches and corresponding masks during implementation.
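
As a rough illustration of how a data loader could turn these files into pixel masks, here is a sketch that uses only the standard library and NumPy. Several things in it are assumptions rather than facts about the dataset: the .csv column names `patch` and `mask`, the example file names, and above all that the polygon coordinates are already expressed in pixel units of the patch. If the GeoJSON files store georeferenced coordinates, they first have to be converted with the patch's geotransform, or rasterized with a GIS library.

```python
import csv
import io

import numpy as np


def ring_to_mask(ring, height, width):
    """Even-odd rasterisation of one closed ring of [x, y] pixel coordinates."""
    pts = [(p[0], p[1]) for p in ring]
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5  # sample at pixel centres
    inside = np.zeros((height, width), dtype=bool)
    for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
        if y0 == y1:
            continue  # a horizontal edge never crosses a horizontal ray
        crosses = (y0 <= py) != (y1 <= py)
        x_at = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (px < x_at)
    return inside


def geojson_to_mask(geojson, height=512, width=512):
    """Burn every Polygon / MultiPolygon of a FeatureCollection into a 0/1 mask."""
    mask = np.zeros((height, width), dtype=bool)
    for feature in geojson["features"]:
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            polygons = [geom["coordinates"]]
        elif geom["type"] == "MultiPolygon":
            polygons = geom["coordinates"]
        else:
            continue  # points and lines carry no area
        for polygon in polygons:
            poly_mask = np.zeros((height, width), dtype=bool)
            for ring in polygon:  # XOR makes inner rings act as holes
                poly_mask ^= ring_to_mask(ring, height, width)
            mask |= poly_mask
    return mask.astype(np.uint8)


def read_index(csv_file):
    """Return (patch_name, mask_name) pairs; the column names are hypothetical."""
    return [(row["patch"], row["mask"]) for row in csv.DictReader(csv_file)]


# Toy example: an in-memory index file and a 2x2 pixel square on a 4x4 patch.
demo_index = read_index(io.StringIO("patch,mask\npatch_001.tif,patch_001.geojson\n"))
demo_geojson = {"type": "FeatureCollection", "features": [{
    "type": "Feature", "properties": {},
    "geometry": {"type": "Polygon",
                 "coordinates": [[[1, 1], [3, 1], [3, 3], [1, 3], [1, 1]]]}}]}
demo_mask = geojson_to_mask(demo_geojson, height=4, width=4)
```

In a real loader, each GeoJSON file named in the index would be read with `json.load` and passed to `geojson_to_mask` together with the size of the matching patch.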

Detecting landfills in Hungary — some results

Applying supervised deep learning algorithms for semantic segmentation to the validation set from the landfill dataset provided some useful results. The deep learning models successfully identify landfills only a few meters in size, which is quite useful for monitoring and detecting illegal landfills, as they are usually spread over smaller areas. Some of the segmentation results for landfill patches are shown below.

The specifications of the deep learning architectures that were employed, along with the associated parameters and results, will be shared in future articles. However, a brief technical overview can be found here: https://www.euspaceimaging.com/combining-vhr-satellite-imagery-and-deep-learning-to-detect-landfills/

Process to request access to the dataset

The aim of this article is to present a novel landfill dataset consisting of VHR images with high quality and accurate segmentation masks. In addition, I wanted to discuss and share my experience of the entire process around choosing the appropriate data and the workflow and thought behind creating the annotations or the labels for this data on my own. I hope new machine learning engineers will find this useful.

In future, I will host the dataset on a server from where it can be downloaded directly. However, at the present moment, in case you want to access the dataset, kindly follow these steps:

Star the project repo at this location: https://github.com/AnupamaRajkumar/LandfillDetection_SemanticSegmentation. This allows me to keep track of the people who have requested access to the dataset.
Write me an e-mail at anupamar228@gmail.com with the e-mail address linked to your Dropbox account, and I will give you access to the repository, which is currently stored in my personal Dropbox.

Acknowledgements and citation information

I would like to thank ESA for providing the VHR images free of cost for research purposes. They have a huge repository of satellite images of all kinds and specifications, which has been a boon for research in Earth Observation.

ESA has kindly agreed to make the dataset open-source for research purposes. However, it is bound by copyright laws. If images from the dataset are used, “©DigitalGlobe, Inc. 2021 , Data provided by the European Space Agency” must be included in order to avoid copyright infringement. I have already included this information in LicenseInfo.txt in the dataset folder; be sure to use it.

If you decide to use the images from this dataset, please do not forget to cite as mentioned in ReadMe.txt in the dataset folder.

References

[1] Yi Yang and Shawn Newsam, “Bag-Of-Visual-Words and Spatial Extensions for Land-Use Classification,” ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS), 2010, http://weegee.vision.ucmerced.edu/datasets/landuse.html

[2] Saikat Basu, Sangram Ganguly, Supratik Mukhopadhyay, Robert DiBiano, Manohar Karki, and Ramakrishna Nemani. Deepsat — a learning framework for satellite imagery, 2015.

[3] Urban Atlas. https://land.copernicus.eu/local/urban-atlas. Accessed: 2021–11–14.

[4] Gencer Sumbul, Marcela Charfuelan, Begüm Demir, and Volker Markl. Bigearthnet: A large-scale benchmark archive for remote sensing image understanding. In IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium, pages 5901–5904, 2019.

[5] Orfeo Toolbox. https://www.orfeo-toolbox.org/CookBook/Applications/app_Pansharpening.html. Accessed: 2021–11–15.

[6] Developers Guide for QGIS. https://docs.qgis.org/3.16/en/docs/developers_guide/index.html. Accessed: 2021–11–15.