Recycled Energy Saved in Singapore – A Data Analysis Project

Learn how much energy Singapore saves each year by recycling plastics, paper, glass, ferrous metal, and non-ferrous metal.

Image by Author | Elements by Macrovector, Freepik

Introduction

Singapore has set itself the goal of becoming a zero-waste nation, as the government is worried about the rising volume of waste being disposed of. At the current pace, the Semakau Landfill will run out of space by 2035, which is an alarming prospect for Singaporeans (towardszerowaste.gov.sg). Making matters worse, Singapore has limited land for building new incineration plants or landfills. The government would like to motivate citizens by sharing the total energy that the combined recycling efforts have saved every year.

Overview

We will use recycling statistics to calculate the energy saved every year from 2003 to 2020, based on five waste types: plastics, paper, glass, ferrous metal, and non-ferrous metal.

Code

Loading Datasets

We will be using Plotly for visualizations and Pandas for data analysis. Our dataset is split across two periods, 2003 to 2017 and 2018 to 2020. The 2020 data was added manually from the NEA website. You can find additional energy data per material type on Greentumble.

Waste Data 2018 to 2020

After initial exploration, we found that the column names differ between the two datasets, so for the 2018 to 2020 dataset we will rename the columns and convert the values from thousand tonnes to tonnes.
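As a minimal sketch of this step, the snippet below renames columns and rescales the values on a toy DataFrame. The source headers, the target name total_waste_generated_tonne, and all numbers are assumptions for illustration, not the actual NEA data.

```python
import pandas as pd

# Toy stand-in for the 2018-2020 table; headers and numbers are made up.
waste_18_20 = pd.DataFrame({
    "Waste Type": ["Paper/Cardboard", "Glass"],
    "Total Generated ('000 tonnes)": [1000, 60],
    "Total Recycled ('000 tonnes)": [500, 12],
    "Year": [2018, 2018],
})

# Map the assumed source headers onto the names used in the rest of the analysis.
waste_18_20 = waste_18_20.rename(columns={
    "Waste Type": "waste_type",
    "Total Generated ('000 tonnes)": "total_waste_generated_tonne",
    "Total Recycled ('000 tonnes)": "total_waste_recycled_tonne",
    "Year": "year",
})

# The values are in thousand tonnes, so multiply by 1,000 to get tonnes.
tonne_cols = ["total_waste_generated_tonne", "total_waste_recycled_tonne"]
waste_18_20[tonne_cols] = waste_18_20[tonne_cols] * 1000

print(waste_18_20)
```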

Let’s add the recycling rate to our DataFrame, as we will use it for further analysis.
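One way to derive the rate is to divide the recycled tonnage by the generated tonnage. This is a sketch on toy values; the column names recycling_rate and total_waste_generated_tonne are assumptions.

```python
import pandas as pd

# Toy values for illustration only.
df = pd.DataFrame({
    "waste_type": ["Paper/Cardboard", "Glass"],
    "total_waste_generated_tonne": [1_000_000, 60_000],
    "total_waste_recycled_tonne": [500_000, 12_000],
})

# Recycling rate as a fraction between 0 and 1, rounded to two decimals.
df["recycling_rate"] = (
    df["total_waste_recycled_tonne"] / df["total_waste_generated_tonne"]
).round(2)

print(df[["waste_type", "recycling_rate"]])
```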

Energy Saved Dataset

As we can see, the energy saved dataset is not in a proper tabular format. Let’s convert it into a readable format in four steps:

  1. Transposing the table
  2. Removing the first two columns and the first row
  3. Resetting the index
  4. Renaming the columns

As we can see, we now have three columns: material, energy_saved, and crude_oil_saved.
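The four steps can be sketched as follows. The layout of the raw table (a notes row, a blank row, then one row per attribute) and every value in it are assumptions made up for illustration; the real file may be laid out differently.

```python
import pandas as pd

# Hypothetical raw layout: attributes in rows, materials in columns.
raw = pd.DataFrame([
    ["notes", None, None],
    [None, None, None],
    ["material", "Plastic", "Glass"],
    ["energy_saved", "100 kWh", "10 Kwh"],
    ["crude_oil_saved", "5 barrels", "1 barrels"],
])

energy = raw.T                           # 1. transpose: materials become rows
energy = energy.iloc[1:, 2:]             # 2. drop the first row and first two columns
energy = energy.reset_index(drop=True)   # 3. reset the index to 0..n-1
energy.columns = ["material", "energy_saved", "crude_oil_saved"]  # 4. rename

print(energy)
```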

Waste Data 2003 to 2017

We will limit the columns to match those of the 2018 to 2020 dataset so that the two can easily be concatenated.

Data Analysis

We will stack the two datasets, as they now have the same columns. The final dataset contains samples from 2003 to 2020.
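Stacking could look like the sketch below, which selects a shared column list from each frame and concatenates them. The frames are toy data, and the extra column in the older frame is a hypothetical example of something to drop.

```python
import pandas as pd

# Shared columns (names other than total_waste_recycled_tonne are assumptions).
cols = ["waste_type", "total_waste_generated_tonne",
        "total_waste_recycled_tonne", "year"]

# Toy frames; the older one carries an extra column that we leave out.
waste_03_17 = pd.DataFrame({
    "waste_type": ["Glass", "Plastic"],
    "waste_disposed_of_tonne": [48_000, 700_000],
    "total_waste_generated_tonne": [60_000, 760_000],
    "total_waste_recycled_tonne": [12_000, 60_000],
    "year": [2017, 2017],
})
waste_18_20 = pd.DataFrame({
    "waste_type": ["Glass", "Plastic"],
    "total_waste_generated_tonne": [62_000, 900_000],
    "total_waste_recycled_tonne": [11_000, 40_000],
    "year": [2018, 2018],
})

# Keep only the shared columns, then stack the rows with a fresh index.
waste = pd.concat([waste_03_17[cols], waste_18_20[cols]], ignore_index=True)

print(waste)
```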

Let’s visualize total waste generated and total waste recycled per year.

The amount of waste collected increased rapidly until 2017, after which we can see a decline due to government intervention. Due to the COVID-19 pandemic, the amount of collected waste declined sharply in the past year (NEA).

Looking at the waste type categories in our dataset, we can clearly see that the same category appears under different names. Let’s do some text processing and make the categories match the materials listed in cleaned_energy_saved.csv.

Using simple string replacement, we have normalized the categories so that we can merge our data on waste_type.
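A minimal sketch of such a normalization, assuming the category spellings below (they are illustrative and may not match the real data exactly). It uses whole-value replacement through a dictionary, so each variant maps to exactly one target name.

```python
import pandas as pd

# Illustrative variants of the same categories across years.
waste = pd.DataFrame({"waste_type": [
    "Plastics", "Plastic", "Paper/Cardboard", "Ferrous Metal",
    "Ferrous metal", "Non-ferrous metals", "Non-ferrous metal", "Glass",
]})

# Map every variant to the spelling assumed for the energy table.
replacements = {
    "Plastics": "Plastic",
    "Paper/Cardboard": "Paper",
    "Ferrous metal": "Ferrous Metal",
    "Non-ferrous metals": "Non-Ferrous Metal",
    "Non-ferrous metal": "Non-Ferrous Metal",
}
waste["waste_type"] = waste["waste_type"].replace(replacements)

print(sorted(waste["waste_type"].unique()))
```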

To check that we have successfully normalized the categories, let’s merge the two datasets on waste_type and material.
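One way to run this check, sketched on toy data: a left merge with indicator=True flags any waste type that found no matching material. The "Food" row is a hypothetical category with no entry in the energy table.

```python
import pandas as pd

# Toy frames for illustration only.
waste = pd.DataFrame({
    "waste_type": ["Plastic", "Glass", "Food"],
    "total_waste_recycled_tonne": [40_000, 11_000, 120_000],
})
energy = pd.DataFrame({
    "material": ["Plastic", "Glass"],
    "energy_saved": [100, 10],
})

# The _merge column records whether each row matched on both sides.
checked = waste.merge(energy, left_on="waste_type", right_on="material",
                      how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "waste_type"].unique()
print("Unmatched waste types:", list(unmatched))

# Keep only the rows that have energy data.
merged = checked[checked["_merge"] == "both"].drop(columns="_merge")
```

If a category you expected to match shows up in the unmatched list, its spelling still differs between the two tables.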

We need to convert energy saved from a string to an integer by removing ‘kWh’ and ‘Kwh’.
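A sketch of the conversion on toy strings, assuming the values look like "100 kWh" with no thousands separators:

```python
import pandas as pd

# Toy values showing both unit spellings.
energy = pd.DataFrame({
    "material": ["Plastic", "Glass"],
    "energy_saved": ["100 kWh", "10 Kwh"],
})

# Remove both spellings literally, trim whitespace, then cast to integer.
cleaned = (
    energy["energy_saved"]
    .str.replace("kWh", "", regex=False)
    .str.replace("Kwh", "", regex=False)
    .str.strip()
)
energy["energy_saved"] = cleaned.astype(int)

print(energy.dtypes)
```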

As we can see, energy saved has been successfully converted into an integer, so we can use it to calculate the total energy saved per year.

We need to create a new feature, total_energy_saved, by multiplying total_waste_recycled_tonne by energy_saved.
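The feature is a row-wise product. The sketch below uses toy numbers and treats energy_saved as kWh saved per tonne recycled:

```python
import pandas as pd

# Toy merged frame; the values are made up.
merged = pd.DataFrame({
    "waste_type": ["Paper", "Glass"],
    "total_waste_recycled_tonne": [500_000, 12_000],
    "energy_saved": [100, 10],  # kWh per tonne
})

# Total kWh saved = tonnes recycled x kWh saved per tonne.
merged["total_energy_saved"] = (
    merged["total_waste_recycled_tonne"] * merged["energy_saved"]
)

print(merged)
```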

Visualization

Let’s calculate the mean recycling rate per waste type. Ferrous and non-ferrous metals have the highest recycling rates overall, and plastic has the lowest.
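The per-type mean can be sketched with a groupby. The rates below are toy values chosen only to show the mechanics, and recycling_rate is an assumed column name.

```python
import pandas as pd

# Toy recycling rates over two years.
merged = pd.DataFrame({
    "waste_type": ["Ferrous Metal", "Ferrous Metal", "Plastic",
                   "Plastic", "Glass", "Glass"],
    "recycling_rate": [0.90, 0.98, 0.10, 0.06, 0.20, 0.20],
})

# Average the rate within each waste type and rank from highest to lowest.
mean_rate = (
    merged.groupby("waste_type")["recycling_rate"]
    .mean()
    .sort_values(ascending=False)
)

print(mean_rate)
```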

We should check our final data for outliers and patterns using box plots. We found an anomaly in the year 2018, and to figure it out we have to check our dataset.

After going through the 2018 figures, we discovered that the total waste generated for ferrous metal was 1,269,000 tonnes but the total waste recycled was recorded as 126,000 tonnes. The mean recycling rate for ferrous metal is above 90 percent, yet here it was showing 10 percent, which was odd, so I went back to the original data on the site and discovered the mistake. We can see in the PDF that an entire zero was missing.

Let’s update the value and check the box plot again. As we can see, it looks right now. The moral of the story: always go back and check the data for mistakes.
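A sketch of the correction, under two assumptions: the generated figure is 1,269,000 tonnes (the value consistent with a 10 percent rate), and the fix is to restore the missing zero so the recycled figure becomes 1,260,000 tonnes. The column names are also assumed.

```python
import pandas as pd

# Single-row stand-in for the suspicious 2018 record.
waste = pd.DataFrame({
    "waste_type": ["Ferrous Metal"],
    "year": [2018],
    "total_waste_generated_tonne": [1_269_000],
    "total_waste_recycled_tonne": [126_000],  # value with the missing zero
})

mask = (waste["waste_type"] == "Ferrous Metal") & (waste["year"] == 2018)

# Rate implied by the faulty value: roughly 10 percent.
rate_before = round(126_000 / 1_269_000, 2)

# Overwrite the faulty cell, then recompute the rate.
waste.loc[mask, "total_waste_recycled_tonne"] = 1_260_000
waste["recycling_rate"] = (
    waste["total_waste_recycled_tonne"] / waste["total_waste_generated_tonne"]
).round(2)
rate_after = float(waste.loc[mask, "recycling_rate"].iloc[0])

print(rate_before, rate_after)
```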

The box plot of total energy saved is all over the place, as some materials save far more energy (kWh) per tonne than others.

We will interact more with our data and look for patterns in a multilevel scatter plot.

As we can see, the total energy saved from paper and plastic has dropped significantly in the past few years due to the government initiative to control waste production.

Energy saved per year

It’s time to calculate the energy saved every year from 2003 to 2020, based on five waste types: plastics, paper, glass, ferrous metal, and non-ferrous metal.

  • Grouping by year
  • Summing and extracting the total energy saved
  • Converting the result into a Pandas DataFrame
  • Converting total_energy_saved from float to integer
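The steps above can be sketched as follows, using toy values and an assumed year column:

```python
import pandas as pd

# Toy per-material totals in kWh, stored as floats.
merged = pd.DataFrame({
    "year": [2019, 2019, 2020],
    "waste_type": ["Paper", "Glass", "Paper"],
    "total_energy_saved": [1.5e6, 2.5e6, 3.0e6],
})

# Group by year and sum the energy column, which yields a Series.
annual_series = merged.groupby("year")["total_energy_saved"].sum()

# Turn the Series into a DataFrame with year as a regular column.
annual = annual_series.to_frame().reset_index()

# Cast the totals from float to integer.
annual["total_energy_saved"] = annual["total_energy_saved"].astype("int64")

print(annual)
```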

We are now going to make the annual energy savings readable by:

  • Converting kWh to GWh
  • Rounding to two decimal places
  • Adding a GWh suffix
  • Showing the past five years of data
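A sketch of this formatting on toy totals. One GWh is 1,000,000 kWh; the formatted column name is an assumption, and the format string handles both the rounding and the suffix.

```python
import pandas as pd

# Toy annual totals in kWh.
annual = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018, 2019, 2020],
    "total_energy_saved": [
        1_500_000_000, 1_234_567_890, 2_000_000_000,
        2_250_000_000, 1_750_000_000, 1_000_000_000,
    ],
})

# kWh -> GWh, rendered with two decimals and a unit suffix.
annual["total_energy_saved_gwh"] = (
    annual["total_energy_saved"] / 1_000_000
).map("{:.2f} GWh".format)

# Keep only the five most recent years.
last_five = annual[["year", "total_energy_saved_gwh"]].tail(5)

print(last_five)
```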

Final Thoughts

We have cleaned our data and made sure that it’s ready for merging with other datasets. We have also learned how to detect anomalies in datasets and create new features. This project was simple, but it taught us a lot about data cleaning and data visualization. Due to COVID-19, waste collection slowed, and with it the recycled energy saved in 2020 was comparatively low. The government initiative has also affected overall waste production, as people are moving away from plastics to more nature-friendly materials. Paper and non-ferrous metal save the most energy compared with other waste types, and we have seen a reduction in both materials.

This article is beginner-friendly with detailed explanations on data cleaning and visualization. I have also shared my project code below so that you can clone and start interacting with the project files.

Code is Available at:

Deepnote

Singapore Recycled Energy

GitHub

GitHub – kingabzpro/Annual-Recycled-Energy-Saved-in-Singapore: Learn how much Singapore is saving…

Kaggle

Singapore Recycling and Waste Management

Learning Resource

DataCamp Course: Data Analyst with Python

Data Analytics Made Accessible

Data-Analysis/medium at master · WillKoehrsen/Data-Analysis

