
Using Data to Help Turn Household Waste into Local Clean Energy

Check out my project here: https://github.com/tcastanley/MSW-to-Energy-Feedstock-Analysis

From the Bin, To the Bulb: Turning Household Waste into Local Clean Energy

Photo by Glen Carrie on Unsplash

For my final capstone project in Flatiron School’s Immersive Data Science Program, I decided to test my newfound skills and continue my personal investigations into the relationships that exist between data, waste, and energy. Recently, I have been learning more about the various ways that Municipal Solid Waste (MSW) can be transformed into energy. The most promising and efficient technology that I have come across to date is Plasma Arc Gasification. In my research, I discovered that understanding specific composition details about the MSW to be used as feedstock is one of many critical steps in designing a plasma gasification facility. What I set out to do for my capstone project was to see if I could find some MSW collection datasets and perform a Feedstock Analysis with the intent of calculating specific Waste Type Compositions, Energy Density (kWh/kg), and Total Energy (kWh) for each sample.

Plasma Gasification

Before we get into the data analysis required for calculating proximate MSW Energy values, let’s talk a bit more about the paradigm-shifting technology, Plasma Gasification.

"The conversion of carbonaceous material into a gaseous product for the production of energy products and by-products in an oxygen starved environment." Municipal Solid Waste to Energy Conversion Process", Gary C. Young , 2010

The technical definition above is quite self-explanatory, though it can be simplified even further by thinking of this process as being able to take almost any material on the planet (except metals and glass) and convert it into its basic molecular components. The resulting product is called "Syngas" and is mostly composed of Carbon Monoxide (CO) and Hydrogen (H2). Syngas can be further processed into many different valuable chemical products, though this depends on the facility’s purpose and design. I am interested in understanding how to create electricity from MSW.

Proposals for gasification facilities require revenue projections just like most business undertakings. These projections depend on understanding the energy in the fuel that one uses as feedstock for the gasification process. This is where determining the energy density, through a feedstock analysis, comes into play. I want to build one model that can take in a specific Waste sample, measured by weight from a set list of items, and predict its energy density.

The gasification process also creates another by-product, either vitrified slag or biochar. Depending on the specific composition of the MSW feedstock, there exist different markets for revenue from this by-product.

All in all, using this technology as part of the solution to the growing MSW issues globally is a no-brainer. Scrubbing technology is installed throughout the entire gasification process to ensure that air emissions comply with local health and safety regulations. When one compares this with the alternative long-term solution of outdoor anaerobic digestion, which releases countless tons of harmful methane into the atmosphere, plasma gasification becomes just a matter of business: how to make this solution sustainable and profitable for all stakeholders.

Data Scrubbing & Modeling

Image by Author

This project’s Feedstock Analysis measures Energy Density (kWh/kg) and Total Household MSW Energy (kWh) in Municipal Solid Waste (MSW) sample data collected from Belize, Solomon Islands, and Vanuatu.

This data was collected by the Asia Pacific Water Consultants (APWC), commissioned by the Commonwealth Litter Programme (CLiP) with support from the Centre for Environment Fisheries and Aquaculture Science (CEFAS). The original purpose for the collection of this data was to support local governments in developing better ocean waste management strategies.

Each country’s dataset had over 95,000 rows that needed to be cleaned and reformatted for my purposes. There were many repeating items in the datasets, as each MSW sample was measured using 3 different methods: weight, count, and volume. To calculate the energy density, I just needed to look at the weight values (kg), and filtering for these substantially reduced the number of rows in each dataset. After removing all redundant and unnecessary values and reshaping the dataset, I was ready to model. The final shape of the dataset used for both top models was (438, 29), representing 438 different household MSW samples with a specific combination of 29 item features each. These data, combined from 3 different countries, were used to train and test my top performing model of the project.
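The filtering and reshaping step described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from the repository: the column names (`sample_id`, `item`, `method`, `value`) and the example rows are hypothetical, and the real datasets may be structured differently.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per (sample, item, measurement method).
# Column names and values are illustrative, not taken from the APWC datasets.
rows = [
    {"sample_id": "H001", "item": "Food", "method": "weight", "value": 2.4},
    {"sample_id": "H001", "item": "Food", "method": "count", "value": 12},
    {"sample_id": "H001", "item": "Nappies", "method": "weight", "value": 0.9},
    {"sample_id": "H001", "item": "Nappies", "method": "volume", "value": 3.1},
    {"sample_id": "H002", "item": "Food", "method": "weight", "value": 1.1},
    {"sample_id": "H002", "item": "Food", "method": "weight", "value": 0.4},
]


def weights_to_wide(records):
    """Keep only weight rows and pivot to one row per sample, one column per item."""
    items = sorted({r["item"] for r in records if r["method"] == "weight"})
    # Every sample starts with 0.0 kg for every item, so missing items stay at zero.
    totals = defaultdict(lambda: dict.fromkeys(items, 0.0))
    for r in records:
        if r["method"] != "weight":
            continue  # drop count and volume measurements
        # Sum repeated entries of the same item within a sample.
        totals[r["sample_id"]][r["item"]] += r["value"]
    return dict(totals)


wide = weights_to_wide(rows)
for sample_id, item_weights in wide.items():
    print(sample_id, item_weights)
```

Each key of `wide` is one household sample and each inner dictionary holds that sample's weight (kg) per item, which is the same kind of samples-by-items layout as the (438, 29) table used for modeling.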

I went through many different model types as part of my process. I started with some basic Linear Regression models using Statsmodels and scikit-learn, neither of which was ideal as they overfit the data substantially. From there, I moved on to Decision Trees, Random Forests, and XGBoost models, which all performed much better than their basic linear regression counterparts. Finally, I ended my modeling efforts with my favourite architecture, a Multilayer Perceptron (MLP), a basic type of Neural Network.

Fig 1. XGBoost Model Feature Importance - Total Household MSW Energy (kWh) Image by author

One noteworthy modeling feature I want to highlight is the Feature Importance method that comes standard with XGBoost models. I plotted this information (Fig 1), which represents the importance, or relative influence, that each specific feature had in the model.

As is clearly displayed above, we see the top 3 important features in this order:

  1. Food
  2. Other Organics
  3. Nappies

So what does this mean? Well, since this model was trained and tested to predict the dependent variable representing total energy in kWh, we should expect the chart to show the items that contribute the most towards the total energy value for each household. As it happens, these top 3 items are all organic in nature, meaning that they each contain a relatively high amount of carbon. Therefore, one can conclude that this model has done a good job of capturing and valuing items with higher carbon levels over those with less.

Fig 2. Belize Model - Left: Average MSW Composition per Location with Energy Density (kWh/kg), Right: Collection Locations mapped, each with MSW details. Image by author

Now it should be stated here that this is where the dangers of bias creep into data science analyses like this one. The calculations I used to label these data relied on predetermined scientific formulae (see _Net Heating Value Wiki_) that, by definition, build the relative importance of highly carbonaceous materials into the data. It is a good sign that the model reflected this, though, as materials with higher concentrations of carbon are generally more energy-rich. It is still vital to remember, however, that biases like these are always present in data, and constantly reflecting on this fact is critical to understanding as many nuances as possible for each model generated.
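To make the labelling idea concrete, here is a minimal sketch of how Total Energy (kWh) and Energy Density (kWh/kg) labels could be derived from item weights and per-item net heating values. It is an illustration under stated assumptions, not the exact calculation in the repository: the `NET_HEATING_VALUE_MJ_PER_KG` numbers are made-up placeholders, and `label_sample` is a hypothetical helper name.

```python
# Hypothetical net heating values in MJ/kg. These are placeholders for
# illustration only, not the values used in the project.
NET_HEATING_VALUE_MJ_PER_KG = {
    "Food": 4.0,
    "Other Organics": 6.0,
    "Nappies": 8.0,
    "Glass": 0.0,
}
MJ_PER_KWH = 3.6  # unit conversion: 1 kWh = 3.6 MJ


def label_sample(weights_kg):
    """Return (total energy in kWh, energy density in kWh/kg) for one sample."""
    total_mj = sum(
        kg * NET_HEATING_VALUE_MJ_PER_KG[item] for item, kg in weights_kg.items()
    )
    total_kwh = total_mj / MJ_PER_KWH
    total_kg = sum(weights_kg.values())
    # Energy density is total energy divided by total sample weight.
    density = total_kwh / total_kg if total_kg > 0 else 0.0
    return total_kwh, density


sample = {"Food": 2.0, "Other Organics": 1.0, "Nappies": 0.5, "Glass": 0.5}
total_kwh, density = label_sample(sample)
print(f"Total energy: {total_kwh:.2f} kWh, energy density: {density:.2f} kWh/kg")
```

Because every label is a weighted sum of fixed heating values, any model trained on these labels is bound to rank the high-energy items as most important, which is exactly the built-in bias discussed above.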

Results

Below is a snapshot comparison of the top models that I had for each country, as well as the final model which combined data from all 3. The Train-Test Split shows how many instances of data the model was trained on versus tested on. For each dependent variable I show the median value per dataset, compared with each model’s respective Root Mean Square Error (RMSE). RMSE is a widely accepted primary metric for describing the accuracy of a model’s predictions in regression problems like this. Basically, what it means is that the model’s predictions are typically off by about x above or below the real value, with larger errors penalized more heavily.
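For readers who have not met RMSE before, the short sketch below shows how it is computed. The numbers are hypothetical and only there to illustrate the arithmetic; they are not results from this project.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared prediction error."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical total energy values (kWh) for four held-out samples.
actual = [10.0, 12.0, 8.0, 15.0]
predicted = [11.0, 11.0, 9.0, 13.0]
error = rmse(actual, predicted)
print(f"RMSE: {error:.3f} kWh")
```

Comparing the RMSE against the median of the dependent variable, as in the table below, gives a feel for how large the typical error is relative to a typical value.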

Fig 3. Top Model Comparisons - The best model is the Combined Model, touting the most accurate metrics with increased data to boot! Image by author

As is evident above, the final Combined Model was the best performing model of all, outdoing the other 3 by a substantial margin.

Total Household MSW Energy (kWh) r-squared: 0.9943

Energy Density (kWh/kg) r-squared: 0.9575

This outcome is not a big surprise, as models that have more data to train on are able to see more unique combinations of features, and therefore can better account for them when making future predictions. Regardless, it is still very reassuring to see reality follow the intended trajectory and I am very pleased with how my final model turned out.
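The r-squared scores quoted above measure how much of the variance in the real values is explained by the predictions. A minimal sketch of the calculation, again with hypothetical numbers rather than project data:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Note: ss_tot is zero (and this raises ZeroDivisionError) if all true values are equal.
    return 1.0 - ss_res / ss_tot


# Hypothetical energy density values (kWh/kg) for four held-out samples.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.9]
score = r_squared(actual, predicted)
print(f"r-squared: {score:.3f}")
```

A score of 1.0 would mean perfect predictions, so values such as 0.9943 and 0.9575 indicate that very little of the variation is left unexplained on the test data.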

So What? Conclusions & Future Work

Ok so you might be asking yourself, so what? What does this all mean anyway? Where is the energy!? Well, there are a few different insights that can be drawn from this project’s results.

MSW Composition: Knowing the average MSW composition for each household, and therefore a particular region, can support the decision making of all stakeholders involved in MSW management. From MSW collection authorities, to the household user, better understanding waste composition can provide behavioral insights as well as potential socioeconomic ones. More qualitative data analysis is required for these insights however, as this was outside of the scope of my project.

Model 1 – Total Household MSW Energy (kWh): I believe that a crucial step to changing the way that people treat waste requires a changed mindset. What I hope this model can do is to serve as the basis of a user application, whereby a homeowner with proper incentive can input data about their own MSW and receive a potential energy output (perhaps with an energy credit as well one day). There exist avenues of development for an idea like this. I believe that there exists a delta between MSW and electricity utilities where users can start to view their ‘waste’ as an energy source. Developing and incorporating better incentive programs for users who are more responsible with their waste will, I believe, help to remind people of the cyclical nature of our planet, and that their garbage can be utilized later for better or worse, and is not simply something to be tossed away and forgotten about.

Model 2 – Energy Density (kWh/kg): This model can provide many of the same insights to all MSW management stakeholders as Model 1. However, the biggest difference lies in the more universal nature of energy density as a measurement metric. This model uses the relative weight amounts of each waste item to determine how rich one sample is, rather than trying to predict how much energy exists in total. This is a more valuable measurement, especially for those invested in the business of MSW to energy. I hope to develop this model to a point where a finely curated list of MSW items can be established as the model’s features. From there, a feedstock analysis for a potential facility can be easily completed and reliably counted on to produce an accurate energy density score per region.


I had a lot of fun doing this project, and I am getting more comfortable at writing these blogs as well! For those of you who made it this far, thank you very much! I plan to start a new blog series soon to assist me with my data science revision and future learning so stay tuned for that! If anyone has any questions, comments, or just wants to reach out for a chat please feel free to send me an email @ [email protected]! Cheers!

