
How Data Scientists Can Reduce CO2

Optimize operations by shifting loads in time and space

Data scientists have much to contribute to climate change solutions. Even if your full-time job isn’t focused on climate, you can have a big impact within your current role if you work on a company’s operations. By finding previously untapped sources of flexibility in operations, data scientists can help shift loads to times and places where the electricity grids have a higher share of carbon-free energy, such as wind and solar. This load shifting allows the grid to transition faster to higher shares of carbon-free energy, and it can also reduce operating costs.

Data science contributions to climate solutions

Before getting into specific opportunities in optimizing operations or infrastructure, I would like to acknowledge the broad stage that data scientists have in working on climate. Much of the current excitement about applying data science to climate has been around applications of ML and big data, and rightly so. An excellent starting point is climatechange.ai, an organization of volunteers that has built an extensive community of people who work at the intersection of climate change and AI. Their website includes summaries of dozens of "climate change solution domains" described in the 2019 paper Tackling Climate Change with Machine Learning [1]. While the solution domains are intended as a guide to high-impact ML climate applications, many domains also lend themselves to more "classic" data science methods from statistics and operations research. The list of possibilities is vast, and it can be difficult to know where and how to get started. For data scientists looking to get more engaged on climate problems, either on 20% projects or in pivoting their career trajectory, the Terra.do bootcamp and the workonclimate.org Slack community are good places to meet others and find resources.


A grid in transition creates opportunities: variations in carbon intensity

As the electricity grid evolves towards carbon-free energy in the coming decades, we’re in a unique transition period in which we’re living with a mix of clean and dirty energy. Even in the same region, certain hours of the day when the wind blows harder or the sun shines stronger can be much less carbon intensive than other hours of the day, and predictably so. This predictable variation in carbon intensity across regions and across hours within a day creates opportunities to reduce carbon.

If your management already cares about optimizing costs from operations, it’s already in your charter to look for optimizations that can reduce carbon footprints. As I’ll describe more below, carbon reduction can come from finding untapped sources of flexibility in operations to shift loads to cleaner sites or cleaner times of the day. Building the capability to take advantage of this flexibility will likely allow operations to reduce costs as well. So you don’t need to wait for a new "green" mandate or program to get started. Just get started in a small, concrete way to explore and demonstrate what benefits may be possible with more time and investment.

Let’s take a look at how variable carbon intensity actually is. Figure 1 shows the regional average carbon intensity by hour throughout one sample day in each of the regions where Google owns datacenter sites around the world. For perspective when reading these charts, the median carbon emissions intensities of electricity generation, based on total lifecycle emissions, are about 1000 kg CO2/MWh for coal, 900 for oil, 500 for natural gas, and <50 for wind, solar, hydro and nuclear sources. In Figure 1, we see 2.5x variation in carbon intensity across sites. We also see large variations throughout the day within some sites. The dirtiest hour of the day is 46% more carbon intensive than the cleanest hour of the day, when averaging over all days of the year and all our datacenter sites. Other research shows similar results when looking at temporal and spatial variations in carbon intensity. A December 2019 report from the National Academy of Sciences on Tracking emissions in the US electricity system [2] reports the distribution of 2016 carbon intensity across 20 different US balancing authorities, and shows a roughly 4x difference in carbon intensity between the cleanest 25% and dirtiest 25% of regions in the US, with temporal variations within each region as large as those we see at our Google sites. (A balancing authority ensures that power system demand and supply are balanced within its region by controlling the generation and transmission of electricity in its region and between neighboring balancing authorities.)
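To make the "dirtiest hour versus cleanest hour" comparison concrete, here is a minimal Python sketch of one way such a statistic could be computed from hourly data. The function name, the averaging choice (mean per hour of day, then max over min) and the sample values are all assumptions for illustration; they are not Google's data or the exact method behind the 46% figure.

```python
from statistics import mean

def intraday_spread(hourly_by_day):
    """Percent by which the dirtiest hour of the day exceeds the cleanest.

    hourly_by_day: list of days, each a list of hourly carbon intensities
    (kg CO2/MWh). Values are first averaged per hour of day across days.
    """
    hours = len(hourly_by_day[0])
    hour_means = [mean(day[h] for day in hourly_by_day) for h in range(hours)]
    return 100.0 * (max(hour_means) / min(hour_means) - 1.0)

# Made-up data: two days that are cleaner in the middle of the day.
day_a = [500] * 8 + [350] * 8 + [500] * 8
day_b = [520] * 8 + [370] * 8 + [520] * 8
spread = intraday_spread([day_a, day_b])
print(f"Dirtiest hour is {spread:.1f}% more carbon intensive than the cleanest")
```

Running the same calculation per region gives a quick first look at where temporal flexibility is likely to matter most.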

These variations are fairly predictable when forecasting hourly carbon intensity, and carbon intensity data and forecasts are available from third party providers like Tomorrow and WattTime. In the forecasts that we’ve tested, we’ve found that the mean absolute percent error (MAPE) of the forecasted average carbon intensity by hour is in the 3–8% range for horizons up to 16 hours ahead, and in the 10–15% range for horizons up to 32 hours ahead. While there’s still plenty of room to improve these forecasts, this accuracy is already more than adequate to optimize operations a day in advance based on carbon intensity forecasts.
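For readers who want to evaluate a forecast provider themselves, the sketch below shows how MAPE could be computed for one forecast horizon. The function and the sample numbers are hypothetical; in practice you would group forecast and actual pairs by horizon (for example 16 or 32 hours ahead) and compute the metric for each group.

```python
def mape(actual, forecast):
    """Mean absolute percent error, in percent.

    Assumes both lists have the same length and no actual value is zero.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

# Made-up hourly carbon intensities (kg CO2/MWh) for one horizon.
actual = [400.0, 500.0, 250.0, 200.0]
forecast = [420.0, 475.0, 260.0, 190.0]
error = mape(actual, forecast)
print(f"MAPE: {error:.2f}%")
```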

These regional variations in carbon intensity in our electricity grids will be with us for a while. Getting close to zero carbon intensity for the majority of the grids will take time – possibly decades. 22% of new power plant spending globally in 2020 still went to coal and gas plants [3]. And even with no new fossil fuel plant construction, one study estimates that it would take until 2035 before 73% of existing fossil fuel plant capacity in the US reached its typical end of life span [4]. Our observation is that intraday variation is a characteristic of grids that have a mix of fossil fuel and carbon-free energy. Figure 2 shows how the intraday variation in carbon intensity trends with the amount of carbon-free energy in the grid, with the vertical lines indicating the amount of carbon-free energy in Asia, US and Europe in 2018. On the left side of the graph, we see regions that consume energy mostly from fossil fuel plants, and there is little intraday variation. On the right side of the graph we see regions that consume mostly carbon-free energy, where average carbon intensity is very low and hence intraday variation in carbon intensity is low. In the middle of the chart, there is high intraday variation over a wide range of carbon-free energy mix between 10–80%, with the variation plateauing around 30–50%. The intuitive explanation here is that as grids add wind and solar (intermittent sources) without storage, the share of carbon-free energy increases but the impact is concentrated in certain hours of the day when wind and solar capacity are more available. From the same National Academy of Sciences report referenced earlier [2]: "The need to capture heterogeneity becomes more pressing as electric grids absorb greater amounts of renewable energy, whose availability typically varies in time and space. In such grids, demand will need to become more responsive."


Impacts of load shifting on the grid: marginal and average CO2

Shifting loads to hours and regions with lower average carbon intensity lowers a company’s carbon footprint, based on typical carbon accounting rules. But how does this actually impact the carbon emitted by the grid? This isn’t a zero-sum game where consumers compete for a fixed supply of carbon-free energy. Load shifting can lead to real change by decreasing energy production from high-emissions generators (like coal plants) or by helping transition the grid to providing more carbon-free energy.

The distinction between average carbon intensity and marginal carbon intensity is important to understand when thinking about the impacts of load shifting. Average carbon intensity at a given point in time is the ratio of total emissions to total power for sources that have been dispatched to meet demand at that time. Marginal carbon intensity is based on the marginal power plant that supplies capacity to meet the next unit of demand. Plants are dispatched by the regional balancing authority in increasing order of their costs, and the market clearing price is based on the cost of the marginal plant. Because wind and solar have no fuel costs, their price bids are typically low and those sources are dispatched first. The marginal plant is therefore usually a fossil fuel plant, and marginal emissions are typically between those of a gas plant and a coal plant [5].
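The difference between the two measures can be illustrated with a toy merit-order dispatch. This is a simplified sketch, not a model of any real market: the plant fleet, the bid prices and the `dispatch` helper are hypothetical, and the marginal plant is approximated as the last plant dispatched.

```python
from dataclasses import dataclass

@dataclass
class Plant:
    name: str
    capacity_mw: float
    cost_per_mwh: float     # hypothetical bid price
    kg_co2_per_mwh: float   # emissions intensity

def dispatch(plants, demand_mw):
    """Dispatch plants in increasing cost order until demand is met.

    Returns (average_intensity, marginal_intensity) in kg CO2/MWh.
    The marginal plant is taken to be the last plant dispatched.
    """
    if demand_mw <= 0:
        raise ValueError("demand must be positive")
    remaining = demand_mw
    total_emissions = 0.0
    marginal = None
    for plant in sorted(plants, key=lambda p: p.cost_per_mwh):
        if remaining <= 0:
            break
        used = min(plant.capacity_mw, remaining)
        total_emissions += used * plant.kg_co2_per_mwh
        remaining -= used
        marginal = plant
    if remaining > 0:
        raise ValueError("demand exceeds total capacity")
    return total_emissions / demand_mw, marginal.kg_co2_per_mwh

# Illustrative fleet; intensities loosely follow the medians quoted earlier.
fleet = [
    Plant("wind+solar", 400, 0.0, 20),
    Plant("gas", 400, 35.0, 500),
    Plant("coal", 400, 45.0, 1000),
]
avg_low, marg_low = dispatch(fleet, 600)     # gas is marginal
avg_high, marg_high = dispatch(fleet, 1000)  # coal is marginal
print(avg_low, marg_low, avg_high, marg_high)
```

In this toy grid the average intensity stays well below the marginal intensity at both demand levels, because the carbon-free capacity is always dispatched first and the next unit of demand is served by a fossil fuel plant.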

If load shifting in real time is possible, such as with an IoT device, then following low marginal carbon intensity can be an intraday demand response strategy that can reduce carbon emissions by shifting load away from times and places with high-emitting marginal plants (typically coal generators) and towards times and places with lower-emitting marginal plants (like gas generators). However, in most large scale operations, it is likely that at least some day-ahead planning is required to shift loads at scale. Optimizing day-ahead load to follow low average carbon intensity helps the grid transition to higher shares of carbon-free energy. It shifts demand to the times and places where carbon-free sources are predicted to be most productive (where and when the sun shines or the wind blows). When loads follow low average carbon intensity, the additional demand can drive a higher market clearing price in those hours and regions that have higher shares of carbon-free energy. This in turn allows investors to gain a better return from existing carbon-free assets, and encourages more new carbon-free energy investment and a higher penetration of carbon-free energy in the grid than would otherwise be economical.
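As a rough illustration of day-ahead planning against average carbon intensity, the sketch below places a flexible block of energy into the hours with the lowest forecast intensity, subject to a per-hour limit. It is a deliberately simple greedy heuristic with made-up numbers and hypothetical function names, not a description of any production scheduler.

```python
def schedule_flexible_load(forecast, energy_mwh, max_mwh_per_hour):
    """Place flexible energy into the lowest forecast carbon-intensity hours.

    forecast: forecast average carbon intensity (kg CO2/MWh), one per hour.
    Returns a list with the MWh scheduled in each hour.
    """
    if energy_mwh > max_mwh_per_hour * len(forecast):
        raise ValueError("load does not fit within the planning horizon")
    plan = [0.0] * len(forecast)
    remaining = energy_mwh
    # Visit hours from cleanest to dirtiest and fill each up to the limit.
    for hour in sorted(range(len(forecast)), key=lambda h: forecast[h]):
        if remaining <= 0:
            break
        plan[hour] = min(max_mwh_per_hour, remaining)
        remaining -= plan[hour]
    return plan

def emissions_kg(plan, forecast):
    """Total kg CO2 for a plan, using average carbon intensity accounting."""
    return sum(mwh * ci for mwh, ci in zip(plan, forecast))

# Made-up day-ahead forecast with a cleaner block in the middle of the day.
forecast = [500.0] * 8 + [300.0] * 8 + [450.0] * 8
shifted = schedule_flexible_load(forecast, energy_mwh=12.0, max_mwh_per_hour=2.0)
flat = [0.5] * 24  # the same 12 MWh spread evenly over the day
print(emissions_kg(flat, forecast), emissions_kg(shifted, forecast))
```

A real system would also need to respect job deadlines, capacity limits shared with inflexible loads, and forecast uncertainty.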


The data scientist’s role in managing carbon in operations

While predictable variations in carbon intensity create an opportunity to reduce carbon, that doesn’t mean that it’s always economical or even feasible to take advantage of it. Taking action requires an organization to go through a few steps, and data scientists are in a good position to drive organizational change in each of these steps:

  1. Make the business case: identify actions and estimate costs and benefits
  2. Measure carbon footprint at an actionable level
  3. Optimize carbon in operations

Make the business case

Find the sources of flexibility in your operations in time and location that can have the greatest impact on carbon. A good first step is a rough inventory of your company’s carbon footprint to understand where the largest emissions sources are. The Greenhouse Gas Protocol establishes international standards for carbon accounting and reporting, using three categories of emissions in its corporate accounting standards:

  • Scope 1: direct emissions from sources that are owned or controlled by the company, for example, emissions from combustion in owned or controlled process equipment;
  • Scope 2: emissions from the generation of purchased electricity consumed by the company;
  • Scope 3: indirect emissions that are a consequence of the activities of the company, but occur from sources not owned or controlled by the company; examples are extraction and production of purchased materials, transportation of purchased fuels, and use of sold products and services.

If emissions from electricity use (scope 2 emissions) make up a significant share of your or your supplier’s carbon footprint, is there short-term, day-ahead flexibility in your operations across sites or across hours of the day that you haven’t fully exploited? Longer term, when choosing locations for new sites, are there alternative sites that might otherwise have breakeven economics but would have very different carbon footprints?

Carbon should be optimized along with other costs and business objectives. Would optimizing for carbon also lower other costs, or would you face headwinds in trading off carbon versus increases in other costs or customer service objectives? Look for opportunities with tailwinds before you try to tackle those with headwinds. Flexibility creates value in operations, and in the pursuit of carbon reduction you may find that you’re developing operational flexibility that can also reduce other costs. Can you shift loads from your operational peaks, moving loads from peak times to off-peak times, or shifting loads between sites that don’t experience peak loads at the same time? If you can shift some of your operational loads in time or across sites, you can reduce costs as well as carbon. For example, at Google, developing the day-ahead capability to shift our compute loads in time or space reduces our carbon footprint, but also gives us the ability to smooth out peak compute and power usage over the day in each location. This in turn enables us to build less datacenter and server capacity, reducing our costs. Also, utilities may offer demand response incentives that pay their customers to shift loads away from peak usage times, which typically require the greatest share of fossil fuels (and have a higher marginal cost to supply due to the gas or coal fuel input cost).

Measure carbon at an actionable level

Once there’s a business case with specific sources of operational flexibility in mind, the data scientist can play a role in building the data sets needed to exercise that flexibility optimally. As a start, external data on the carbon intensity of the grid needs to be granular enough to support optimizations, which may mean hourly carbon intensity for each operating region. Third-party providers like Tomorrow and WattTime estimate real-time carbon intensity for regions around the world and offer APIs to provide forecasts at least 24 hours ahead with at least hourly updates.

With data on carbon intensity in each operating region, data scientists can build models to estimate how the carbon footprint scales as a function of specific company operations. For example, in our datacenter application, we needed to build models to estimate power consumption as a function of compute and storage resource consumption, and we needed to do so at a fairly granular level so that we could tie carbon footprints to specific product usage. These models can also be used to estimate the carbon footprint of products derived from the operations in order to help the company’s customers optimize their carbon footprints.
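A very small version of such a model might look like the sketch below: fit power draw as a linear function of utilization, then combine predicted energy with hourly carbon intensity. The linear form, the single utilization feature, the function names and the sample measurements are all assumptions for illustration; a production model would likely use more features and a more careful allocation of idle power.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def footprint_kg(utilization, carbon_intensity, slope_kw, idle_kw):
    """kg CO2 for hourly utilization values and hourly kg CO2/MWh intensities."""
    total = 0.0
    for u, ci in zip(utilization, carbon_intensity):
        energy_mwh = (idle_kw + slope_kw * u) / 1000.0  # kW for one hour -> MWh
        total += energy_mwh * ci
    return total

# Made-up measurements: CPU utilization (0 to 1) versus cluster power in kW.
cpu = [0.2, 0.4, 0.6, 0.8]
power_kw = [140.0, 180.0, 220.0, 260.0]
slope_kw, idle_kw = fit_line(cpu, power_kw)

# Two hours at 50% utilization under different grid carbon intensities.
kg = footprint_kg([0.5, 0.5], [400.0, 300.0], slope_kw, idle_kw)
print(round(slope_kw, 3), round(idle_kw, 3), round(kg, 3))
```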

Optimize carbon in operations

With the opportunities identified and the necessary data at hand, data scientists can build decision support systems to optimize carbon along with other costs. Building the capability to shift operations in time and space not only lowers carbon, but likely also yields a high ROI in a traditional financial sense if it enables operations to run with less capacity or at a lower energy cost. A decision support system will need to allow for tradeoffs at a granular, hour-by-hour level. At Google, we use a carbon price in our day-ahead load-shifting model so that we can optimize our operational flexibility with a single, unified objective function that considers all cost types: energy, carbon, capacity, and any other cost associated with enabling flexibility in time or space (like networking costs in the case of spatial flexibility of datacenter workloads). For the carbon cost, we use a $/ton price of carbon that is consistent with what economists estimate would be needed now to avoid scenarios of greater than 1.5–2°C warming. (The 2015 Paris Agreement in Article 2 sets forth a goal of "Holding the increase in the global average temperature to well below 2°C above preindustrial levels and pursuing efforts to limit the temperature increase to 1.5°C above preindustrial levels.") From the 2018 paper The Economics of 1.5°C Climate Change [6], median estimates of 2020 carbon prices needed to avoid 1.5°C and 2°C warming are $105 and $35 per metric ton, respectively. Industry does not need to wait for governments to enact a carbon tax to take action. We can embed a carbon price in our models that optimize operations, and we can do so while still providing a high ROI on a load shifting capability that can reduce both carbon and other operating costs. This is not a zero-sum game: we have been able to realize both benefits by including both carbon and other costs in an objective function. Most of the time the carbon and business cost objectives are not in conflict, but our framework allows for economic tradeoffs in those hours or days when the objectives are competing.
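To show how a carbon price folds into a single objective, the sketch below converts hourly carbon intensity into a $/MWh carbon cost and adds it to the energy price. It covers only the energy and carbon terms; capacity, networking and other costs are left out. The energy prices and intensities are made up, and the $105 per metric ton value is the median 1.5°C estimate quoted above, used here purely as an example input.

```python
def effective_cost_per_mwh(energy_price, carbon_intensity, carbon_price_per_ton):
    """Energy cost plus carbon cost, in $/MWh, for each hour.

    energy_price: $/MWh; carbon_intensity: kg CO2/MWh;
    carbon_price_per_ton: $ per metric ton of CO2 (1 ton = 1000 kg).
    """
    return [
        price + carbon_price_per_ton * ci / 1000.0
        for price, ci in zip(energy_price, carbon_intensity)
    ]

# Made-up values for three candidate hours.
energy_price = [30.0, 40.0, 35.0]
carbon_intensity = [600.0, 200.0, 450.0]

no_carbon = effective_cost_per_mwh(energy_price, carbon_intensity, 0.0)
with_carbon = effective_cost_per_mwh(energy_price, carbon_intensity, 105.0)

best_without = min(range(len(no_carbon)), key=lambda h: no_carbon[h])
best_with = min(range(len(with_carbon)), key=lambda h: with_carbon[h])
print(best_without, best_with, with_carbon)
```

In this example the cheapest hour on energy price alone is also the dirtiest, and pricing carbon moves the preferred hour to the cleanest one. When the two rankings agree, the carbon term changes nothing, which matches the observation that the objectives are usually not in conflict.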


Example: how we built carbon-intelligent computing into Google’s datacenters

In Google’s case, a significant fraction of its carbon footprint is from the electricity consumed by its datacenters. Google’s 24×7 carbon-free energy strategy is to consume carbon-free electricity in every datacenter in every hour of the day and every day of the year. That won’t happen by load shifting alone, but it is a significant step. Ana Radovanovic, the creator of our carbon-intelligent computing program and our energy tech lead in our Operations Data Science team, wrote about Google’s work on compute load shifting in the blog post carbon-intelligent computing. In brief: some of our compute loads, like ML training or YouTube video processing jobs, are latency tolerant, meaning that we can delay them by a number of hours without any impact to end users of our products. Also, some loads have spatial flexibility, meaning that we have the flexibility to run them in any cluster of computers within a campus, or in some cases even in different campuses across the world. Some clusters have newer and therefore more power efficient computers than others, and as we saw in Figure 1, some sites have much lower carbon intensity than others. This load shifting in time and space, planned a day ahead every day based on the forecasted hourly carbon intensity of the grid in each region in which we operate a datacenter, is the flexibility option we found to exploit predictable variations in carbon intensity. While we were first motivated by carbon reduction, we found that the load shifting capability had great potential to also reduce the cost of compute and datacenter capacity by smoothing out peaks from our load.

Ana and her collaborators plan to publish more technical details and impact results on carbon-intelligent computing in the future. What I’d like to share here is more about how this work started and moved forward in the hope of inspiring other data scientists to take action.

Ana did not wait for a new green mandate or directive from management for a new project proposal. This began as a 20% project, in the true spirit of Google’s 20% projects. Ana and a partner research engineer and energy expert, Ross Koningstein, had an idea and a shared passion to make a difference, and they started to spend some time on that idea. Then they identified executive sponsors and enlisted their input and support. This early support and input was critical to validate that management was willing to consider fundamentally different approaches to scheduling our workloads, and to support pilot efforts to prove that these methods were viable without taking on additional operational risk. Ana and Ross recruited other 20% time volunteers who brought key engineering and subject matter expertise to develop and pilot a production scale system. The idea of load shifting wasn’t new to Google; what was new was presenting it in the context of reducing our carbon footprint. That vision quickly built a broad base of allies and helped a small team of highly motivated volunteers to work through challenges that had impeded earlier efforts at load shifting. They were practical and scrappy, making minimal changes to existing engineering infrastructure for job scheduling. They built on existing systems and initiatives wherever possible instead of competing with them.

Our exec sponsors approved the rollout of the time-shifting capability, and it is now deployed globally. With the program and business value established, the team has grown and is now tackling large-scale spatial load shifting, greater operational efficiencies, and the development of more comprehensive systems for carbon accounting. From concept to development to implementation, this was a project where analytical leadership and vision from a small group of individuals drove an idea forward.

Don’t wait to be asked, and don’t wait for permission. Just get going. There’s no time to waste.


Acknowledgements

Thank you to Ana Radovanovic, Ross Koningstein, and the carbon-intelligent computing team at Google for helping to show us what is possible. Special thanks also to team member Ian Schneider for help with analysis of carbon intensity data and for developing our carbon accounting data pipelines at Google.

References

[1] D. Rolnick et al., Tackling Climate Change with Machine Learning (2019), arXiv:1906.05433

[2] J. de Chalendar, J. Taggart, and S. Benson, Tracking emissions in the US electricity system (2019), Proceedings of the National Academy of Sciences of the USA

[3] World Energy Investment 2020 (2020), IEA, Paris

[4] E. Grubert, Fossil electricity retirement deadlines for a just transition (2020), Science

[5] D. Callaway, M. Fowlie, and G. McCormick, Location, Location, Location: The Variable Value of Renewable Energy and Demand-side Efficiency (2018), Journal of the Association of Environmental and Resource Economists

[6] S. Dietz, A. Bowen, B. Doda, A. Gambhir, and R. Warren, The Economics of 1.5°C Climate Change (2018), Annual Review of Environment and Resources

