
Automatizing the heating system scheduling with reinforcement learning

An example based on experience of my own family

In the context of climate change and the energy crisis, energy efficiency has become a topic of focus. Buildings account for a large share of energy consumption, so a smart strategy for improving their energy usage is worth considering. Some people will argue that problems on a global scale feel too remote, but with energy becoming increasingly expensive, the issue deserves more discussion even from the sole perspective of saving money.

In this context, I write this article inspired by the experience of my own family:

  1. We are three people, two adults and a baby, all with different daily routines: the adults go to the office or stay home for remote work, and the baby goes to the nursery on fixed working days and hours.
  2. The adults and the baby have different comfort requirements: I can tolerate a lower temperature, while the baby has to stay warmer. Besides, we usually heat less when we are sleeping.
  3. In reality, most people, ourselves included, have no idea what exactly our ideal temperature is. All we can do is press the "plus button" on the interface of the heating system every time we feel cold, and the heating system responds to our demands until we are satisfied with the environment. Generally speaking, pressing the "plus button" to tell the system "I am cold" is the only interaction we users have with the heating system, and it has no access to any other information. See the figure below.
Image by author: how users interact with the heating system

The problem I am going to discuss today is whether the heating system can learn, through its interaction with users, to execute the optimal actions that save energy while guaranteeing the comfort of all family members.

To simplify the problem, I suppose that the heating system only has two modes: on and off.

In this context, I would like to build a reinforcement learning model such that the heating system learns when to turn on and off from its interaction with users, in order to reduce energy consumption while guaranteeing the comfort of all family members.

Simulation of the heating system environment

Let us quickly recall that the goal of reinforcement learning is for the agent (the heating system in our case) to learn the optimal actions that maximize the reward function through trial and error by interacting with the environment.

I would like to build a simplified heating system environment (in winter) inspired by experiences of my family:

  1. There are three people, each in one of two situations: at home or away from home. Each of them has a lower temperature bound, that is, a temperature below which the person will press the "plus button" to tell the system "I am cold". Of course, no interaction happens when nobody is at home. Moreover, we suppose that a lower temperature is acceptable at night. The temperature is set to be when nobody is home.
  2. We discretize time into 10-minute steps, i.e. we compute the indoor temperature and record whether users press the button every 10 minutes. As a consequence, the heating system has to execute an action every 10 minutes based on the state of the environment.
  3. The indoor temperature at each time step depends on the indoor temperature and on whether the heating was on at the previous time step. We compute its value by discretizing a simplified model described by an ODE.
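To make the third point concrete, here is a minimal sketch of what one 10-minute update could look like. It assumes a first-order thermal model, dT/dt = -loss_rate * (T - T_out) + heat_gain * u, discretized with an explicit Euler step. The model form, the function names, and all parameter values (outdoor temperature, loss rate, heating gain) are illustrative assumptions and not the values used in the simulation of this article.

```python
def step_temperature(temp_in, heating_on, temp_out=5.0,
                     loss_rate=0.1, heat_gain=2.0, dt_hours=1 / 6):
    """One explicit-Euler step of an assumed first-order thermal model.

    temp_in    : current indoor temperature (deg C)
    heating_on : 1 if the heating is on during this step, 0 otherwise
    temp_out   : outdoor temperature (deg C), assumed constant here
    loss_rate  : heat-loss coefficient (1/hour), hypothetical value
    heat_gain  : warming rate when the heating is on (deg C/hour), hypothetical
    dt_hours   : step length; 1/6 hour = 10 minutes
    """
    d_temp = -loss_rate * (temp_in - temp_out) + heat_gain * heating_on
    return temp_in + dt_hours * d_temp


def presses_button(temp_in, lower_bounds_at_home):
    """True if at least one person currently at home is below their bound.

    lower_bounds_at_home holds the lower temperature bounds of the people
    who are at home; an empty list means nobody is home, so no interaction.
    """
    return any(temp_in < bound for bound in lower_bounds_at_home)
```

With these made-up parameters, a room at 20 degrees cools by a quarter of a degree over one step when the heating is off, and the "I am cold" signal is simply whether anyone present is below their own bound.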

Back to the language of reinforcement learning. In this simulation, the state of the RL environment consists of two parts: whether the users feel cold (this is what the heating system learns from the interaction) and the time (day of the week, hour, minute). The agent, i.e. the heating system, learns whether to turn on (1) or off (0) based on such a state.
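For a tabular method, such a state has to be mapped to a row of a table. The sketch below shows one possible encoding, assuming the minute is bucketed into the 10-minute steps of the simulation and days are numbered 0 to 6. The function name `encode_state` and the index layout are hypothetical choices made for illustration.

```python
N_DAYS, N_HOURS, N_SLOTS = 7, 24, 6       # 6 ten-minute slots per hour
N_TIME_STEPS = N_DAYS * N_HOURS * N_SLOTS  # time steps in one week
N_STATES = 2 * N_TIME_STEPS                # feel_cold is either 0 or 1


def encode_state(feel_cold, day, hour, minute):
    """Map (feel_cold, day of week, hour, minute) to a single integer index.

    day is assumed to be in 0..6, hour in 0..23, minute in 0..59.
    """
    slot = minute // 10  # which 10-minute slot of the hour
    time_index = (day * N_HOURS + hour) * N_SLOTS + slot
    return int(feel_cold) * N_TIME_STEPS + time_index
```

Under this encoding, one week contains 1008 time steps, so the table has 2016 states, each with two possible actions.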

We define the reward function as:

Reward = -sum(action + 1.5 * feel_cold),

that is, we penalize every time step at which the heating is on and every time step at which anyone at home feels cold. Note that we penalize more when someone feels cold, in order to guarantee the comfort level at home.
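As a small worked example of this reward, the sketch below computes the per-step penalty and sums it over a sequence of steps. The function names are hypothetical; only the formula, with its weight of 1.5 on feeling cold, comes from the definition above.

```python
def step_reward(action, feel_cold):
    """Penalty for one time step: 1 if heating is on, plus 1.5 if anyone is cold."""
    return -(action + 1.5 * int(feel_cold))


def episode_reward(actions, cold_flags):
    """Total reward over a sequence of steps (always zero or negative)."""
    return sum(step_reward(a, c) for a, c in zip(actions, cold_flags))


# Three steps: heating on and comfortable, heating off and cold,
# heating on and still cold.
total = episode_reward([1, 0, 1], [False, True, True])
print(total)  # -5.0
```

Leaving the heating off while someone is cold costs 1.5, which is more than the cost of 1 for turning it on, so the agent is pushed to heat whenever somebody at home would otherwise be uncomfortable.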

Image by author: illustration of the RL system

Q-learning for optimal actions

To learn the optimal actions, we apply the Q-learning algorithm, which is simple yet powerful. The method constructs a Q table whose size is the product of the sizes of the state space and the action space. At each step, it updates the Q value of the cell corresponding to the current state and the executed action as a weighted average of the old value and the new information given by the obtained reward and next state.

Below is the core part of the Q-learning algorithm:
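A minimal sketch of that core, written in plain Python, is shown here. It uses the standard tabular Q-learning update with an epsilon-greedy action choice. The function names, the learning rate `alpha`, the discount factor `gamma`, and the exploration rate `epsilon` are illustrative assumptions and may differ from the values behind the results reported in this article.

```python
import random


def make_q_table(n_states, n_actions=2):
    """Q table initialized to zero: one row per state, one column per action."""
    return [[0.0] * n_actions for _ in range(n_states)]


def choose_action(q_table, state, epsilon=0.1, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_table[state]))
    row = q_table[state]
    return row.index(max(row))  # first action with the highest Q value


def q_update(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.95):
    """Weighted average of the old Q value and the new information.

    The new information is the obtained reward plus the discounted best
    Q value of the next state.
    """
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] = ((1 - alpha) * q_table[state][action]
                              + alpha * target)


# Example: one update after turning the heating on (action 1) in state 0,
# receiving a reward of -1 and landing in state 1.
q = make_q_table(n_states=2016)
q_update(q, state=0, action=1, reward=-1.0, next_state=1)
```

After this single update, the cell for state 0 and action 1 moves one tenth of the way from 0 toward the target of -1, while every other cell stays at zero.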

Automatized scheduling

Now that the environment and the learning algorithm are built, all we have to do is put the pieces together and learn the best schedule for the heating system. After 1e4 training epochs, the reward goes up from -1173 to -533. Below is a plot of the learned heating schedule for one week (1 for on, 0 for off, shown as gray dots) and the simulated indoor temperature (the blue line). The orange line represents the lower bound of the indoor temperature. You can see that the system takes as few "on" actions as possible while keeping the indoor temperature above the lower bound.

Image by author: Heating scheduling and indoor temperature in one week

I also give a similar plot covering only one day, during which nobody is home for several hours. You can see more clearly that the heating stays off until 30 minutes before family members start to come back home.

Heating scheduling and indoor temperature in one day

Conclusion

In this article, I give an example of how reinforcement learning can help to automatize a heating system, succeeding in saving energy while guaranteeing a certain level of comfort in a simulated environment inspired by my own family’s situation. I believe this procedure can be implemented in other buildings, including office buildings with much larger occupancy, where heating systems would learn from interactions with users to reach a globally optimal solution.

