AI for Industrial Process Control

Tuning a process oven using Reinforcement Learning

Figure 1. Reflow Oven

Determining optimal control settings for an industrial process can be tough. Controls often interact, so adjusting one setting requires readjusting others, and the relationship between a control and its effect can be highly complex. These complications make process optimization challenging. This article explores a reinforcement learning solution for controlling an industrial conveyor oven.

Introduction

An example of this type of equipment is a reflow oven used for soldering electronic components to a circuit board (Figures 1 & 2). The oven has a moving belt that transports the product through multiple heating zones. This process heats the product according to a precise temperature-time profile required to ensure reliable solder connections.

Figure 2. Product Exiting Oven

The reflow oven discussed in this article has eight heating zones, each with a control for setting the temperature of the zone’s heater. Sensors record the temperature of the product at ~300 points as it travels through the oven. The temperature at each point is determined by the heat transferred to the product from the heaters.

Reinforcement Learning Solution

An operator typically tunes the heater settings with the following steps (sketched in code after this list):

  • run one pass of the product through the oven
  • observe the temperature-time profile from the sensor readings
  • adjust the heater settings to (hopefully) improve the profile
  • wait for the oven to stabilize to the new settings
  • repeat this procedure until the profile from the sensor readings is acceptably close to the desired profile
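In code form, this is a simple feedback loop. The sketch below is purely illustrative; the toy oven model, the error metric, and the adjustment rule are all my assumptions, standing in for the real oven and the operator's judgment.

```python
import numpy as np

# Toy stand-ins for the manual process; run_pass, profile_error, and
# adjust_settings are hypothetical placeholders, not a real oven API.
DESIRED_PROFILE = np.linspace(25, 250, 300)   # toy target profile (deg C)
TOLERANCE = 10.0                              # assumed acceptance threshold

def run_pass(settings):
    """One pass through the oven: each zone's readings track its heater."""
    return np.repeat(settings, 300 // len(settings) + 1)[:300]

def profile_error(readings):
    """Mean absolute deviation from the desired profile."""
    return np.abs(readings - DESIRED_PROFILE).mean()

def adjust_settings(settings, readings):
    """Stand-in for operator judgment: nudge each zone toward its target."""
    targets = np.array([z.mean() for z in np.array_split(DESIRED_PROFILE, 8)])
    actual = np.array([z.mean() for z in np.array_split(readings, 8)])
    return settings + 0.5 * (targets - actual)

settings = np.full(8, 150.0)                  # initial heater temperatures
for step in range(50):                        # cap the number of passes
    readings = run_pass(settings)
    if profile_error(readings) <= TOLERANCE:
        break                                 # acceptably close to target
    settings = adjust_settings(settings, readings)
```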

The reinforcement learning system replaces these operator steps with a two-stage process. In the first stage, an intelligent agent learns the dynamics of the oven and creates a policy for updating the heater settings under various oven conditions.

In the second stage, the agent follows the learned policy to find the optimal heater settings, the ones that produce the closest match between the actual product profile and the desired temperature-time profile. Figure 3 shows the agent at work in this stage: the red trace is the desired temperature-time profile, and the blue trace is the actual profile as the agent converges on the optimal settings.

Figure 3. red: desired profile - blue: actual profile

The Agent

Since considerable time is required for a pass through the oven (>300 seconds) and to stabilize the oven (many minutes), an oven simulator is used to greatly speed up the process. The simulator emulates the heating action of the oven on the product.
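The article doesn't detail the simulator's internals, but its interface is simple: eight heater settings in, roughly 300 one-second temperature readings out. Here is a minimal sketch assuming first-order (Newtonian) heat transfer; the class name and the coefficient k are my inventions, not the project's actual simulator.

```python
import numpy as np

class OvenSimulator:
    """Toy reflow-oven model: 8 heater zones, ~300 s transit time.

    Illustrative stand-in for the simulator described in the article;
    it assumes simple first-order (Newtonian) heat transfer.
    """

    def __init__(self, n_zones=8, transit_seconds=300, k=0.05):
        self.n_zones = n_zones
        self.transit_seconds = transit_seconds
        self.k = k  # heat-transfer coefficient (assumed)

    def run(self, heater_settings):
        """Simulate one pass; returns the product temperature each second."""
        assert len(heater_settings) == self.n_zones
        temps = np.empty(self.transit_seconds)
        product_temp = 25.0                        # enters at room temperature
        seconds_per_zone = self.transit_seconds / self.n_zones
        for t in range(self.transit_seconds):
            zone = min(int(t / seconds_per_zone), self.n_zones - 1)
            # Product temperature relaxes toward the local heater setting.
            product_temp += self.k * (heater_settings[zone] - product_temp)
            temps[t] = product_temp
        return temps

sim = OvenSimulator()
profile = sim.run(np.array([120, 140, 160, 180, 200, 220, 240, 250.0]))
print(profile.shape)  # (300,) readings at one-second intervals
```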

In each step of the first stage, the reinforcement learning agent passes the simulator the settings for the eight heaters. After the simulation run, the simulator returns the product temperature readings (~300 readings taken at one-second intervals).

The agent uses a selection of the readings to determine the state of the system. It also calculates a reward for the run from the difference between the returned readings and the desired temperature-time profile: if the difference is smaller than on the previous run, the reward is positive; otherwise, it is negative. The reward is used to update the policy.
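The article doesn't specify which readings form the state or which distance metric is used, so the subsampling and mean-absolute-error choices below are assumptions; the sign-based reward rule, though, follows the description above.

```python
import numpy as np

def state_from_readings(readings, n_points=30):
    """Subsample the ~300 readings into a fixed-size state vector
    (which readings the agent actually selects is an assumption)."""
    idx = np.linspace(0, len(readings) - 1, n_points).astype(int)
    return readings[idx]

def reward_and_error(readings, desired_profile, prev_error):
    """Positive reward if this run is closer to the desired profile
    than the previous run, negative otherwise (per the article)."""
    error = np.abs(readings - desired_profile).mean()  # assumed metric
    return (1.0 if error < prev_error else -1.0), error
```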

Figure 4. Reinforcement Learning System

After repeating this process thousands of times, the agent will have learned an extensive policy for updating the heater settings under various oven conditions. In the second stage, the agent follows the learned policy to find optimal heater settings that will produce the closest match between the actual product profile and the desired temperature-time profile.
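Putting the pieces together, stage one might look like the loop below. The action encoding (raise or lower one heater by 5 degrees) and the ToyAgent are stand-ins I've assumed for illustration; in the real system the agent is the Double Deep-Q model described in the next section, and the simulator is the oven model above.

```python
import numpy as np

N_ZONES, N_STEPS = 8, 300
DESIRED = np.linspace(25, 250, N_STEPS)          # toy target profile

def simulate(settings):                          # toy simulator stand-in
    return np.repeat(settings, N_STEPS // N_ZONES + 1)[:N_STEPS]

class ToyAgent:
    """Placeholder for the Double Deep-Q agent; it acts randomly here."""
    def act(self, state):
        # Assumed action encoding: raise or lower one heater by 5 deg C.
        return np.random.randint(N_ZONES), np.random.choice([-5.0, 5.0])
    def update(self, state, action, reward, next_state):
        pass                                     # network update goes here

agent = ToyAgent()
settings = np.full(N_ZONES, 150.0)
state, prev_error = simulate(settings)[::10], np.inf

# Stage one: thousands of simulated passes to learn the policy.
for episode in range(10_000):
    zone, delta = agent.act(state)
    settings[zone] = np.clip(settings[zone] + delta, 25.0, 300.0)
    readings = simulate(settings)
    error = np.abs(readings - DESIRED).mean()
    r = 1.0 if error < prev_error else -1.0      # reward rule from the article
    next_state = readings[::10]                  # subsampled state
    agent.update(state, (zone, delta), r, next_state)
    state, prev_error = next_state, error
```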

A Deeper Dive

The reinforcement learning system in this project uses a Double Deep-Q¹ model that incorporates two neural networks and experience replay². After stage one, one of the networks holds the learned policy that the agent follows in stage two. For more details, please check out the papers referenced at the end of this article.
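The key idea in Double Deep-Q learning [1] is to decouple action selection from action evaluation: the online network picks the best next action, and a periodically synced target network scores it, which reduces the overestimation bias of standard Q-learning. Below is a minimal sketch of that target computation; the PyTorch framing, network sizes, and discount factor are my choices, not details from the article.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 30, 16, 0.99       # assumed sizes

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

online_net, target_net = make_net(), make_net()
target_net.load_state_dict(online_net.state_dict())  # start in sync

def double_q_targets(rewards, next_states, dones):
    """Double Q target: online net selects the action, target net scores it."""
    with torch.no_grad():
        best = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, best).squeeze(1)
        return rewards + GAMMA * (1.0 - dones) * next_q

# Transitions would come from an experience-replay buffer [2]; random
# tensors here just demonstrate the shapes.
targets = double_q_targets(torch.zeros(32), torch.randn(32, STATE_DIM),
                           torch.zeros(32))
print(targets.shape)  # torch.Size([32])
```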


[1]: van Hasselt, H., Guez, A., and Silver, D. Deep reinforcement learning with double Q-learning. arXiv preprint arXiv:1509.06461, 2015.

[2]: Mnih, V., et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
