Learning to Simulate

How learning to simulate better synthetic data can improve deep learning

Published in

Towards Data Science

8 min read · Oct 13, 2019

The paper presented at ICLR 2019 can be found here. I also have slides as well as a poster explaining the work in detail.

Deep neural networks are an amazing piece of technology. With enough labelled data they can learn very accurate classifiers for high-dimensional inputs such as images and sound. In recent years the machine learning community has successfully tackled problems such as classifying objects, detecting objects in images, and segmenting images.

The caveat in the above statement is “with enough labelled data”. When labelled data is scarce, simulations of real phenomena and of the real world can sometimes help. There are cases where synthetic data has improved the performance of deep learning systems in computer vision and robotic control applications.

Simulation can give us accurate scenes with free labels. But let’s take Grand Theft Auto V (GTA V) as an example. Researchers have collected a dataset by free-roaming the GTA V world and have used it to bootstrap deep learning systems, among other things. Many game designers and map creators worked on creating the intricate world of GTA V. They painstakingly designed it, street by street, and then went over the streets with a fine-tooth comb, adding pedestrians, cars, objects, etc.

An example image from GTA V (Grand Theft Auto V)

This is expensive, both in time and in money. Using random simulated scenes instead, we might not do much better: important edge cases might be severely undersampled, and our classifier might not learn how to detect them correctly. Let us imagine we are trying to train a classifier which detects dangerous scenes. In the real world we run into dangerous scenes like the one below with very low frequency, yet they are very important. If we generate a large number of random scenes, we will also have very few dangerous scenes like the one below. A dataset which undersamples these important cases might yield a classifier which fails on them.

Example of a dangerous traffic scene. These important cases can be undersampled when randomly sampling synthetic data. Can we do better?

Learning to simulate is the idea that we can potentially learn how to optimally generate scenes such that a deep network can either learn a very good representation or can perform well in a downstream task.

To test our work, we create a parameterized procedural traffic scene simulator using Unreal Engine 4 and the CARLA plugin. Our simulator creates a road of variable length with different types of intersections (X, T, or L). We can populate the road with buildings on the side and cars of 5 different types on the road. The number of buildings and cars, as well as the type of cars, is controlled by tunable parameters. We can also switch between 4 different weather types, which control lighting and rain effects. The main idea is to learn the optimal parameters which control these scene characteristics for different tasks (for example semantic segmentation or object detection).

A demo of our procedural scene simulator. We vary the length of the road, the intersections, the amount of cars, the type of cars and the amount of houses. All of these are controlled by a set of parameters.
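To make the idea of a parameterized scene concrete, here is a minimal Python sketch of what sampling one scene description could look like. All names (`SceneParams`, `sample_scene`, the value ranges, and the integer indices standing in for weather and car types) are hypothetical and chosen for illustration; they are not the interface of the actual simulator.

```python
import random
from dataclasses import dataclass

# Hypothetical stand-ins for the scene characteristics described above.
INTERSECTIONS = ["X", "T", "L"]
NUM_CAR_TYPES = 5       # the post mentions 5 car types
NUM_WEATHER_TYPES = 4   # the post mentions 4 weather types


@dataclass
class SceneParams:
    road_length: int     # assumed unit: number of road segments
    intersection: str    # one of INTERSECTIONS
    weather: int         # index of the weather type
    car_types: list      # one car-type index per spawned car
    num_buildings: int


def sample_scene(rng, weather_probs, car_type_probs, max_cars=10, max_buildings=10):
    """Draw one scene description from simple categorical/uniform distributions."""
    num_cars = rng.randint(0, max_cars)
    return SceneParams(
        road_length=rng.randint(1, 5),
        intersection=rng.choice(INTERSECTIONS),
        weather=rng.choices(range(NUM_WEATHER_TYPES), weights=weather_probs)[0],
        car_types=rng.choices(range(NUM_CAR_TYPES), weights=car_type_probs, k=num_cars),
        num_buildings=rng.randint(0, max_buildings),
    )


# Uniform probabilities correspond to purely random scene generation.
rng = random.Random(0)
scene = sample_scene(rng, weather_probs=[0.25] * 4, car_type_probs=[0.2] * 5)
print(scene)
```

In this sketch, the weather and car-type probabilities play the role of the tunable parameters: changing them changes which scenes are generated most often.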

To get sensor data we place a car on the road of each generated scene. The car captures RGB images of the scene, and these images automatically come with semantic segmentation labels and depth annotations (for free!).

An inside view of the generated scenes from our simulator with a fixed set of parameters

However, the learning to simulate algorithm is more general than this. We don’t have to use it exclusively for traffic scenes; it can apply to any type of parameterized simulator. By this we mean that, for any simulator that takes parameters as input, we present a way to search for the best parameters such that the generated data is optimal for a deep network to learn the downstream task. Our work is, to the best of our knowledge, the first to do simulation optimization to maximize performance on a main task, as well as the first to apply it to traffic scenes.

Let us move on to the crux of our algorithm. In a traditional machine learning setup, data is sampled from a distribution P(x,y), where x is the data and y is the label. Usually this happens by collecting data in the real world and manually labeling the samples. This dataset is fixed, and we use it to train our model.

By using a simulator to train a main task network, we can generate data from a new distribution Q defined by the simulator. This dataset is not fixed, and we can generate as much data as our computation and time constraints allow. Still, the data generated in this domain randomization setup is randomly sampled from Q. The amount of data needed to obtain a good model could be large, and performance can be suboptimal. Can we do better?

We introduce learning to simulate, which optimizes a metric of our choice on a main task. The pipeline is trained by defining a reward function R which is directly related to this metric (usually it is identical to the metric itself). At every iteration of the algorithm, we sample data from a parameterized simulator Q(x,y|Θ) and train the main task model on it. The reward R is obtained by testing the trained network on a validation set, and it is then used to inform the update of the policy which controls the parameter Θ. In our case, we use vanilla policy gradient to optimize our policy.

Informally, we are trying to find the best parameter Θ which gives us the distribution Q(x,y|Θ) which maximizes accuracy (or whichever metric) for the main task.
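The loop described above can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions, not the implementation from the paper: the policy is assumed to be a Gaussian over Θ with a fixed standard deviation, and the expensive step "generate data, train the main task model, evaluate on the validation set" is replaced by a hypothetical `toy_reward` function that simply measures how close the simulator's category probabilities are to a hidden target.

```python
import math
import random


def softmax(logits):
    """Turn unnormalized logits into probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(theta, target_probs):
    """Hypothetical stand-in for: simulate data with theta, train the main
    task model, and score it on the validation set. Here the reward is the
    negative L1 distance between softmax(theta) and a hidden target."""
    probs = softmax(theta)
    return -sum(abs(p - t) for p, t in zip(probs, target_probs))


def learn_to_simulate(target_probs, iterations=2000, sigma=0.3, lr=0.05, seed=0):
    """Vanilla policy gradient (REINFORCE) on the mean of a Gaussian policy."""
    rng = random.Random(seed)
    mu = [0.0] * len(target_probs)  # policy mean over theta
    baseline = 0.0                  # running average of the reward
    for _ in range(iterations):
        # 1. Sample simulator parameters theta from the policy.
        theta = [rng.gauss(m, sigma) for m in mu]
        # 2. Obtain the reward (in the real pipeline: train, then validate).
        reward = toy_reward(theta, target_probs)
        # 3. Policy gradient step: the gradient of log N(theta; mu, sigma^2)
        #    with respect to mu is (theta - mu) / sigma^2.
        advantage = reward - baseline
        mu = [m + lr * advantage * (t - m) / sigma ** 2 for m, t in zip(mu, theta)]
        baseline = 0.9 * baseline + 0.1 * reward
    return mu


target = [0.4, 0.4, 0.1, 0.1]  # made-up "ground-truth" proportions
mu = learn_to_simulate(target)
print([round(p, 2) for p in softmax(mu)])
```

Note that the update never differentiates through `toy_reward`; it only needs the reward value, which is what makes this family of methods usable with a non-differentiable simulator.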

The mathematical formulation of the learning to simulate problem is a bi-level optimization problem. Attempting to solve it with a gradient-based approach imposes smoothness and differentiability constraints on the lower-level problem. In particular, the simulator would also need to be differentiable, which is generally not true! This is why an approach like vanilla policy gradient, which does not require derivatives of the simulator, makes sense.

Mathematical formulation of the bi-level learning to simulate optimization problem
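For readers who cannot see the figure, one way to write such a bi-level problem is sketched below. The notation is an assumption made here for illustration and may differ from the paper: Θ are the simulator parameters as above, w are the weights of the main task model h_w, L is the training loss, R is the reward, and D_val is the validation set.

```latex
\begin{aligned}
\Theta^{*} &= \arg\max_{\Theta} \; R\big(h_{w^{*}(\Theta)},\, D_{\mathrm{val}}\big) \\
\text{s.t.}\quad w^{*}(\Theta) &= \arg\min_{w} \;
  \mathbb{E}_{(x,y) \sim Q(x,y \mid \Theta)} \big[ \mathcal{L}\big(y,\, h_{w}(x)\big) \big]
\end{aligned}
```

The upper level picks the simulator parameters and the lower level trains the network on data generated with those parameters, which is why the upper-level objective depends on Θ only through the trained weights.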

We demonstrate our approach on instance counting and semantic segmentation.

The car-counting task that we explore is simple. We ask the network to count how many individual cars of each specific type are in the scene. Below is an example scene with the correct labels on the right.

We use learning to simulate to solve this problem and compare to what happens using only random simulation. In the graph below, focus on the red and grey curves, which show how learning to simulate (LTS) achieves a much higher reward (lower mean absolute error of cars counted) after 250 epochs. The random sampling case briefly improves, but performance decreases once the sampled random batch is not adequate for the task. The grey curve rises slowly over several iterations but learning to simulate converges on the best possible accuracy shown by the blue curve (where we use the ground-truth simulation parameters).

Reward for the car counting task. Note how learning to simulate converges to the best possible reward (on a simulated dataset) shown by the blue curve.
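As a small illustration of the reward used here, the sketch below computes the negative mean absolute error between predicted and true car counts. Treating the reward as exactly the negative of the error is an assumption made for this example, and the function name and the numbers are hypothetical.

```python
def counting_reward(predicted_counts, true_counts):
    """Negative mean absolute error over all images and car types.

    Each argument is a list with one row per image; each row holds one
    count per car type.
    """
    total_error = 0
    num_values = 0
    for pred_row, true_row in zip(predicted_counts, true_counts):
        for pred, true in zip(pred_row, true_row):
            total_error += abs(pred - true)
            num_values += 1
    return -total_error / num_values


# Two images, three car types each (made-up numbers).
predicted = [[2, 0, 1], [1, 1, 0]]
actual = [[2, 1, 1], [0, 1, 0]]
# Two counts are off by one out of six values, so the error is 2/6.
print(counting_reward(predicted, actual))
```

A perfect counter would reach a reward of 0, which is the highest value this reward can take.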

What is happening? A nice way to look at it is by visualizing the probabilities of different scenarios and objects in our scene. We plot the weather probabilities over time. The ground-truth validation dataset that we generated oversampled certain weather types (clear noon and clear sunset) and undersampled the rest. This means there were more images with clear noon and clear sunset weather than with other types of weather. We can see that our algorithm recovers the rough proportions!

Weather probabilities (logits) over time

Let’s do the same with car spawning probabilities. Our ground-truth dataset oversampled certain types of cars (silver Nissan and green Beetle). Learning to simulate reflects these proportions after training as well. In essence, the algorithm pushes the simulator parameters to generate datasets which are similar to the ground-truth dataset.

Now we show an example of how learning to simulate improves accuracy over random simulation on the KITTI traffic segmentation dataset, which was captured in the real world.

An example image from the KITTI dataset.

An example of ground-truth semantic segmentation labels on our simulator. In a simulator, you can get object labels for free, with no need for a human annotator.

As our baseline we train the main task model separately 600 times, each time with data generated by the simulator using a different set of random parameters. We monitor the validation Car IoU metric for each of these networks and pick the one with the highest validation reward. We then test it on the unseen KITTI test set. We train learning to simulate for 600 iterations and obtain a Car IoU (a widely used segmentation metric) of 0.579, much higher than the 0.480 achieved using the random parameter baseline (random params). We also show our results using another derivative-free optimization technique (random search), which did not achieve good results in this experiment (although it did work pretty well in car counting). Finally, to show an upper bound, we also report the performance of the ResNet-50 network we used for segmentation when it is trained on 982 annotated real KITTI training images (KITTI train set).

Results for car semantic segmentation on the unseen KITTI test set
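For readers unfamiliar with the metric, the sketch below shows how intersection over union (IoU) for a single class can be computed from a predicted and a ground-truth label map. The function name, the class id for cars, and the tiny label maps are made up for illustration; the evaluation code used for the reported results may differ in its details.

```python
def class_iou(pred, target, class_id):
    """IoU for one class: |pred AND target| / |pred OR target| over pixels."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_hit = p == class_id
            t_hit = t == class_id
            intersection += p_hit and t_hit
            union += p_hit or t_hit
    # If the class appears in neither map, the IoU is undefined.
    return intersection / union if union else float("nan")


CAR = 1  # hypothetical class id for "car"; 0 stands for everything else
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [1, 1, 0]]
# 2 pixels are labelled car in both maps, 4 in at least one of them.
print(class_iou(pred, target, CAR))  # 0.5
```

An IoU of 1 means the predicted car pixels match the ground truth exactly, and 0 means they do not overlap at all.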

Learning to simulate can be seen as a meta-learning algorithm that adjusts the parameters of a simulator to generate synthetic data such that a machine learning model trained on this data achieves high accuracy on both validation and test sets. We show that it beats domain randomization on real problems and believe it is a very promising research area. It will be exciting to see what can happen with extensions and applications of this in the near future, and I encourage everyone to look into how simulation and learning to simulate can help you in your applications or research.

All questions are more than welcome! My website is below.

Nataniel Ruiz

Research I have explored several topics in computer vision including face and gesture analysis, simulation and…

natanielruiz.github.io

Written by Nataniel Ruiz