Better Simulations will revolutionize Machine Learning

How simulations will solve the biggest problem in ML.

Aug 17, 2019 · 3 min read

There is no doubt that Machine Learning often feels magical. As a Machine Learning Engineer myself, I am still fascinated when one of my models solves a very hard, high-dimensional problem that would otherwise have been unsolvable. I am convinced that data-driven solutions will solve some of the most challenging problems of the future, such as self-driving cars, and that Software 2.0 will play a very important role.

The performance of those algorithms depends heavily on your data, though. If you have ever trained a Neural Network yourself, you will have quickly found out that you are limited by both the quality and the quantity of your data.

We could summarize it like this:

If you have enough good quality data → Performance could be good.
If you have bad data → Performance will be bad.

Where do I get enough good data from?

Getting the data is the most difficult problem in Machine Learning in my opinion. If you want to teach a car to drive itself, you will need thousands of miles of human driving. If you want to detect when a window breaks, you will need to break thousands of windows yourself and record their sounds (I actually know a company that did that).

Breaking thousands of different windows is costly and takes a lot of time. Collecting data like this is the biggest bottleneck in solving most problems with a data-driven solution like Machine Learning.

The simulation

But what if you could build a good simulation in which you could break windows and drive cars? You could generate millions of training samples in very little time and at very little cost.
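To make the idea of generating training samples from a simulation concrete, here is a minimal sketch. It is not a window-breaking or driving simulator. It assumes a toy physics model (a projectile on flat ground with no air resistance), and the names `simulate_throw` and `generate_samples` as well as the parameter ranges are made up for illustration.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2


def simulate_throw(speed, angle_deg):
    """Toy simulator (assumption): landing distance of a projectile, no drag."""
    angle = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * angle) / G


def generate_samples(n, seed=0):
    """Draw random inputs and label each one by running the simulator."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        speed = rng.uniform(1.0, 30.0)   # m/s, hypothetical range
        angle = rng.uniform(5.0, 85.0)   # degrees, hypothetical range
        label = simulate_throw(speed, angle)
        samples.append(((speed, angle), label))
    return samples


# Every sample is (features, label) and costs only a function call to produce.
samples = generate_samples(100_000)
print(len(samples))
```

The labels come for free because the simulator computes them. How useful such data is depends on how closely the simulator matches the real world.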

This is exactly what OpenAI did, for example, with their robot hand system called Dactyl:

Our system, called Dactyl, is trained entirely in simulation and transfers its knowledge to reality, adapting to real-world physics using techniques we’ve been working on for the past year.

It is trained entirely in simulation. Let that sink in for a second and really think about what it means. The whole training process happened in the simulation, where training is very cheap. After training, they transferred the knowledge learned inside the simulation to the real world. The results are amazing, as you can see in the video below:
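One technique OpenAI describes for making this transfer work is domain randomization: the simulator's properties are varied between training episodes, so the learned behavior cannot rely on one exact version of the physics. The sketch below only illustrates that idea. The `SimParams` fields, their ranges, and the `training_episodes` helper are assumptions for illustration, not Dactyl's actual parameters or code.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical physics settings for one simulated episode."""
    friction: float
    object_mass_kg: float
    motor_gain: float


def sample_sim_params(rng):
    """Draw a fresh set of physics parameters (ranges are made up)."""
    return SimParams(
        friction=rng.uniform(0.5, 1.5),
        object_mass_kg=rng.uniform(0.03, 0.3),
        motor_gain=rng.uniform(0.8, 1.2),
    )


def training_episodes(n_episodes, seed=0):
    """Yield (episode index, parameters); every episode sees different physics."""
    rng = random.Random(seed)
    for episode in range(n_episodes):
        params = sample_sim_params(rng)
        # A real setup would reset the simulator with `params` here
        # and then run and update the policy for this episode.
        yield episode, params


episodes = list(training_episodes(5))
for episode, params in episodes:
    print(episode, params)
```

If the real world looks like just one more variation of the randomized simulation, a model trained this way has a better chance of working outside it.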

[Video: Learning Dexterity]

Good simulations are not just a dream for some cool VR games. They would solve the data problem entirely. Do you need millions of samples of some edge case in self-driving? Just re-create the scene in a simulation and look at what happens. There is already a very cool open-source simulator for training self-driving systems. It is called CARLA.

[Video: CARLA Autonomous Driving]

This is a huge step in the right direction and maybe the only way to achieve fully self-driving capabilities. Just think how hard it would be to record every edge case in the real world.

Conclusion

I think that in the future we will no longer be limited by our resources in the real world, but by how good our simulations are. A simulation should be so good that if something works inside it, it also works outside it, in the real world.

Thank you for reading and, as always, keep up the learning!

If you want more and want to stay up to date, you can find me here:

Twitter: @elfouly_sharif
GitHub: SharifElfouly
LinkedIn: https://www.linkedin.com/in/sharif-elfouly-975146142/

Written by shafu.eth