How a High School Junior Made a Self-Driving Car

Sully Chen
Towards Data Science
7 min read · Dec 9, 2018


Some of the most common questions I receive are about this repository, from a project I created almost three years ago. The repository itself is really nothing too special, just an implementation of an Nvidia paper that had been released about a year prior. A graduate student later managed to run my code on an actual full-sized car, which is really cool. The interesting part is the story behind my code's creation.

My fascination with machine learning began in early 2015, when I stumbled across genetic algorithms and neural networks. Popular videos on YouTube showed virtual organisms seemingly magically evolving complex behaviors without any human input. The videos explained that the algorithms were as simple as the crossing over and random mutation I'd learned about in biology class, but I was still in disbelief that such a computer simulation was possible. So naturally, I wrote my own simulation to verify it. After it actually worked, I was captivated, and I took a deep dive into MIT's OpenCourseWare series on AI. I spent about two weeks watching one or two lectures per day, essentially binging an entire semester's worth of introductory AI material.
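
For the curious, the core loop of such a simulation really is that simple. Here is a minimal sketch of a genetic algorithm, assuming a population of "genomes" (lists of numbers) scored by some fitness function; all of the names here are my own, not from my original simulation:

```python
import random

def evolve(population, fitness, generations=100, mutation_rate=0.1):
    """Evolve a population of genomes (lists of floats) via crossover and mutation."""
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents
        population.sort(key=fitness, reverse=True)
        parents = population[: len(population) // 2]
        children = []
        while len(children) < len(population) - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(a))
            child = a[:cut] + b[cut:]  # single-point crossover
            # Random mutation: jitter each gene with small probability
            child = [g + random.gauss(0, 0.1) if random.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

Point fitness at anything, say, how far a simulated creature manages to walk, and complex behavior falls out of nothing but selection, crossover, and mutation.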

Now the real fun began: the mathematics. To be completely honest, I struggled with backprop for weeks before I really understood it. None of the papers, guides, and blogs I found explained it in a way I could really absorb, and I spent many long hours trying to implement it with little to no success. Finally, I found a blog post that clicked with me, and I made a video on YouTube condensing what I'd learned so that confused people in my position could learn the way I did. Lastly, I wrote a neural network library from scratch in C++ to make sure I understood the material, and (much) later refined it into a small repository designed to help beginners understand how neural networks work.
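
For anyone stuck where I was: the entire backprop computation for a tiny network fits on one page of NumPy. This is a minimal sketch of the idea, not code from my library:

```python
import numpy as np

# A tiny two-layer network trained on XOR; every gradient below is just
# the chain rule applied layer by layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(size=(8, 1)), np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)    # hidden activations
    out = sigmoid(h @ W2 + b2)  # network output
    # Backward pass: error at the output, then pushed back to the hidden layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient descent step
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)
```

Once this loop drives the error on XOR to nearly zero, you've implemented backprop; everything in a full library is bookkeeping on top of those few lines.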

My fascination with self-driving cars was almost entirely inspired by Tesla (the company, not the genius), which had recently released crazy videos of their autopilot “driving” their cars. With my newfound naivety and shallow knowledge of machine learning, I set out to make my own.

The first (and perhaps most obvious) challenge was data collection. I needed to record video and the corresponding steering wheel angles. Video was easy: I just sloppily duct-taped a webcam to my windshield. Steering wheel angles, on the other hand, were an entirely different challenge. My first approach was to attach an accelerometer and an Arduino to the steering wheel and sync the time data with the video. Anyone who has ever been thrown against the side of a vehicle when a bad driver takes a sharp turn will immediately understand why this approach failed drastically. Firstly, the accelerometer picked up the acceleration from every tiny motion of the car. Secondly, precisely syncing accelerometer data and video data from separate devices is an immense challenge that I didn't want to deal with.

My second approach was to interface with the car directly, accessing the CAN-BUS using the OBD-II port that every modern car is equipped with. This presented many challenges, but the payoff (super precise steering wheel measurements) was absolutely worth it.

Challenge 1: How will I even read the CAN-BUS from the OBD-II port? Decoding and processing CAN-BUS signals is a complicated process that would take ages to write and debug. Luckily, someone already did it. Using this code and the relatively cheap Arduino shield, I was able to extract and read CAN-BUS data from the car with ease.
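
For a feel of what reading raw frames looks like in code, here is a minimal sketch using the python-can library on a Linux SocketCAN interface. This is an assumed setup for illustration, not the Arduino rig I actually used:

```python
import can

# Open a SocketCAN interface (e.g. a USB-to-CAN adapter exposed as can0)
bus = can.interface.Bus(channel="can0", bustype="socketcan")

# Print raw frames: the arbitration ID (the "address") and the data bytes
for _ in range(10):
    msg = bus.recv(timeout=1.0)  # returns None on timeout
    if msg is not None:
        print(f"{msg.arbitration_id:4d}  {list(msg.data)}")
```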

Challenge 2: How do I turn the avalanche of CAN-BUS data into steering wheel readings? Below is an image of just 10 lines (a fraction of a second) of CAN-BUS data I received.

All of this data contains information about every system in the car connected to the CAN-BUS, from the windshield wipers to the torque sensors on the steering wheel. The left-most column holds the address corresponding to each data packet received (the remaining columns). For example, on line 1, we receive [7, 108, 255, 160, 7, 4] from address 86. Car companies really don't like publicly releasing which addresses correspond to which parts of the car, and they especially don't like people tinkering with their cars' hardware. As a result, I had to devise some way to parse through the hundreds of CAN-BUS channels and find the single one that carried steering wheel angle information.

I did this by manually monitoring each channel individually while I turned the steering wheel slowly, looking for smooth changes in the data received. After a lot of experimentation, I managed to figure out which channel corresponded to the steering wheel, along with a few other components of the car (throttle, brake, speed, etc.).
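
In code, that search amounts to logging frames while turning the wheel and ranking addresses by how smoothly their bytes change. Here is a hypothetical sketch of the idea; rank_channels and the frame format are my own names, not from my original tooling:

```python
from collections import defaultdict
import numpy as np

def rank_channels(frames, top_n=5):
    """Rank CAN addresses by how smoothly their payloads vary.

    `frames` is a recorded list of (address, data_bytes) pairs captured
    while slowly turning the steering wheel.
    """
    history = defaultdict(list)
    for address, data in frames:
        history[address].append(list(data))

    scores = {}
    for address, payloads in history.items():
        # Skip addresses with too few samples or inconsistent frame lengths
        if len(payloads) < 10 or len({len(p) for p in payloads}) != 1:
            continue
        arr = np.array(payloads, dtype=float)
        if arr.std() == 0:
            continue  # a constant channel can't be the steering wheel
        # A smoothly varying channel has small frame-to-frame steps
        # relative to its overall range of motion
        scores[address] = np.abs(np.diff(arr, axis=0)).mean() / arr.std()
    return sorted(scores, key=scores.get)[:top_n]
```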

Challenge 3: I have the data and the channel, now what? The data from the CAN-BUS doesn't come out in a nice format. You don't receive messages like "channel 64 sends a message: 43.5 degrees." Instead, you get a jumble of bytes, and that jumble somehow corresponds to an angle. I needed to determine a sort of "conversion function" between a few bytes of data and the angle of the steering wheel. To do this, I moved the steering wheel to different positions and recorded the bytes associated with each position. For example, if I moved the steering wheel to 90 degrees and received [0, 128, 0, 0], then moved it to 135 degrees and received [0, 192, 0, 0], I could make the rough approximation that the value changed by (192 − 128)/(135 − 90) ≈ 1.4 units per degree. I got very lucky that the data was a simple linear transformation of the steering angle. Using this method, I eventually determined, experimentally, a linear transformation I could apply to the data to obtain the steering wheel angle.
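
Concretely, once a handful of (raw value, known angle) pairs are recorded, the calibration is a one-line least-squares fit. A minimal sketch with illustrative numbers in the spirit of the example above:

```python
import numpy as np

# Raw values decoded from the steering channel's bytes, paired with the
# known wheel angles I had physically set (numbers here are illustrative)
raw = np.array([0, 64, 128, 192, 256], dtype=float)
angle = np.array([0.0, 45.0, 90.0, 135.0, 180.0])

# Least-squares fit of angle = scale * raw + offset
scale, offset = np.polyfit(raw, angle, deg=1)

def raw_to_angle(raw_value):
    """Convert a decoded CAN value to a steering wheel angle in degrees."""
    return scale * raw_value + offset

print(f"angle ≈ {scale:.4f} * raw + {offset:.2f}")
```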

Here is a video of this process in action, with all three challenges solved. I later went out and just drove around my town for hours collecting labeled data, with my 2015 MacBook Air in the passenger seat. It hadn't even been a year since I'd gotten my driver's license, so this really wasn't the best idea. Another fun fact: I ditched school a few times to go out and collect data too, so sorry, Mr. S and Ms. N, for all of the classes I missed!

Now the interesting part: applying machine learning to the task.

My first attempt was to use Caffe to train an AlexNet-based classification model. I binned my data into 10-degree bins, obtaining a set of images for steering wheel angles between 0–9 degrees, 10–19, 20–29, etc. At the end, I took a linear combination of the classification outputs to obtain a final prediction. The motivation was something along the lines of: "If the model predicts an angle of 20 degrees and 30 degrees with equal likelihood, then the true angle is probably around 25 degrees." A quick disclaimer: this was not a good way to approach things. Thanks to the magic of statistics that I didn't really understand at the time, it actually didn't turn out horribly, and I later turned it into a dumpster-fire of a repository. The repository has a lot of terrible coding practices embedded in it, and I keep it as a reminder of my successes and failures, a time capsule of the many things I've learned since then.
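
To make that "linear combination of the classification outputs" concrete, here is a minimal sketch of the idea, i.e., taking the expected value of the angle under the predicted bin probabilities (expected_angle is my name for it, not something from the original repository):

```python
import numpy as np

def expected_angle(class_probs, bin_width=10.0):
    """Collapse a softmax over angle bins into a single continuous angle.

    class_probs[i] is the predicted probability that the steering angle
    lies in bin i (bin 0 covers 0-9 degrees, bin 1 covers 10-19, etc.).
    """
    bin_centers = (np.arange(len(class_probs)) + 0.5) * bin_width
    return float(np.dot(class_probs, bin_centers))

# Equal mass in the 20-29 and 30-39 bins lands the estimate between them
probs = np.zeros(18)
probs[2] = probs[3] = 0.5
print(expected_angle(probs))  # 30.0
```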

My second attempt was to replicate the Nvidia paper that had recently come out, with a slight modification. The Nvidia model is a convolutional network with the following architecture: a normalization layer, five convolutional layers, and fully connected layers of 1164, 100, 50, and 10 neurons leading to a single output.

The final output is a simple linear combination of the previous ten neurons, which I thought could be improved. I changed this by applying an inverse tangent function to the linear combination, which made more intuitive sense to me. The inverse tangent gives the network a sort of tool to "recover" the angle of curvature from the visual data, instead of having to relearn a way to convert slopes or tangents into radian measures. In practice, it really didn't make a difference, but I kept it just for fun.
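
For concreteness, here is a sketch of that architecture in modern Keras, with the inverse tangent applied to the final linear combination. My original implementation was TensorFlow 1.x, so treat this as a readable reconstruction rather than my original code:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_pilotnet():
    """Nvidia-style network with an atan output head.

    Input size (66x200x3) follows the Nvidia paper.
    """
    return models.Sequential([
        layers.Input(shape=(66, 200, 3)),
        layers.Lambda(lambda x: x / 127.5 - 1.0),            # normalization
        layers.Conv2D(24, 5, strides=2, activation="relu"),
        layers.Conv2D(36, 5, strides=2, activation="relu"),
        layers.Conv2D(48, 5, strides=2, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Flatten(),
        layers.Dense(1164, activation="relu"),
        layers.Dense(100, activation="relu"),
        layers.Dense(50, activation="relu"),
        layers.Dense(10, activation="relu"),
        layers.Dense(1),                       # linear combination of the 10 neurons
        layers.Lambda(lambda x: tf.atan(x)),   # the inverse tangent modification
    ])
```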

I wrote up the code in about a night in TensorFlow and trained it on the cheap GTX 750 Ti GPU I had at the time. The Nvidia paper didn't really specify much about the training process, and in general didn't give much information about how they accomplished what they did, so I made my own choices: the Adam optimizer along with L2 regularization and dropout. In the end, I obtained pretty good results!
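
Using the build_pilotnet sketch above, the training setup would look roughly like this; the hyperparameters are illustrative, not the ones I actually used:

```python
import numpy as np
import tensorflow as tf

model = build_pilotnet()

# Adam with mean-squared error on the predicted angle. L2 regularization
# can be attached per-layer via kernel_regularizer, and dropout by adding
# layers.Dropout between the dense layers.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="mse")

# Placeholder arrays standing in for the recorded frames and angles
images = np.random.rand(256, 66, 200, 3).astype("float32") * 255.0
angles = np.random.uniform(-1.0, 1.0, size=(256, 1)).astype("float32")

model.fit(images, angles, batch_size=100, epochs=5, validation_split=0.1)
```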

Overall, this massive project taught me an immense amount about machine learning, engineering technique, and coding practices. It also caught a lot of attention and landed me a bunch of interviews and a few job offers! Nvidia even flew me out to their self-driving lab for a tour of their tech, which was really cool! They offered me a rather generous package to intern with them the following year, but I reluctantly declined to pursue undergraduate study instead. After I posted my work, I met a ton of amazing people through the internet, which may be the most rewarding part of this whole experience.

Feel free to ask any questions!
