Daniel Shapiro, PhD
Towards Data Science
3 min read · Sep 20, 2017


Accelerating Deep Neural Networks

Neural networks are “slow” for many reasons: load/store latency, shuffling data in and out of the GPU, the limited width of the GPU pipeline (as mapped by the compiler), the unnecessary extra precision in most neural network calculations (lots of tiny numbers that make no difference to the outcome), the sparsity of the input data (lots of zeros), and many other factors.
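
To put some rough numbers on the precision and sparsity points, here is a quick numpy sketch. This is a toy illustration of mine, not a benchmark of any real network:

```python
import numpy as np

# A fake 1024x1024 weight matrix, like one layer of a dense network
w32 = np.random.randn(1024, 1024).astype(np.float32)
w16 = w32.astype(np.float16)  # half precision: roughly the same values, half the bytes

print("float32 weights:", w32.nbytes / 1e6, "MB")
print("float16 weights:", w16.nbytes / 1e6, "MB")

# Sparsity: after a relu, roughly half the activations are exactly zero,
# so the multiply-accumulates that consume them downstream are wasted work.
x = np.random.randn(1024, 1024).astype(np.float32)
relu_out = np.maximum(x @ w32, 0.0)
print("fraction of zeros after relu:", np.mean(relu_out == 0.0))
```

Half the bytes means half the memory traffic, and skipping the zeros is exactly what sparsity-aware accelerators try to exploit.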

Neural Networks compile for A LONG TIME. (Credit: IT Experts)

How can we make deep neural network training, testing, and prediction faster? One way is to use faster algorithms, like the relu activation function, which is much cheaper to compute than tanh or sigmoid. Another is to write better compilers that map the neural network onto the hardware. A third approach is what I want to tell you about today: making better hardware, and by better I mean faster. Matrix multiplication and indexing are at the core of deep learning, and they are an “embarrassingly parallel” problem. That’s what gets the hardware folks really interested: the fact that a solution should be “easy”, or at least not impossible. In a recent article, I pointed to a nice comprehensive review of recent progress in accelerating deep neural networks prepared by MIT and Nvidia.
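
To see the “faster algorithms” point in action, here is a throwaway CPU benchmark of mine (numbers will vary by machine and numpy build, so treat it as a sketch):

```python
import numpy as np
import timeit

x = np.random.randn(1000000).astype(np.float32)

# relu is a compare-and-select; tanh and sigmoid each need an exponential per element
relu_t = timeit.timeit(lambda: np.maximum(x, 0.0), number=50)
tanh_t = timeit.timeit(lambda: np.tanh(x), number=50)
sigmoid_t = timeit.timeit(lambda: 1.0 / (1.0 + np.exp(-x)), number=50)

print(f"relu:    {relu_t:.3f} s")
print(f"tanh:    {tanh_t:.3f} s")
print(f"sigmoid: {sigmoid_t:.3f} s")
```

The relu line should come out well ahead on most machines, which is part of why it has become the default activation: cheaper math per element means faster layers.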

Some really cool custom hardware solutions have appeared over the past few years, like the Volta GPUs from Nvidia, the TPUs from Google, and a bunch of FPGA accelerators.

Without tooting my own horn too hard, let me tell you about the FPGA stuff, because the TPU and Volta stuff are a lot more “commercial”. Intel (Altera) and Xilinx are pretty much the only ones who benefit from selling FPGAs, so you don’t hear as much about them, but AWS has FPGA instances, and the I/O you can get on an FPGA is pretty nuts.

Way back in 2011, my collaborators and I built custom processors on FPGAs to speed up neural network computations. Back then we were really into Bidirectional Associative Memory and Hopfield associative memory neural networks (unsupervised learning), whereas today we mostly use supervised approaches like DNNs, CNNs, and RNNs. Put simply, you get better results on most problems with supervised learning.
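
If you haven’t met a Hopfield network before, here is a minimal numpy sketch of the idea (the textbook recall algorithm, not the FPGA design from our paper): store a few binary patterns with a Hebbian outer-product rule, then recover one of them from a corrupted cue.

```python
import numpy as np

np.random.seed(0)
n_neurons, n_patterns = 100, 3

# Store a few random +/-1 patterns with the Hebbian outer-product rule
patterns = np.random.choice([-1, 1], size=(n_patterns, n_neurons))
W = sum(np.outer(p, p) for p in patterns) / n_neurons
np.fill_diagonal(W, 0)  # no self-connections

# Corrupt one stored pattern by flipping 20% of its bits
cue = patterns[0].copy()
flip = np.random.choice(n_neurons, size=20, replace=False)
cue[flip] *= -1

# Recall: repeatedly update the state until it settles into a stored attractor
state = cue
for _ in range(10):
    state = np.sign(W @ state)
    state[state == 0] = 1  # break ties

print("recovered pattern 0:", np.array_equal(state, patterns[0]))
```

Each update is just a matrix-vector product followed by a sign, which is exactly the kind of operation that maps nicely onto parallel hardware like an FPGA.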

In another paper, also in 2011, my collaborators and I used lookup tables on an FPGA to speed up the most common calculations encountered by a neural network.
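
The paper covers the FPGA details, but the lookup-table idea itself fits in a few lines of numpy. Here is a toy version of mine, assuming the “common calculation” in question is a sigmoid activation: precompute the function once over a grid, then replace every call with an index into the table.

```python
import numpy as np

# Precompute sigmoid once over a fixed input range
LO, HI, N = -8.0, 8.0, 4096
grid = np.linspace(LO, HI, N)
table = 1.0 / (1.0 + np.exp(-grid))

def sigmoid_lut(x):
    """Approximate sigmoid by nearest-neighbour lookup into the precomputed table."""
    idx = np.clip(((x - LO) / (HI - LO) * (N - 1)).astype(np.int64), 0, N - 1)
    return table[idx]

x = np.random.randn(5).astype(np.float32) * 3
print("exact :", 1.0 / (1.0 + np.exp(-x)))
print("lookup:", sigmoid_lut(x))
```

On an FPGA the table can sit in block RAM, so each activation becomes a single memory read instead of an exponential.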

These approaches still apply today. You can even go embedded: grab a tiny Parallella board for $100 and map a small neural network right into the on-system FPGA. It has a dual-core ARM Cortex-A9, runs Linux, and just for kicks it has a 16-core network-on-chip for coprocessing. The main downside is heat: this credit-card sized supercomputer won’t literally melt and catch fire if you push it too hard, but in many cases the chip will damage itself from the heat, even with a passive heat sink.

OK, maybe I’m an alarmist. Here is a Raspberry Pi heat sink discussion with a lot less flash. One more for good measure. In my work on neural networks, we use P-type GPU instances on AWS for most projects. FPGAs are just too full-custom for the clients I’ve interacted with. And there is no nice and easy way to connect Keras to an FPGA. Oh well.

I will leave you with this video of a brave hacker cooking baloney with his overheated CPU:

If you enjoyed this article on artificial intelligence, then please try out the clap tool. Follow us on Medium. Go for it. I’m also happy to hear your feedback in the comments. What do you think?

Happy Coding!

-Daniel
daniel@lemay.ai ← Say hi.
Lemay.ai
1(855)LEMAY-AI
