Photo by Mukil Menon on Unsplash

What is a GPU and do you need one in Deep Learning?

In Deep Learning, everyone seems to recommend using a GPU. What is it, can you do without one, and who is it exactly for?

--

Any data scientist or machine learning enthusiast who has tried to train models at scale will at some point hit a cap and start to experience various degrees of processing lag. Tasks that take minutes with smaller training sets may now take hours, or in some cases weeks, when datasets get larger.

But what are GPUs? How do they stack up against CPUs? Do I need one for my deep learning projects?

If you’ve ever asked yourself these questions, read on…

I recently open-sourced my Computer Vision library that utilizes the GPU for image and video processing on the fly. I’ll leave the link to the Github repo in case you’re interested :)

Any data scientist or machine learning enthusiast will have heard, at least once in their life, that Deep Learning requires a lot of hardware. Some people train simple deep learning models for days on their laptops (typically without GPUs), which leads to the impression that Deep Learning requires big systems to run.

This has created a myth surrounding deep learning which creates a roadblock for beginners.

Every book I’ve referred to in the past few years mentions some version of the following:

Deep learning requires a lot of computational power to run.

But I don’t have datacentres at my command, and when I built my first deep learning model on a sizable laptop, I knew that the consensus was either wrong or only partly true.

You don’t have to take over Google to be a deep learning expert.

Why do we need more hardware for deep learning?

For any neural network, the training phase of the deep learning model is the most resource-intensive task.

While training, a neural network takes in inputs, which are then processed in hidden layers using weights that are adjusted during training and the model then spits out a prediction. Weights are adjusted to find patterns in order to make better predictions.

Both these operations are essentially matrix multiplications. A simple matrix multiplication can be represented by the image below.

Source: jeremyjordan.me

In a neural network, the first array is the input to the network, while the second array forms its weights.
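To make that picture concrete, here is a minimal NumPy sketch of a single layer’s forward pass. The array shapes and the sigmoid activation are illustrative choices of mine, not anything prescribed above:

import numpy as np

# A batch of 4 input samples, each with 3 features
inputs = np.array([[0.5, 0.1, 0.9],
                   [0.2, 0.8, 0.4],
                   [0.7, 0.3, 0.6],
                   [0.1, 0.5, 0.2]])

# Weight matrix of one hidden layer: 3 inputs -> 2 hidden units
weights = np.random.randn(3, 2)

# The core operation of the layer is a single matrix multiplication
hidden = inputs @ weights          # shape: (4, 2)

# A non-linear activation (sigmoid here) turns the result into an output
output = 1 / (1 + np.exp(-hidden))
print(output.shape)                # (4, 2)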

Easy, right?

Yes, if your neural network has around 10, 100 or even 100,000 parameters. A computer would still be able to handle this in a matter of minutes, or even hours at the most.

But what if your neural network has more than 10 billion parameters? It would take years to train this kind of system using the traditional approach. Your computer would probably give up before you’re even one-tenth of the way through.

“A neural network that takes search input and predicts from 100 million outputs, or products, will typically end up with about 2,000 parameters per product. So you multiply those, and the final layer of the neural network is now 200 billion parameters. And I have not done anything sophisticated. I’m talking about a very, very dead simple neural network model.” — a Ph.D. student at Rice University
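As a rough back-of-the-envelope illustration of why parameter counts like these overwhelm an ordinary machine, consider the memory needed just to hold the weights of the 10-billion-parameter network mentioned above in 32-bit floats; everything here is simple arithmetic, not a measurement:

# Memory required just to store the weights of a 10-billion-parameter
# network in float32 (4 bytes per parameter) -- before gradients,
# optimizer state, or activations are even counted.
params = 10_000_000_000
bytes_per_param = 4                      # float32
print(params * bytes_per_param / 1e9)    # 40.0 GB of weights alone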

Making deep learning models train faster

Deep Learning models can be trained faster by simply running all operations at the same time instead of one after the other.

You can achieve this by using a GPU to train your model.
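As a concrete example, here is a minimal PyTorch sketch of what “using a GPU” looks like in code; it assumes PyTorch is installed and falls back to the CPU when no CUDA device is present, and the model and batch sizes are arbitrary:

import torch
import torch.nn as nn

# Pick the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A tiny illustrative model and a batch of dummy data
model = nn.Linear(1000, 10).to(device)    # move the weights to the device
x = torch.randn(64, 1000, device=device)  # create the batch on the device

# The forward pass (a matrix multiplication) now runs on the GPU if present
y = model(x)
print(y.shape, y.device)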

A GPU (Graphics Processing Unit) is a specialized processor with dedicated memory that conventionally performs the floating-point operations required for rendering graphics.

In other words, it is a single-chip processor used for extensive Graphical and Mathematical computations which frees up CPU cycles for other jobs.

The main difference between GPUs and CPUs is that GPUs devote proportionally more transistors to arithmetic logic units and fewer to caches and flow control as compared to CPUs.

While CPUs are best suited to problems that require parsing through or interpreting complex logic in code, GPUs were designed as the dedicated graphical rendering workhorses of computer games, and were later enhanced to accelerate other geometric calculations (for instance, transforming polygons or rotating vertices into different 3D coordinate systems).

An individual GPU core is simpler than a CPU core, but a GPU tends to have far more logical cores (arithmetic logic units or ALUs, control units and memory caches) than a CPU.

Source: fast.ai

In the chart above, you can see that GPUs (red/green) can theoretically do 10–15x the operations of CPUs (in blue). This speedup very much applies in practice too.
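If you want to see the gap on your own machine, a rough timing sketch like the one below compares the same matrix multiplication on the CPU and on a GPU (PyTorch again, with matrix sizes I picked arbitrarily); actual speedups depend heavily on the hardware and the operation:

import time
import torch

a_cpu = torch.randn(4096, 4096)
b_cpu = torch.randn(4096, 4096)

start = time.time()
a_cpu @ b_cpu                         # matrix multiplication on the CPU
print("CPU:", time.time() - start, "s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a_cpu.cuda(), b_cpu.cuda()
    torch.cuda.synchronize()          # finish the transfers before timing
    start = time.time()
    a_gpu @ b_gpu                     # the same multiplication on the GPU
    torch.cuda.synchronize()          # wait for the GPU kernel to finish
    print("GPU:", time.time() - start, "s")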

If you consider a CPU as a Maserati, a GPU can be considered as a big truck.

The CPU (the Maserati) can fetch small amounts of data (3-4 passengers) from RAM quickly, whereas the GPU (the truck) is slower but can fetch large amounts of memory (~20 passengers) in one trip.


Why choose GPUs for Deep Learning

GPUs are optimized for training artificial intelligence and deep learning models as they can process multiple computations simultaneously.

They have a large number of cores, which allows for better computation of multiple parallel processes. Additionally, computations in deep learning involve huge amounts of data, which makes a GPU’s high memory bandwidth especially suitable.

There are a few deciding parameters to determine whether to use a CPU or a GPU to train a deep learning model:

Memory Bandwidth

Bandwidth is one of the main reasons why GPUs are faster for computing than CPUs. With large datasets, the CPU takes up a lot of memory while training the model.

Computing huge and complex jobs takes up a lot of clock cycles on a CPU, which works through jobs sequentially and has fewer cores than its counterpart, the GPU.

A standalone GPU, on the other hand, comes with dedicated VRAM (Video RAM) memory. Thus, the CPU’s memory can be used for other tasks.

Comparison of bandwidth for CPUs and GPUs over time
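As a quick way to see the dedicated VRAM mentioned above, PyTorch can report the capacity and current usage of a CUDA device; this is a small sketch that assumes PyTorch and a CUDA-capable GPU are present:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name)
    print("Total VRAM:", props.total_memory / 1e9, "GB")
    print("Currently allocated:", torch.cuda.memory_allocated(0) / 1e9, "GB")
else:
    print("No CUDA-capable GPU detected")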

Dataset Size

Training a deep learning model requires a large dataset, and hence large, memory-intensive computational operations. To compute the data efficiently, a GPU is the optimum choice. The larger the computations, the greater the advantage of a GPU over a CPU.

Optimization

Optimizing tasks is far easier on a CPU. CPU cores, though fewer in number, are individually more powerful than the thousands of cores in a GPU.

Each CPU core can operate on different instructions (MIMD architecture), whereas GPU cores, usually organized in blocks of 32, execute the same instruction at a given time in parallel (SIMD architecture).

Parallelizing dense neural networks takes considerable effort. Hence, complex optimization techniques are more difficult to implement on a GPU than on a CPU.
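One way to get a feel for the SIMD-versus-MIMD distinction is to compare a vectorized operation (one instruction applied across a whole array, the style GPUs excel at) with per-element branching logic (the kind of irregular control flow CPU cores handle comfortably). This NumPy sketch is only an analogy of my own, not actual GPU code:

import numpy as np

x = np.random.randn(1_000_000)

# SIMD-style: the same instruction applied to every element at once
y_vectorized = np.maximum(x, 0.0)          # a ReLU over the whole array

# MIMD-style: each element can follow its own branch of logic
y_branchy = np.empty_like(x)
for i, v in enumerate(x):
    if v > 1.0:
        y_branchy[i] = 1.0                 # clip large values
    elif v > 0.0:
        y_branchy[i] = v                   # keep positives as-is
    else:
        y_branchy[i] = 0.0                 # zero out negatives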

Photo by Alex Knight on Unsplash

Should I use a GPU?

As with any data science project, it depends. There are tradeoffs to consider, between speed, reliability, and cost:

  1. If your neural network is relatively small-scale, you can make do without a GPU
  2. If your neural network involves heavy calculations across many hundreds of thousands of parameters, you might want to consider investing in a GPU

As a general rule, GPUs are a safer bet for fast machine learning because, at its heart, data science model training consists of simple matrix math calculations, the speed of which may be greatly enhanced if the computations are carried out in parallel.

See this Reddit post on the best GPUs to invest in for Deep Learning

Cloud GPU Instances

You should also give Cloud GPUs a thought. If you don’t want to buy a bunch of expensive GPUs, you can leverage GPUs on-demand with a cloud-hosting company. They’ll save you from configuring the hardware and best of all, they’re not that expensive — costs can be as little as US$0.25 per hour while you’re using it.

Once you’re done, remember to shut down your cloud instance.

You will be renting a remote computer/server, not running something on your own machine. It’s not enough to close your browser or shut down your PC; those will merely sever the connection between your device and the distant server, not shut down the thing you’re paying for. Otherwise, you’ll be charged for all the time it keeps running and end up surprised with a nasty bill!

CPUs are best at handling single, more complex calculations sequentially, while GPUs are better at handling multiple but simpler calculations in parallel.

GPU compute instances will typically cost 2–3x that of CPU compute instances, so unless you’re seeing 2–3x performance gains in your GPU-based training models, I would suggest going with CPUs.
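To make that 2–3x rule of thumb concrete, here is a toy cost comparison; every number in it (hourly prices, training times) is made up purely for illustration:

# Hypothetical hourly prices and training times -- illustrative only.
cpu_price_per_hour = 0.10     # USD
gpu_price_per_hour = 0.25     # USD (~2.5x the CPU instance)

cpu_training_hours = 10.0
gpu_training_hours = 3.0      # a ~3.3x speedup in this made-up scenario

print("CPU run cost:", cpu_price_per_hour * cpu_training_hours)   # 1.00 USD
print("GPU run cost:", gpu_price_per_hour * gpu_training_hours)   # 0.75 USD
# The GPU only wins here because its speedup (~3.3x) exceeds its
# price premium (2.5x) -- which is exactly the rule of thumb above.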

As always, thanks so much for reading! Please tell me what you think or would like me to write about next in the comments. I’m open to criticism as well!

See you in the next post! 😄

--


I write libraries and sometimes blog about them | Top Writer | Creator of Caer, the Vision library for Python