Making Sense of Big Data

A 2022-Ready Deep Learning Hardware Guide

What is a GPU? Why does it matter? How much RAM do I need? Do you want to understand those terms better, and even put them to use? Read on.

Nir Ben-Zvi
Towards Data Science
22 min read · Nov 15, 2020


Image: Pixabay

This guide is up to date as of 3/1/2022.

Intro

This is a vastly revised version of the older guide you all know and love.
Almost every part of it has been thoroughly rewritten. The original guide has been updated continuously over the course of 6 years, so I decided it was time to write it (almost) from scratch.
This time I tried to make it a bit more thorough and general. I'll keep updating it, but I also want to make sure my readers can understand the topic even if I stop doing so one day.

So, you’ve decided you want to purchase a machine dedicated to training machine learning models. Or, rather, you work in an organization where the buzzwords of this guide are constantly thrown around and you simply want to know a bit more about what they mean. This isn’t a terribly simple topic, so I’ve decided to write this guide. You can discuss those terms from various angles, and this guide will tackle one of them.

A note on the various tables in this article: I made them, and gathering all the required information took some time. Please don't use them without my consent.

Who Am I?

I'm Nir Ben-Zvi, a Deep Learning researcher and a hardware enthusiast since early middle school, when I would tear computers apart while my friends were playing basketball (I tried that too, and went back to hardware pretty quickly).

In the past few years I’ve advised organizations on building deep learning machines, and ultimately decided to put that knowledge into a guide.

Today I’m a computer vision consultant, working with various companies and startups developing image-based products. A lot of the knowledge for this guide came from the decisions made towards building deep learning machines for my various clients.

How I Maintain This Guide

I originally wrote this guide in Hebrew around 5–6 years ago (but who's counting) and have kept it up to date ever since. This time I decided to re-write most of it. Note that some parts have barely changed, and the reason is that I felt they are still relevant.

It's pretty amazing how little hardware has changed over the past 4–5 years. For example, between November 2018 and April 2020, NVIDIA didn't update its line of consumer graphics (GeForce) cards at all. Intel, on the other hand, updated its desktop lines twice. Another thing that ended up being pretty anticlimactic was AMD's new line of consumer and server-level processors (I've seen this is a pretty delicate topic, so more on this later).

So why is this guide still relevant, and what will keep it relevant in a year's time? Well, for one, I try to update it from time to time, and I actually do so when something special affects the market. Additionally, I have taken out parts which I felt were too generation-specific. For example, Intel has just announced its 12th generation silicon, but I'm not too sure that's going to make building a DL machine much different from building one based on the current 11th generation, so I tried to make the CPU discussion more generic. If something drastic changes between generations I'll naturally make the required changes.

GPU-laptops are outside the scope of the article. Image: Wikimedia

A Few Words on GPU Laptops

This guide is not for choosing a laptop. In my opinion the deep learning laptop doesn't exist anymore, at least not for computer vision tasks. Modern DL models are simply too large to fit on such laptops, which usually carry graphics cards meant for gamers (or, occasionally, for rendering tasks or for giving Photoshop some extra juice). Even the most powerful gaming laptops, those often called "Desktop Replacements" (or DTRs), are probably not strong enough to actually train (or even fine-tune) a ResNet-101 based model in a reasonable time frame.

In days when Google supplies T4- and P100-based Colab environments, I don't see a reason to buy a powerful laptop for DL purposes.
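
If you do work this way, it's worth checking what a given remote session actually handed you before kicking off a long run. Here is a minimal PyTorch sketch (nothing Colab-specific is assumed; any machine with a CUDA driver will do):

```python
import torch

# Quick sanity check of whatever GPU the remote session gave you.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(torch.cuda.get_device_name(0))
    print(f"{props.total_memory / 1024**3:.1f} GB of GPU memory")
else:
    print("No CUDA device visible -- CPU only")
```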

You still want your laptop to be strong; at least 16GB of memory and 4+ cores. But it will mostly be a machine running a terminal to a remote instance. Those 16GBs are for Chrome. I use a Mac, by the way.

If you still want a GPU laptop that's also portable, I'd buy the Razer Blade. End of discussion. You can look for other gaming-oriented rigs, but that would be my favorite pick.

What About Portable Graphics Cards?

I’ll admit I’m not too familiar with this field, and I haven’t seen those being used for non-gaming purposes. It’s still a single graphics card which probably wouldn’t suffice in the long run.

So, What’s in this Guide?

I’ll begin by splitting the options into four “categories”:

  1. A single desktop machine with a single GPU
  2. A machine identical to #1, but with either 2 GPUs or the support for an additional one in the future
  3. A “heavy” DL desktop machine with 4 GPUs
  4. A rack-mount type machine with 8 GPUs (see comment further on; you are likely not going to build this one yourselves)

Rack-mounts typically go into server rooms. Image: Pixabay

On 8-GPU Machines and Rack Mounts

Machines with 8+ GPUs are probably best purchased pre-assembled from some OEM (Lambda Labs, Supermicro, HP, Gigabyte etc.) because building those quickly becomes expensive and complicated, as does their maintenance. Note that they also probably require setting up a modest server room in your office with proper cooling (and a fail safe for said cooling). These are very, very noisy and aren’t meant to be placed near human beings.

You theoretically could build one yourself — I've seen it done — but the monetary gain will be too modest to justify it, in my opinion. I will note that we are situated in Israel, where access to parts (notably PSUs and enclosures) for such machines is difficult and requires having them shipped from the US. Since this is the English guide, I'll note that building such a machine may be easier and cheaper where you live (and if you have regular access to Newegg).

One extra reason not to build one is that OEMs will usually give you a much needed on-site support package.

Since ultimately these are rarely self-built, I’m not going to talk about them further.

Which Graphics Card Do I Want?

TL;DR: you want a 3080/3080ti/3090. They belong to NVIDIA’s series (or generation) 30 of cards.

Not so fast, though; it’s actually ridiculously hard to get your hands on cards due to the ongoing chip shortage. For the sake of this guide I’ll assume you can, but in reality you might have to build your rig with what’s available rather than what’s best.

Image: Provided by Nvidia

NVIDIA Series 3 Discussion

NVIDIA’s Ampere-based series 30 has now been available for around one year, with the 3060ti and 3080ti appearing later on. Let’s first go through those parts and compare the things that I believe matter:

Consumer-Level Card Comparison along with the data center grade A100. Image: Me.

Okay I’m Lost!

So what do we have here? The lowest-end part, the 3070, is supposed to bring 2080ti-equivalent performance at half the older model's street price. The lower memory size can be an issue for some training tasks. For a pretty small price bump, the 3080 will offer significantly higher performance — NVIDIA claims twice as much for some tasks.

3080

When pitted against the 3090 after NVIDIA's initial announcement, the 3080 seemed like an easy value-for-money choice (at MSRP). For $700 you get close to double the performance of the older king, the 2080ti. It does provide less memory, but the higher speed should still make it great. My older opinion hasn't changed.

3080ti

When Series 30 was initially announced, most DL enthusiasts had to choose between the 3080 and the 3090. Both had a lot going for them and differed by a significant amount of money, making them ideal for different users. Since then, NVIDIA announced the in-between and much-expected 3080ti. Does the "ti" part make it the no-brainer choice it has been in previous generations? I'll get to that soon.

Compared to the 3090, the 3080ti gives almost identical performance at a slightly lower price, with half the memory. The official 3090 spec requires three slots in your motherboard, but this has changed a bit since the announcement, making this limitation somewhat less relevant.

3090

The 3090 should be the most interesting card of the bunch in terms of price-performance, but at 350W it's a bit of a challenge to power. When originally writing about it I mentioned that 2-slot solutions would be mandatory for it to see major adoption. Since then GIGABYTE, Asus and EVGA have done just that. I'm still against buying the (probably cheaper) three-slot solutions. The 350 watts of power still make using this part difficult, but the hardware support has improved in the past year.

Memory

Anachronism is awesome; a year ago I wrote that:

“The 24GB memory is interesting, but if it becomes a choice of fitting 5 3090s rather than 8 3080s in the same chassis, I see system builders opting for the latter, cheaper card.”

Well, with two-slot blower solutions for the 3090, if money is no issue, you can now put eight 3090s in the same chassis for an incredible 192GB of GPU memory in a single machine.

Another thing to get excited about is the extremely fast GDDR6X chosen for the more expensive variants (up from GDDR6). Those are the 3080, 3080ti and 3090.

I couldn’t get memory bandwidth figures from the press release but those cards will have insanely fast memory interfaces, which can potentially outweigh their lower memory sizes compared to Series 20 cards. This might sound counterintuitive, but if the card can get the data from its memory and process it fast enough, it can require less memory for similar throughputs (if we’re measuring images/second for training or inference).

So Which One Should I Get?

First things first; if you want to get something cheap for the purpose of learning how to deep learn — I’d get a 3070. But that’s not what this guide is about.

For a startup (or a larger firm) building serious deep learning machines for its power-hungry researchers, I'd cram in as many 3090s as possible. The doubled memory effectively means you can train models in half the time, which is simply worth every penny. This goes for someone building a single- or double-GPU machine as well. If you do this for a living, the 3090 is amazing value for money. For "only" $300 less, I fail to see where the 3080ti fits in. Double the memory is a big thing.

If, however, the 3090 is out of reach and the 3080ti is already stretching your budget, then it too is terrific value for money.

Finally, the original 3080 at an MSRP of $699 is still ridiculously strong value for money, and for someone maxing out her budget on a single-GPU rig with plans to add a second card later on — get it. It'll be amazing.

Another way I'd look at this: if the choice is 3080ti vs 3080 and budget is tight — the 3080 is amazing value for money that will serve you well.

MSRP Makes Plans, God Laughs

All of my recommendations are based on MSRPs, which sometimes vary a lot based on supply and demand, especially while the worldwide chip shortage continues. If, for instance, a 3090 runs closer to $2,000 — the 3080ti immediately becomes insane value for money.

Should I Upgrade from Series 20?

No. Unlike the jump from Pascal to Turing (1080ti to 2080ti), this generation, at least at the moment, provides a modest speed bump. Moving from 1080tis to 2080tis three years ago netted a very nice performance boost thanks to mixed-precision training and FP16 inference on their novel TensorCores. This time around we are getting the usual ~30% performance jump (task dependent, of course) but nothing else.

At this point in time I don’t see a reason to buy a 2080ti-based system unless you get a very good bargain on them.

Let’s Talk Graphics Cards

Card Generations and Series

NVIDIA usually makes a distinction between consumer-level cards (termed GeForce) and professional-grade cards aimed at workstations and data centers.

Say Bye to Quadro and Tesla

In the past, NVIDIA had another distinction for pro-grade cards: Quadro for computer graphics tasks and Tesla for deep learning. With generation 30 this changed, with NVIDIA simply using the prefix "A" to indicate we are dealing with a pro-grade card (like the A100). What used to be Quadro is now simply called an "Nvidia Workstation GPU" and Teslas are "Nvidia Data Center GPUs". CGI-targeted GPUs use Axxxx designations such as the A6000, while the deep learning ones use Axx and Axxx.

So why aren’t we only discussing those “professional grade” GPUs? Well, they are hella expensive and for development (and learning) purposes aren’t really justified. Data center targeted cards are rated to run 24/7 for years, have global on-premise support and sometimes offer passive cooling solutions — all things that you definitely want for your final product, and are the reason you don’t see GCP or AWS using GeForce hardware. Well, that and the fact that NVIDIA’s EULA explicitly forbids that.

The current performance leaders of the two lines are the GeForce 3090 and the A100. There are also the A40 and the recently announced A10, A16 and A30. While the A16 and A40 aren't meant for DL machines, the A10 and A30 are interesting and I'll discuss them next.

Inference-Oriented Cards

A brave new world that didn't exist until recently. It is currently (and unsurprisingly) dominated by NVIDIA with their T4 ("T" for Turing, i.e. the older generation) card, which has crippled training capabilities but completely reasonable inference speeds. It's still more expensive than a GeForce 2080ti/3080/3080ti/3090, but if you need fast inference with professional-level perks (passive cooling, great international support, robustness etc.) it's the only game in town. If that's not something you need, you probably know it by now. For series 30, NVIDIA has replaced this part with the A30, but it's not an exact replacement, as it's priced higher and is comparatively much more powerful. See the table below, which also compares them to their relevant top-of-the-line data-center counterparts.

NVIDIA’s Inference-targeted parts. Image: Me.

Graphics/Rendering Cards

As mentioned above, “Nvidia Workstation GPUs” (previously Quadro) are NVIDIA’s rendering-targeted level of cards. On paper they shouldn’t belong here, but they can sometimes offer a sweet-spot of value for money between the even more expensive data-center cards (previously Tesla) and consumer cards. They are much more resilient to intense loads (compared to consumer-level GeForce cards) but still don’t cost as much as data center parts. I’m not too familiar with them, but I will note that for training purposes I don’t see the appeal. For inference tasks they can offer a viable alternative, but again, you need to fully understand what you are doing if you go there since they are “tweaked” to reach top performance when rendering graphics rather than running deep learning models.

For this generation (Ampere), NVIDIA has several parts announced — the A40 and A6000 initially, followed by an A4000 and an A5000. It seems that the A40 and A6000 are going to be almost identical in terms of specs, with the big difference being that the A40 will be passively cooled.

Not much information was released otherwise, but they are targeted towards data centers requiring strong GPU processing for rendering tasks rather than for DL. If they end up being cheaper than an equivalent A100 there might be more reason to discuss them in the future. This table compares them to previous generation similar cards and to their DL-oriented counterparts — the V100 and A100.

NVIDIA’s Rendering-oriented graphics cards. Image: Me.

FP16 and TensorCores

Since the previous (20) series of cards, all of NVIDIA's parts have TensorCores alongside their CUDA cores. Those are cores dedicated to mixed-precision (FP16) matrix math. When initially introduced this simply exploded, since model accuracy doesn't seem to be affected while inference speed sometimes doubles (and training becomes much faster as well).
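
To make that concrete, here is a minimal sketch of mixed-precision training with PyTorch's torch.cuda.amp; the model, data and hyperparameters are placeholders, but the autocast/GradScaler pattern is what lets the TensorCores do their thing:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model/optimizer/data; requires a CUDA-capable GPU.
model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
scaler = GradScaler()

for _ in range(10):                       # dummy training steps
    x = torch.randn(64, 512, device="cuda")
    y = torch.randint(0, 10, (64,), device="cuda")

    optimizer.zero_grad()
    with autocast():                      # ops run in FP16 where it is safe
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()         # scale the loss to avoid FP16 underflow
    scaler.step(optimizer)
    scaler.update()
```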

As of generation 30 this is simply the norm, and NVIDIA doesn't even mention the number of distinct Tensor cores in its marketing material.

The Turing generation (series 20) also added support for 8-bit and 4-bit inference, but those require careful calibration.

A “Bit” More on INT8

INT8 is a complicated matter which should be very well understood inside your organization. Unlike the move to half precision/FP16, which is almost hassle-free, moving to INT8 can significantly hurt model accuracy and should be thought of as a new research task rather than a quick engineering solution for a speedup.
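
To give a feel for what that calibration step involves, here is a minimal post-training static quantization sketch using PyTorch's CPU (fbgemm) backend; the toy model and the random calibration data are placeholders, and GPU deployments typically go through TensorRT instead, but the observe-then-convert flow is the same idea:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """A hypothetical toy model, purely to illustrate the calibration step."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> int8 at the input
        self.conv = nn.Conv2d(3, 16, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> float at the output

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 CPU backend
prepared = torch.quantization.prepare(model)

# Calibration: feed representative data so the observers can record activation
# ranges. Unrepresentative calibration data is what hurts accuracy.
with torch.no_grad():
    for _ in range(32):
        prepared(torch.randn(1, 3, 32, 32))

int8_model = torch.quantization.convert(prepared)
print(int8_model)
```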

NVLINK; Connecting Cards Together

NVLINK is a hardware protocol by NVIDIA that allows connecting two or more of their cards into a unified memory pool — letting them communicate without having to "pass through" CPU memory. This actually speeds up many computations, hence it is pretty cool, but unfortunately it is disabled on GeForce (i.e. consumer-level) cards for connecting more than two cards. You can still use it (in the consumer-level variant called SLI) for two cards if that's all you have — but for more than two I wouldn't really bother. How awful is the fact that your 2080ti/30XX-based, 8-GPU machine won't have NVLINK support? Not too awful. For uses that require fast connections between GPUs (think synchronized batch-normalization) it might offer a nice speed boost, but for general purposes you can do without it.
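
If you end up with a multi-GPU box and want to see whether pairs of cards can actually talk to each other directly (over NVLINK or plain PCI-E peer-to-peer) rather than bouncing through CPU memory, a quick PyTorch sketch like the following will tell you; `nvidia-smi topo -m` shows the same topology from the command line:

```python
import torch

# Check which GPU pairs support direct peer-to-peer access
# (over NVLINK or PCI-E) without going through host memory.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'yes' if ok else 'no'}")
```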

Cloud instances are an alternative to building a machine, but know the costs. Image: Pixabay

And What About Cloud Instances?

Google and Amazon both offer A100 and V100 parts in their clouds. Microsoft also added V100 instances in some locations but is extremely restrictive about actually letting people use them in 8-GPU setups (I am no longer using Azure; if this has changed, let me know). Google also offers P100-based instances, which present very good value for money, as well as their own TPUs, which are blazing fast and a good solution if you use TensorFlow.

Despite all of those, quick maths will show that if you are going to train models on a large number of GPUs and do so frequently (meaning such a machine will be fully utilized almost all the time) — purchasing a top-of-the-line training machine will be much more cost effective in the long run. If instead you are likely to train models occasionally but not all the time, a cloud instance can definitely be the wiser choice.
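
As a sketch of that "quick maths" (with made-up, illustrative prices rather than real quotes, and ignoring electricity, maintenance and resale value), the break-even point looks something like this:

```python
# Back-of-the-envelope break-even estimate. All numbers are illustrative
# assumptions, not quotes -- plug in your own cloud rate and build cost.
cloud_rate_per_hour = 3.0   # assumed $/hour for a single-GPU cloud instance
on_prem_cost = 6000.0       # assumed up-front cost of a comparable workstation
utilization = 0.6           # fraction of the day the machine actually trains

hours_to_break_even = on_prem_cost / cloud_rate_per_hour
days_to_break_even = hours_to_break_even / (24 * utilization)
print(f"Break-even after ~{hours_to_break_even:.0f} GPU-hours "
      f"(~{days_to_break_even:.0f} days at {utilization:.0%} utilization)")
```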

It’s also probably best to try using a cloud instance for a few weeks and make the decision afterwards. Also note that even organizations with a wide array of on-prem instances occasionally use cloud based instances during peak times. It’s also important to note that on-prem instances mean you need someone extremely well versed in handling their maintenance. Losing a day’s work for 2–3 data scientists can cost a lot more than the money saved from buying a deep learning machine.

A Bit on CPUs and PCI-E Lanes

How Many CPU Cores Do I Need? What are PCI-E Lanes?

Now that we're done with the topic of graphics cards, we can move on to the next part of the training-machine-in-the-making — the Central Processing Unit, or CPU.

A GPU generally requires 16 PCI-Express lanes. PCI-Express is the main connection between the CPU and the GPU. Thankfully, most off-the-shelf parts from Intel support that.
It's connecting two cards where problems usually arise, since that will require 32 lanes — something most cheap consumer CPUs lack. Moving up to 4 cards, we become completely dependent on expensive Xeon-series (server-targeted) CPUs.

Do I Really Need That Many Lanes?

Well, short answer? Not really. Despite what I just said, we can generally do just as well with 8 lanes per GPU for training and inference. Reportedly, extremely speed-constrained tasks (such as low-latency trading) will need all 16 supported lanes, though. Also note that the aforementioned NVLINK removes some of this requirement, if your cards support it.

As a general rule of thumb, don't go lower than 8 lanes per GPU. Another important thing to note is that other parts require PCI-E lanes too, such as super-fast NVMe-based SSD drives.
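
As a worked example of the lane budget (the x4-per-NVMe-drive figure is a typical value, not a spec quoted from anywhere):

```python
# Rough PCI-E lane budget for a hypothetical 4-GPU workstation.
gpus = 4
lanes_per_gpu = 8        # the practical minimum discussed above
nvme_drives = 2
lanes_per_nvme = 4       # a typical NVMe SSD uses a x4 link

needed = gpus * lanes_per_gpu + nvme_drives * lanes_per_nvme
print(f"PCI-E lanes needed: {needed}")   # 40 -- fits within a 48-lane CPU
```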

Image: Wikimedia

Now Let’s Choose a CPU

Intel Core-series CPUs

During the first iteration of this guide, Intel had parts with 28, 40 and 44 PCI-E lanes at very competitive prices (this was generation 8). Since then more generations have come to market (12, Alder Lake, was just announced) and those parts have been replaced by the more expensive, enthusiast-oriented "series X" parts. Those parts are now the reigning champions of deep learning hardware, thanks to both their speed and their abundance of PCI-E lanes.

If you plan on building a machine with a single GPU, most i7/i9 parts of generation 11 have 20 lanes and will suit you perfectly. Generation 10 is still a terrific value for money and they all have 16 lanes.

If you need more cowbell (or lanes), the X-designated parts have 48 lanes — enough for 4 GPUs with 8 lanes each, and even some juice left for a pair of NVMe-based SSD drives. So far they only exist for generation 10 (Cascade Lake-X), though. The cheapest part, the 10900X, sports 48 lanes and 10 cores, with 12-, 16- and 18-core parts following suit. It seems that Intel has no intention of introducing the X-series for its current generation 11 (Rocket Lake) of CPUs.

Intel Ice Lake CPUs also have extensive hardware support for running deep models on the CPU. If you are into that kind of stuff I suggest reading more and understanding whether it can be beneficial to you.

Before moving on, make sure you fully understand how CPU sockets work (I assume the reader knows what that means at this stage). X-series parts use a more complex socket, which in turn requires more expensive motherboards. Since we are discussing very expensive machines to begin with, this shouldn't make a huge difference in practice.

Xeon Series CPUs

Xeon CPUs, mentioned briefly earlier, are Intel’s series of CPUs for servers and data-centers (unlike Core for desktops). They are more expensive for various reasons outside of this discussion. Among other things they support multi-CPU setups, which can in turn provide the full 16 PCI-E lanes per GPU you might want and need (since you can combine a couple of CPUs and their respective lanes). Also important is that those CPUs are much more resilient — similarly to NVIDIA’s Data-Center series of GPUs vs consumer-level GeForce GPUs — and should withstand much higher computational loads over long periods of time.

It’s also important to note that Xeon CPUs will increase overall system price for a few different reasons:

  • Xeons require ECC (error correcting code) memory, which is more expensive.
  • Memory, motherboards and enclosures supporting Xeons tend to be more expensive (but more resilient).
  • Xeon parts are themselves significantly more expensive versus equivalent Core i7/i9 parts.

Despite the above, if you care about multi-CPU systems running under extreme loads 24/7, you probably want a Xeon-based machine. Xeon-based machines were actually more popular in the past, but Intel has since added the aforementioned i9-X CPUs, with plenty of PCI-E lanes for most consumer/enthusiast needs. Before moving on I'll admit that I'm not as knowledgeable in building Xeon-based machines and can't recommend a specific CPU that is good value for money among them. On top of that, there are a lot of different options for Xeon CPUs, many more than for consumer-level CPUs. One of the things that bumps up their prices considerably is the number of supported CPU sockets (some support 2-CPU systems while others support 4- and 8-CPU configurations and cost significantly more). They also come with much larger caches, which can actually help a lot for various tasks, but this is again outside the scope of this article.

If you are building an 8-GPU machine, you are 100% dependent on Xeon CPUs for enabling 8x8 PCI-E lanes (=64 total). Unless you go AMD, that is.

So, Where’s AMD?

This hasn’t been updated since November 2018 and will be left as-is. Please don’t kill me, AMD Fans!

Update: It does seem that people are now building DL machines with AMD CPUs; I might actually have to rewrite this if demand rises.

Well, AMD is offering something that — at least on paper — should have turned the entire industry on its head, but in practice hasn't really. That something is its Epyc series of server-targeted CPUs and the accompanying consumer-level part named Threadripper. These offer 128 and 64 PCI-E lanes, respectively — potentially blowing Intel out of the water. They also have plenty of cores (up to 32), many more than an equivalently priced Intel CPU. So, why haven't they taken the market by storm?

  • Lower single-core performance (important for a lot of tasks that don't parallelize well)
  • People are generally conservative and aren't as willing to adopt new hardware from a less popular company (in the server space, that is; the consumer space, especially gamers, has been very familiar with AMD for years)
  • A large number of scientific computing libraries make extensive use of MKL acceleration, something only Intel supports in its CPUs; one notable example is OpenCV
  • PCI-Express lane abundance isn't as simple as it sounds, and I will explain:

Unlike Intel, which has its own proprietary connection between the CPU and the motherboard chipset, AMD CPUs actually use PCI-Express for that. This means that a 64-lane CPU actually has only 56 usable lanes, which in turn means that for 4 cards you are again left with a 16x/16x/8x/8x configuration. This is still awesome, and more than Intel can provide, but the advantage over Intel's i9-X CPUs is smaller than it first appears.
Also, when building a machine costing thousands of dollars, saving $400–500 on a cheaper CPU is perhaps not that important.

So it turns out that AMD is still mainly competitive with Intel for gaming machines, or for people who are keen on saving a few hundred dollars. That said, for building a twin-GPU machine I would probably go with them, since they enable a full 2x16 PCI-E configuration and the CPU savings make up a larger share of the overall system price.

A Note About Memory (RAM)

Even if a certain CPU fits (physically, socket-wise) a specific motherboard which in turn supports a lot of memory (again physically, via a large number of slots) — it's still possible that the CPU itself doesn't support that much memory. One of the differentiating factors (in terms of price) between CPUs is how much memory they can address. 64+ GB of memory is usually only supported by expensive CPUs.

More on Hardware

It’s a bit difficult to remain up to date with respect to memory and motherboards since those update constantly with hardware refreshes. They are also (in my opinion) much less exciting and interesting to care for.

My advice to you? Following a choice of CPU/GPUs, understand the rest of your needs and pick a good motherboard that supports those. Read the fine print carefully and check that it supports what you need it to support. Some common mistakes are:

  • Buying a motherboard with an insufficient number of graphics-card slots.
  • Not having enough space between the graphics-card slots (the GPUs we are dealing with require at least two slots each due to their huge cooling solutions).
  • Buying a motherboard that doesn’t support the memory type we wanted to buy.

Motherboards

Motherboards supporting the latest and greatest tend to be similar and similarly priced. Go with one of the leading companies; MSI, Asus and GIGABYTE, to name a few. As mentioned several times, make sure it supports the CPU socket, the memory amount and type, and the required number of PCI-E slots, their spacing and their lanes. Regarding lanes, note that some motherboards have GPU-compatible slots that don't actually carry the full 8/16 PCI-E lanes (they are x4 underneath). Avoid those. As for specific models — see the chipset discussion next. Note that most motherboards, save for the most expensive variants, usually only support 3 GPUs.
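
Once a machine is actually assembled, you can verify that each card negotiated the link width you expected. A small sketch using the pynvml bindings (assuming the NVIDIA driver and the nvidia-ml-py/pynvml package are installed):

```python
import pynvml

# Report the PCI-E link width each GPU is currently running at,
# versus the maximum width the card supports.
pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    if isinstance(name, bytes):          # older pynvml versions return bytes
        name = name.decode()
    cur = pynvml.nvmlDeviceGetCurrPcieLinkWidth(handle)
    max_w = pynvml.nvmlDeviceGetMaxPcieLinkWidth(handle)
    print(f"{name}: running at PCIe x{cur} (card supports up to x{max_w})")
pynvml.nvmlShutdown()
```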

On Chipsets

TLDR: The relevant chipsets you want to buy are series X299 for the latest Intel X-models and (coincidentally) series X399 for AMD CPUs.

Like the earlier part on CPUs and their motherboards, chipsets are a crucial part of motherboards and are what actually makes a board support various CPU features. Flagship motherboards from leading brands will usually include the latest and greatest chipset, which in turn supports everything you need.

CPU makers use different "sockets" for connecting their CPUs to motherboards. These are generally designated by terms such as "LGA2066", where the number indicates the number of physical connections (pins) between the processor and the socket. LGA means "land grid array", implying that the pins are actually part of the motherboard rather than the CPU (it used to be the other way around).

Intel Generation 10 and 11 use socket LGA1200, replacing the older socket LGA1151 used until gen 9. For X-parts (also called HEDT for high-end desktop) it uses a different socket configuration due to their high power requirement — LGA2066, which still persists but should be replaced in the near future. Know those numbers when choosing your motherboard.

Memory

The general rule of thumb is having twice as much CPU memory as total GPU memory. So for four 2080tis I'd get 2x4x11 = 88GB of memory. Since memory kits tend to come in 16GB increments, we will probably end up with a 96GB machine for such a configuration. Reigning memory makers, off the top of my head: Corsair (they have been up there for years!), HyperX, Patriot, G.Skill, and also A-Data at the budget end of things.
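
The same rule of thumb as a tiny worked calculation (the card count and memory size are just the example above):

```python
import math

# System RAM ~= 2x total GPU memory, rounded up to the next 16GB increment.
gpus = 4
gpu_mem_gb = 11            # e.g. four 2080tis
increment_gb = 16

target = 2 * gpus * gpu_mem_gb                     # 88GB
rounded = math.ceil(target / increment_gb) * increment_gb
print(f"Target: {target}GB -> buy {rounded}GB")    # 96GB
```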

Power Supply Unit — PSU

Your power supply should be able to carry the insane power requirements of the machine you are building. Machines with more than two GPUs will generally have two separate power supplies. So, if you are planning on adding more cards in the future, make sure your enclosure supports a second PSU for later on. Two GPU machines can settle for a single 1000W PSU. Good makers of PSUs are Antec, CoolerMaster, Corsair, SeaSonic, EVGA etc.

Cooling/Enclosure

Assuming this whole machine will be in its own separate, well-chilled room, we don’t really need specialty cooling solutions. If it won’t, you’ll want one that is quiet enough to work next to.

As for the enclosure, you'll want one that is convenient to work with when installing everything and occasionally opening up for fixes or hardware swaps. Also note that PSUs bundled with enclosures are usually junk — unless they come from reputable companies.

Hard Drives

This boils down to budget. The cheap option is a good, fast 1TB SSD for the currently-used dataset, with 4+ TBs for regular, slower and cheaper storage (model checkpoints etc).

If budget is no concern it’s of course nicer to only purchase fast SSDs. Since prices tend to go down all the time, if you feel like splurging you can definitely buy more SSDs. Also worth mentioning is backups; if you absolutely can’t afford to lose your data, consider a RAID solution (outside scope of this article).

Closing Comment — Gaming Machines

A lot of the hardware described above has a lot in common with popular high-end gaming machines. Save for the pretty over-the-top GPU setups, most components (motherboard, RAM, PSU etc.) would feel right at home in a high-end gaming setup. If you are having trouble Googling the information here, swap "deep learning" for "gaming" and you'll find plenty more results.

Good Luck!

Image: YouTube/Nvidia


I’m a deep learning consultant and leader with 10 years’ experience working on computer vision tasks. Ex: trigo.tech, Amazon, Disney Research.