RTX 2060 Vs GTX 1080Ti Deep Learning Benchmarks: Cheapest RTX card Vs Most Expensive GTX card

Training time comparison for 2060 and 1080Ti using the CIFAR-10 and CIFAR-100 datasets with fast.ai and PyTorch libraries.

Eric Perbos-Brinck
Towards Data Science


TLDR #1: despite having half the VRAM and half the retail price of the 1080Ti, the RTX 2060 can blast past it in Computer Vision once its Tensor Cores are activated with ‘FP16’ code in PyTorch + Fastai.

Less than a year ago, with its GP102 chip, 3584 CUDA Cores and 11GB of VRAM, the GTX 1080Ti was the apex GPU of Nvidia’s last-gen Pascal range (bar the Titan editions).
The demand was so high that retail prices often exceeded $900, way above the official $699 MSRP.

In Fall 2018, Nvidia launched its newest Turing line-up, named “RTX”, with Raytracing Cores and Tensor Cores. Prices as a whole jumped significantly: for example, the RTX 2080Ti retails for $1,150 and more.

One key feature for Machine Learning in the Turing / RTX range is the Tensor Core: according to Nvidia, it enables computation in “Floating Point 16” (FP16) instead of the regular “Floating Point 32” (FP32), and can cut the time needed to train a Deep Learning model by up to 50%.

About a month ago (Jan 7, 2019), Nvidia released the cheapest GPU of the Turing line-up: the RTX 2060.

Using Jupyter notebooks, I trained ResNet models 18 to 152 on each CIFAR dataset with FP32 then FP16, to compare the time required for 30 epochs.

With Fastai, switching from FP32 to FP16 training is as simple as adding `.to_fp16()` at the end of your regular code.

  • The regular FP32 version, with a pre-trained Resnet 18 model:

learn = create_cnn(data, models.resnet18, metrics = accuracy)

  • The FP16 version:

learn = create_cnn(data, models.resnet18, metrics = accuracy).to_fp16()

That’s it: you now have access to the RTX Tensor Cores!
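For context, here is a minimal sketch of what a full benchmark cell looks like in fastai v1, assuming the built-in CIFAR-10 download via `untar_data(URLs.CIFAR)`, a batch size of 128 and `fit_one_cycle` for training (the exact notebooks are in the GitHub repo linked further down):

from fastai.vision import *   # fastai v1: untar_data, ImageDataBunch, create_cnn, models, accuracy
import time
# CIFAR-10 DataBunch: the downloaded folder contains 'train' and 'test' subfolders
path = untar_data(URLs.CIFAR)
data = ImageDataBunch.from_folder(path, valid='test', ds_tfms=get_transforms(),
                                  size=32, bs=128).normalize(cifar_stats)
# FP16 learner: the .to_fp16() call is the only change vs. the FP32 version
learn = create_cnn(data, models.resnet18, metrics=accuracy).to_fp16()
# Time 30 epochs
start = time.time()
learn.fit_one_cycle(30)
print(f"30 epochs in {time.time() - start:.0f} seconds")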

Note: for more information on training with “FP16”, also known as “Mixed-Precision Training” (MPT), check these excellent posts.
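For intuition, here is a rough, simplified sketch of what mixed-precision training does under the hood in plain PyTorch. It illustrates the general technique, not fastai’s actual implementation, and it skips details such as keeping batch-norm layers in FP32; the torchvision ResNet and the dummy loader are just stand-ins. The idea: run the forward and backward passes in FP16, keep an FP32 master copy of the weights for the updates, and scale the loss so small gradients don’t underflow in FP16.

import torch
import torchvision
# Stand-in model and a dummy one-batch loader of (images, labels); replace with real data
model = torchvision.models.resnet18(num_classes=10)
loader = [(torch.randn(128, 3, 32, 32), torch.randint(0, 10, (128,)))]
model = model.cuda().half()                                          # FP16 copy for fast forward/backward
master = [p.detach().clone().float() for p in model.parameters()]    # FP32 master weights
opt = torch.optim.SGD(master, lr=0.1)
scale = 512.0                                                        # static loss scale against FP16 underflow
for x, y in loader:
    x, y = x.cuda().half(), y.cuda()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    (loss * scale).backward()                                        # scaled backward pass in FP16
    for p, m in zip(model.parameters(), master):
        m.grad = p.grad.float() / scale                              # unscale gradients into FP32
    opt.step()                                                       # update the FP32 master weights
    for p, m in zip(model.parameters(), master):
        p.data.copy_(m.data)                                         # copy updated weights back to FP16
    model.zero_grad()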

The TLDR #2 in two charts.

Note: ‘bs * 2’ indicates a batch_size twice as large, e.g. 256 vs. 128.

The Setup

Hardware: I use a “real-life” high-end gaming PC with the following specs:

  • AMD Ryzen 7 1700X 3.4GHz 8-cores
  • MSI X370 Krait Gaming motherboard
  • 32 GB DDR4-2400 RAM
  • 1 TB NVMe Samsung 960 EVO
  • Asus GTX 1080Ti-11GB Turbo ($800)
  • Palit RTX 2060 6GB ($350)

These parts come from my personal setup; they were not paid for or sponsored by any company, publisher or vendor.

Software: I use Ubuntu and Windows 10 in dual-boot.

  • Ubuntu 18.04 + Anaconda/Python 3.7
  • CUDA 10
  • PyTorch 1.0 + fastai 1.0
  • Nvidia drivers 415.xx

Note: before each training session, the cards were switched to a secondary PCIe slot (x8) not handling the PC’s dual monitors, thus ensuring their computing power was 100% focused on the training.
I mention this because I’m using two monitors on a 2007 Ergotron LX Dual-Stand (my best long-term PC purchase ever!): one 24" in 1080p (vertical) and one 27" in 1440p (landscape), both connected to the same GPU, so one can assume the display would steal “some” computing power away from training.

A screenshot of my dual monitors: the 24" vertical on left, the 27" landscape on right.

BTW, if you want to check the impact of a dual-monitor display on training performance, scroll down to the bottom of the article for a comparison using the 1080Ti.

A quick summary of the two GPUs’ specs.

The 1080 Ti in the GTX 10 line-up (the last one):

The 2060 in the RTX 20 line-up (the first one):

Key points:

  • The RTX 2060 has roughly HALF the number of CUDA cores of the 1080Ti (1920 vs. 3584)
  • Its memory bandwidth is about 70% of the 1080Ti’s (336 vs. 484 GB/s)
  • It has 240 Tensor Cores (source) for Deep Learning, while the 1080Ti has none.
  • It is rated for 160W of consumption, with a single 8-pin connector, while the 1080Ti is rated for 250W and needs a dual 8+6 pin connector.
  • It costs less than HALF the retail price of the 1080Ti (in Stockholm, Sweden).

Additional information:

  • Methodology: to keep things comparable, I ran every benchmark in three versions (a simplified sketch of the loop follows after this list).
    - The “FP32” and “FP16” versions used the same batch_size for the 1080Ti and the 2060 (one could argue that the 1080Ti, with roughly twice the VRAM of the 2060, could take larger batches, but I chose that approach. Feel free to run the tests yourself).
    - The “FP16 bs*2” version used a batch_size twice as large, to benefit from the theory behind FP16 training (see the two linked posts above for details).
  • The Jupyter notebooks I used, including all durations for 30 epochs, are available in my GitHub repo.
    You’ll need Fastai V1 to run them.
  • The spreadsheet I used for durations, time-scales and charts is in the repo as well.
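As an illustration of that methodology, a single loop can drive the three versions for every ResNet depth, as in the minimal sketch below. It is a reconstruction assuming a base batch size of 128 and `fit_one_cycle`, not the exact notebook code; the CIFAR-100 runs follow the same pattern with the corresponding dataset.

from fastai.vision import *   # fastai v1
import time
path = untar_data(URLs.CIFAR)                      # CIFAR-10
archs = [models.resnet18, models.resnet34, models.resnet50, models.resnet101, models.resnet152]
versions = [('FP32', 128, False), ('FP16', 128, True), ('FP16 bs*2', 256, True)]
for arch in archs:
    for name, bs, fp16 in versions:
        data = ImageDataBunch.from_folder(path, valid='test', ds_tfms=get_transforms(),
                                          size=32, bs=bs).normalize(cifar_stats)
        learn = create_cnn(data, arch, metrics=accuracy)
        if fp16:
            learn = learn.to_fp16()                # engage the Tensor Cores
        start = time.time()
        learn.fit_one_cycle(30)                    # 30 epochs per run
        print(f"{arch.__name__} / {name}: {time.time() - start:.0f} s")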

CIFAR-10 benchmarks

Resnet 18

  • Duration for “Time-To-Complete 30 epochs” in seconds:
  • Time-scaled:

Resnet 34

  • Duration in seconds:
  • Time-scaled:

Resnet 50

  • Duration in seconds:
  • Time-scaled:

Resnet 101

  • Duration in seconds:
  • Time-scaled:

Resnet 152

  • Duration in seconds:
  • Time-scaled:

CIFAR-100 Benchmarks

Resnet 18

  • Duration in seconds:
  • Time-scaled:

Resnet 34

  • Duration in seconds:
  • Time-scaled:

Resnet 50

  • Duration in seconds:
  • Time-scaled:

Resnet 101

  • Duration in seconds:
  • Time-scaled:

Resnet 152

  • Duration in seconds:
  • Time-scaled:

BONUS:

I compared the performance of the 1080Ti as a stand-alone GPU (no display attached), shown in red, vs. as the main GPU (driving the dual monitors discussed earlier), shown in blue.

Note: I used CIFAR-10.

  • Duration in seconds:
  • Time-scaled:

