A Review of NVIDIA GTC 2018 Conference — New GPUs, Deep Learning Acceleration, Data Augmentation, Autonomous Driving

Victor Dibia
Towards Data Science
9 min read · Apr 26, 2018


The conference theme was “AI and Deep Learning”.

NVIDIA DGX-2 Workstation, 10x faster than the DGX-1 released 6 months ago. Source.

Four days of excellent talks, a keynote, demos, posters and interactions at the NVIDIA GPU Technology Conference. As an HCI researcher interested in applied AI, it was exciting to learn about advances in AI hardware (GPUs) and to attend technical talks covering both the science of AI and its industry use cases. This post summarizes my notes from the conference keynote and the technical sessions I attended.

TLDR — some highlights

  • NVIDIA AI announced new GPUs, and native integration of their deep learning inference optimizer (TensorRT) with TensorFlow.
  • There is growing interest in building simulated environments that allow for testing self-driving car algorithms. NVIDIA announced a self-driving platform.
  • For many real-world applications of AI, data augmentation strategies are increasingly important. See the Fatkun plugin for gathering training images, and the imgaug library for image augmentation. Also see this blog post on data augmentation steps in Python/TensorFlow.
  • Generative AI (GANs, evolutionary algorithms) is being explored for CAD and game design use cases — artistic content generation (characters, backgrounds), content reuse, etc.
  • Researchers are exploring applications of AI in security — e.g. detecting Domain Generation Algorithms.
  • AI is being used to accelerate scientific research by speeding up simulations.

Keynote — Advances in GPUs, Applications in Graphics, AI, Autonomous Driving

GPUs

Several new GPUs were announced — the Quadro GV100 (10,000 cores) and the DGX-2 workstation (2 PFLOPS, 512 GB HBM2, 10 kW, 350 lbs). To put this performance in perspective, the DGX-2 is 10x faster than the DGX-1 released 6 months ago. Examples of model training times further demonstrate the impact of GPU advances on the research and practice of AI.

Just 5 years ago, it took 6 days to train AlexNet using 2 GTX 580s. Today, AlexNet can be trained in 18 minutes on a DGX-2. Pretty serious progress!

The keynote also highlighted how GPUs now enable real-time ray tracing for motion graphics, and how work previously done on supercomputers can now be done efficiently on DGX workstations at a fraction of the price, power consumption and space requirements.

Medical Imaging Supercomputing Platform

CLARA — NVIDIA's system to enable medical imaging. Source.

NVIDIA also mentioned their foray into the creation of a medical imaging supercomputer — CLARA. This project is promising as it aims to extend the capabilities of existing medical imaging equipment using advances in AI/deep learning. There was an interesting video on how deep learning algorithms were used for 3D reconstruction of the heart (details on chamber size, blood flow, etc.) based on images captured on a 15-year-old ultrasound scanner. More on CLARA here.

Autonomous Driving, Sensor Fusion

Obtaining training data, and designing algorithms that fuse input from the multiple sensors required for autonomous driving (sensor fusion), are hard problems. NVIDIA mentioned efforts to create a platform — NVIDIA DRIVE — that helps address these issues. Pretty good news for self-driving car researchers.

The NVIDIA DRIVE platform combines deep learning, sensor fusion, and surround vision to change the driving experience. It is capable of understanding in real-time what’s happening around the vehicle, precisely locating itself on an HD map, and planning a safe path forward. Designed around a diverse and redundant system architecture, the platform is built to support ASIL-D, the highest level of automotive functional safety.

There were also a few other technical talks that highlighted the need for high-quality simulation environments to train and test self-driving car algorithms.

Tensorflow + TensorRT

NVIDIA also announced native integration of TensorRT with TensorFlow. NVIDIA TensorRT is a deep learning inference optimizer and runtime that speeds up deep learning inference through optimizations and high-performance runtimes for GPU-based platforms. If you run your TensorFlow applications on NVIDIA GPUs, you can now add a few lines of code that automatically enable TensorRT optimizations and speedups!
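
As a rough illustration, here is a minimal sketch of what the integration looked like through the contrib API available around TensorFlow 1.7; the exact module path and arguments have changed across releases, and the frozen graph file and output node name below are placeholders for your own model.

```python
# Hedged sketch of TF-TRT usage circa TensorFlow 1.7 (tf.contrib.tensorrt);
# "frozen_model.pb" and the "logits" output node are placeholders.
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Load a frozen TensorFlow inference graph from disk.
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Replace TensorRT-compatible subgraphs with optimized TensorRT engine nodes.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["logits"],                   # output node name(s) of your model
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,     # ~1 GB workspace for TensorRT
    precision_mode="FP16",                # FP32, FP16 or INT8
)

# trt_graph can then be imported and executed like any other GraphDef.
```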

Technical Talks

A selection of the technical talks I attended are detailed below.

Data Augmentation Strategies by Winston Hsu [Slides]

I found this talk interesting as it provided practical advice on ways to satisfy the data-hungry demands of supervised deep learning. The speaker begins with the premise that human annotation of data is expensive and proposes four approaches used in their workflow to address this.

Web scraping: Approaches to efficiently scrape labelled data from websites and social networks.
Weakly Supervised Methods: Given a small dataset labelled by experts, we can learn labelling strategies and apply them to label larger datasets.
Data Transformations: We can augment datasets by generating additional examples using simple transformations — e.g. cropping, shifting, color casting, lens distortion, vignetting, random backgrounds, etc. (see the short sketch after this list). An example of a library for this sort of transformation is imgaug.
Data Synthesis: We can generate texturized CAD models as training data, add specific features to existing data (such as adding glasses to facial images), or synthesize entirely new images using GANs.
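
A short sketch of this kind of augmentation pipeline using the imgaug library mentioned above; the specific operators and parameter ranges below are illustrative choices, not the speaker's.

```python
# Illustrative imgaug pipeline: flips, crops, brightness changes and shifts.
import numpy as np
from imgaug import augmenters as iaa

seq = iaa.Sequential([
    iaa.Fliplr(0.5),                                              # flip half of the images horizontally
    iaa.Crop(percent=(0, 0.1)),                                   # random crops of up to 10% per side
    iaa.Multiply((0.8, 1.2)),                                     # brightness / simple color casting
    iaa.Affine(rotate=(-15, 15), translate_percent=(-0.1, 0.1)),  # small rotations and shifts
])

# A batch of dummy images standing in for real training data.
images = np.random.randint(0, 255, size=(16, 224, 224, 3), dtype=np.uint8)
augmented = seq.augment_images(images)    # each call produces new random variants
```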

More can be found on the presenter’s slides here.

Driver Drowsiness Detection — Siddarth Varier from NVIDIA

Researchers from NVIDIA demonstrated some early work on detecting drowsiness in drivers. The authors train a scaled-down VGG16 model and augment their training dataset with synthetic data generated from 3D face models. For classification, they rely on the predicted eye pitch angle over a time period.
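
As a rough sketch of that pipeline (my own illustrative assumptions, not the authors' architecture or thresholds): a scaled-down VGG-style network regresses the eye pitch angle from a face crop, and a simple rule over a window of predictions flags drowsiness.

```python
# Hedged sketch: small VGG-style pitch regressor + rule over a time window.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def small_vgg_regressor(input_shape=(64, 64, 1)):
    """Scaled-down VGG-style network that regresses a single eye pitch angle."""
    model = tf.keras.Sequential([
        layers.Conv2D(16, 3, activation="relu", padding="same", input_shape=input_shape),
        layers.Conv2D(16, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1),  # predicted eye pitch angle (degrees)
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def is_drowsy(pitch_angles, threshold_deg=-10.0, min_fraction=0.8):
    """Flag drowsiness if the eyes stay below a pitch threshold for most of a window."""
    pitch_angles = np.asarray(pitch_angles)
    return float(np.mean(pitch_angles < threshold_deg)) >= min_fraction
```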

Generative AI

Generative Design, Autodesk: In this talk, the presenter discussed some interesting ways in which evolutionary algorithms are used to generate CAD designs. Given a design challenge, the goal is usually to balance the cost (of materials) and performance. To this end, they have experimented with evolutionary algorithms that generate design candidates while optimizing parameters such as cost, performance and manufacturing method, and use automated stress tests (FEA analysis) as part of the feedback loop. A specific example was given where an evolutionary algorithm came up with a high-performance (and unusual-looking) part of a motorbike.
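
A toy sketch of this kind of evolutionary search, with a stand-in fitness function in place of real FEA-based evaluation; the parameterization and weights are purely illustrative.

```python
# Toy evolutionary search over design parameter vectors.
import numpy as np

rng = np.random.default_rng(0)

def fitness(design):
    # Stand-in for simulated performance (e.g., stiffness from FEA) minus a material cost penalty.
    performance = -np.sum((design - 1.5) ** 2)   # best possible at design == 1.5
    cost = 0.3 * np.sum(np.abs(design))          # more material, higher cost
    return performance - cost

population = rng.uniform(0, 3, size=(50, 8))     # 50 candidate designs, 8 parameters each

for generation in range(100):
    scores = np.array([fitness(d) for d in population])
    parents = population[np.argsort(scores)[-10:]]        # keep the 10 fittest designs
    children = np.repeat(parents, 5, axis=0)              # each parent spawns 5 children
    children += rng.normal(0, 0.1, size=children.shape)   # random mutation
    population = children

best_design = population[np.argmax([fitness(d) for d in population])]
```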

A.I. Disrupting the Future of Content Creation for Games — Eric Risser, Artomatix

Left — original: we can see a clear vertical and horizontal seam line running through the center of the image. Right — Artomatix output: the seam was intelligently repaired with new features that appear realistic. Source — Artomatix Blog.

This talk focused on how AI-accelerated workflows can be applied to aspects of the media industry (e.g. movies, video games). The presenter notes that the video game industry spends 61% of its budget on artistic content generation — main characters as well as backgrounds. Much of this effort involves manual workflows. Creative or generative AI offers opportunities to improve this across areas such as texture synthesis, material enhancement, hybridization and style transfer. This includes methods that enable artists to paint with structure, example-based workflows (scanning real-world objects and improving them with AI) and photogrammetry. AI can also help with recycling old content, e.g. up-resing video. An industry use case was given of IKEA being able to easily scan 3D models of products which were then used on websites (studies showed having 3D models led to 20% higher sales). See more details on the presenter's company blog.

Growing Generative Models — Samuli Laine et al.

Researchers from NVIDIA presented some interesting work on how to generate high-resolution images using Generative Adversarial Networks (GANs). Their approach addresses a known problem with GANs (mode collapse), speeding up and stabilizing the training process. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, and adding new layers that model increasingly fine details as training progresses. They highlight the potential of this work for generating assets for games and films, and for conditioning the GAN to control its output (e.g. male or female faces). More details can be found in their paper.

Examples of high-resolution faces generated by their GAN. Paper.
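
A minimal conceptual sketch of the progressive-growing idea (not the authors' implementation; the layer sizes and staging logic are illustrative assumptions): build the generator for each resolution stage by stacking upsampling blocks that add finer detail on top of a low-resolution core.

```python
# Conceptual sketch: a generator that is rebuilt with more upsampling blocks
# as the target resolution grows (4x4 -> 8x8 -> ... -> target).
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=128, target_resolution=64, base_resolution=4):
    inputs = tf.keras.Input(shape=(latent_dim,))
    x = layers.Dense(base_resolution * base_resolution * 256)(inputs)
    x = layers.Reshape((base_resolution, base_resolution, 256))(x)
    resolution, filters = base_resolution, 256
    # In progressive training these blocks are added one stage at a time;
    # here we simply build the full stack for a given stage.
    while resolution < target_resolution:
        x = layers.UpSampling2D()(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        resolution *= 2
        filters = max(filters // 2, 32)
    outputs = layers.Conv2D(3, 1, padding="same", activation="tanh")(x)  # RGB image
    return tf.keras.Model(inputs, outputs)

# Training would start with a low-resolution generator/discriminator pair and
# move to higher-resolution models as training progresses.
gen_16 = build_generator(target_resolution=16)
gen_64 = build_generator(target_resolution=64)
```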

Cyber Defense — Fighting DGAs with Machine Intelligence

Another interesting talk looked at how ML can be used to address some issues in security, specifically detecting domain generation algorithms. Domain Generation Algorithms (DGAs) are algorithms seen in various families of malware that are used to periodically generate large numbers of domain names that can serve as rendezvous points with their command and control servers. They are used by attackers to communicate with and exfiltrate data from networks, are designed to circumvent traditional cyber defenses, and have been extremely successful. The speakers cite a recent paper, “Inline DGA Detection with Deep Networks”.
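
To give a flavor of what a learned DGA detector can look like (a generic hedged sketch, not the architecture from the cited paper), a character-level model can score raw domain strings directly:

```python
# Hedged sketch: character-level LSTM that scores domains as DGA vs. benign.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

MAX_LEN = 63                                           # maximum domain length we encode
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-."
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(VOCAB)}   # 0 is reserved for padding/unknown

def encode(domain):
    ids = [CHAR_TO_ID.get(c, 0) for c in domain.lower()[:MAX_LEN]]
    return ids + [0] * (MAX_LEN - len(ids))            # pad to a fixed length

model = tf.keras.Sequential([
    layers.Embedding(input_dim=len(VOCAB) + 1, output_dim=32),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),             # probability the domain is DGA-generated
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Toy example; real labels would come from known DGA feeds and benign domain lists.
domains = ["google.com", "xjwqpdkrmzlt.info"]
labels = np.array([0, 1])
x = np.array([encode(d) for d in domains])
model.fit(x, labels, epochs=1, verbose=0)
```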

Seeing AI, an app for Visually Impaired Persons — Anirudh Koul

Interesting talk by Anirudh Koul from Microsoft Research showing how his team built the seeingAI mobile app for visually impaired persons (VIPs).

He motivates the design of apps for accessibility by noting how this direction breeds innovation. For example, Ray Kurzweil invented a reading machine inspired by a conversation with a blind individual. A similar story holds for Alexander Graham Bell, who invented the telephone while working on hearing aids.

What is niche for individuals with disabilities will become mainstream tomorrow.

The seeingAI app can help a VIP read text, and identify people, scenes, handwriting, documents, products, currency, light sources, etc. Perhaps the most innovative part of this work is that most of these capabilities run locally on device. The presenter shared very interesting thoughts on their experience with this project.

  • Training the Vision Models: They experimented with several approaches to assembling their dataset and used customvision.ai to train models. They suggest the Fatkun plugin for scraping data with appropriate rights. They also discuss experiments with generating controlled synthetic data for currencies — e.g. taking a currency note, flipping it, occluding it, adding backgrounds, etc. (a sketch of this kind of synthesis appears after this list). An important idea here is to ensure the model does not learn to predict a note just by seeing a non-discriminative feature (e.g. an edge with a zero, which could belong to a 10 or a 20 dollar bill).
  • Picking the best model for User Experience and Not for Validation Accuracy: The presenter makes an important point that a high validation accuracy may not translate to a good user experience. For example, a model that is 95% accurate but occasionally labels a 100 USD bill as 10 USD is much less attractive than one which is 90% accurate but does not make this mistake. Optimize for precision. There are also UX considerations around when to speak during frame-by-frame analysis.
  • How people are using the app: A blind person who changes their pitch based on information from the app — faces, emotions, etc. A blind school teacher who keeps their phone docked so that it announces each student as they enter the classroom. Parents being able to read their kids' homework or read Christmas messages for the first time.
    The presenter also noted that people self-train to understand the limits of AI — what it can and cannot do.
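
As referenced in the first bullet above, here is a hypothetical sketch of that kind of controlled synthesis: compositing a (stand-in) note image onto random backgrounds with flips, rotations and partial occlusions. The drawn stand-in note, image sizes and occlusion logic are illustrative assumptions, not the team's actual pipeline.

```python
# Hypothetical controlled synthesis of currency training images with Pillow.
import random
from PIL import Image, ImageDraw

def make_fake_note():
    # Stand-in for a photo of a real bank note; in practice this would be loaded from disk.
    note = Image.new("RGB", (300, 128), (150, 190, 150))
    ImageDraw.Draw(note).text((20, 55), "10", fill=(0, 0, 0))
    return note

def synthesize_example(size=(640, 480)):
    # Random solid-color background standing in for real background photos.
    background = Image.new("RGB", size, tuple(random.randint(0, 255) for _ in range(3)))
    note = make_fake_note()
    if random.random() < 0.5:                       # random horizontal flip
        note = note.transpose(Image.FLIP_LEFT_RIGHT)
    note = note.rotate(random.uniform(-25, 25), expand=True)
    # Occlude part of the note so the model cannot rely on a single corner digit.
    draw = ImageDraw.Draw(note)
    w, h = note.size
    x0, y0 = random.randint(0, w // 2), random.randint(0, h // 2)
    draw.rectangle([x0, y0, x0 + w // 3, y0 + h // 3], fill=(30, 30, 30))
    # Paste the note at a random position on the background.
    max_x, max_y = size[0] - note.size[0], size[1] - note.size[1]
    background.paste(note, (random.randint(0, max(max_x, 0)), random.randint(0, max(max_y, 0))))
    return background
```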

Using ML to Accelerate Research

There were interesting presentations showing how machine learning can be used to accelerate scientific discovery by replicating expensive, time-consuming simulations at a fraction of the time and cost. Especially in the physical sciences, it is common to have simulators that enable scientists to test ideas — e.g. simulators for crash tests, chemical reactions, stress tests, etc. Many of these simulations are complex, computationally expensive (sometimes up to billions of CPU hours per year) and can take days or weeks to complete a single run. The idea is to train ML models that learn the processes used in these simulators and replicate their function at a fraction of the time and cost. Related work from Google Research has demonstrated machine learning models that are as accurate as, and far faster (300,000x) than, existing simulators for predicting properties of organic molecules. At GTC there was a similar presentation on the use of GANs to improve high energy physics simulators.
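
The pattern is essentially surrogate modeling: run the expensive simulator offline to collect input/output pairs, then train a fast model to approximate it. A toy sketch of the idea, where the "simulator" below is just a stand-in function:

```python
# Toy surrogate-model sketch: a small network learns to mimic a slow simulator.
import numpy as np
import tensorflow as tf

def expensive_simulator(x):
    # Placeholder for a slow physics simulation returning some nonlinear response.
    return np.sin(3 * x[:, :1]) + 0.5 * np.cos(2 * x[:, 1:2]) * x[:, :1]

# 1. Run the real simulator offline to build a training set.
X = np.random.uniform(-2, 2, size=(5000, 2)).astype("float32")
y = expensive_simulator(X).astype("float32")

# 2. Train a small network to approximate the simulator.
surrogate = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
surrogate.compile(optimizer="adam", loss="mse")
surrogate.fit(X, y, epochs=20, batch_size=128, verbose=0)

# 3. At query time, the surrogate stands in for the simulator at a fraction of the cost.
prediction = surrogate.predict(np.array([[0.5, -1.0]], dtype="float32"))
```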

Accelerating Science with GANs — Michela Paganini

Accelerating Science with Generative Adversarial Networks: An Application to 3D Particle Showers in Multi-Layer Calorimeters. Paper here.

Michela Paganini gave an interesting presentation on how she is applying GANs to simulate calorimetry — one of the most computationally expensive aspects of simulation in high energy physics experiments. While the results are not perfect, they show promise, with the potential for much faster research cycles. More details can be found in their arXiv paper here.

Conclusions

It was an overall interesting conference. There were over 900 sessions, and what I cover here is a really small subset. Common themes I found were the use of various data augmentation strategies, AI for creativity, simulation environments for self-driving cars, AI and security, and AI for accelerating research.

Got feedback or comments? Feel free to reach out — twitter, linkedin.

