
Speed Cubing for Machine Learning

Episode 2: Using GPUs, RAPIDS, CuPy and VisPy libraries

Photo by Michael Dziedzic on Unsplash

Introduction

Previously on "Speed Cubing for Machine Learning"…

In Episode 1 [1], we described how to generate 3D data as fast as possible to feed Generative Adversarial Networks, using CPUs, multithreading, and Cloud resources. We reached a rate of 2 billion data points per second! In this Episode 2, we are going to benefit from GPUs through a dedicated framework called RAPIDS. We will also see how to visualize the generated data, thanks to a GPU-accelerated library named VisPy.

Our new objective is to go even faster at creating cubes and to be able to properly visualize them.

As a reminder, our raw data are still represented by a numpy.ndarray, built like this:

# CREATING THE RAW DATA (NUMPY FORMAT)
import numpy as np
import pandas as pd

arr_data = np.zeros(shape=(100, 1000, 1000))  # numpy.ndarray
# 'files_list' is a list of parquet file paths
for i, path in enumerate(files_list):
    df = pd.read_parquet(path)  # (1000 x 1000) random-valued DataFrame
    arr_data[i, :, :] = df.to_numpy()

And the function to generate a cube is written as follows:

# FUNCTION GENERATING A CUBE FROM RAW DATA (FULL NUMPY VERSION)
import random

def create_cube_np(dimc):
    rnd_cols = random.sample(range(1, 1000), dimc)
    # First 100 rows, 'dimc' random columns (vectorization)
    cube = arr_data[:, :100, rnd_cols]
    return cube

Also, here are the specs of our local machine:

Table 1: specs of our local machine.

RAPIDS Setup

RAPIDS [2] is a GPU-oriented data science framework. RAPIDS uses optimized NVIDIA CUDA primitives and high-bandwidth GPU memory to accelerate the execution of your Python code. It is simple to use and is available as conda packages, Docker images, or source builds. We recommend choosing the configuration selected below (Figure 1), except for the CUDA version, which must be compatible with your hardware. Don’t forget to follow the prerequisites on their website before diving in:

  • GPU: A modern NVIDIA graphic card with compute capability 6.0+
  • OS: Linux Ubuntu 16.04/18.04 or CentOS 7 with gcc/++ 7.5+
  • Docker: Docker CE v19.03+ and nvidia-container-toolkit
  • Compatible CUDA & NVIDIA Drivers
Figure 1: The RAPIDS Release Selector (source: https://rapids.ai/start.html)

Once you execute the above command (Figure 1) in a terminal, it will download the Docker image corresponding to the selected configuration (this can take some time).

Then, you will be able to run a local JupyterLab (an improved notebook environment) where the whole technical stack is already set up for you (Linux, Python, RAPIDS, the CUDA library, etc.), along with several demos exploring tools like cuML, cuGraph, or XGBoost. Time to have fun with your graphics card!

CuPy Library

In order to harness the power of your GPU, you will use the CuPy library [3]. CuPy is an open-source array library accelerated with NVIDIA CUDA; it is highly compatible with NumPy and provides GPU-accelerated computing with Python. CuPy is already included in the RAPIDS framework. You will find the official CuPy documentation here.
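To give a quick taste of that NumPy compatibility, here is a small illustrative snippet (the array values are arbitrary): the syntax mirrors NumPy, with cp.asnumpy() performing the explicit copy back to host memory.

# A QUICK TASTE OF CUPY'S NUMPY-LIKE API (illustrative snippet)
import cupy as cp

a = cp.arange(10)      # array allocated on the GPU
b = cp.sqrt(a)         # same syntax as np.sqrt, but runs on the GPU
back = cp.asnumpy(b)   # explicit copy back to host memory
print(back)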

Now, this is where the magic happens! In only one line, we perform what is called a data transfer, moving our previous numpy.ndarray to the current GPU device. Our data array will now be considered a GPU object:

# DATA TRANSFER - MOVING DATA TO THE GPU DEVICE
import cupy as cp
arr_data_gpu = cp.asarray(arr_data)
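For the benchmark below to actually run on the GPU, the cube-generation function must slice the CuPy array instead of the original NumPy one. Since CuPy mirrors the NumPy API, the body barely changes; here is a minimal sketch of that redefinition (our assumption, not shown explicitly above):

# FUNCTION GENERATING A CUBE FROM RAW DATA (CUPY VERSION) - assumed redefinition
def create_cube_np(dimc):
    rnd_cols = random.sample(range(1, 1000), dimc)
    # Same slicing as the NumPy version, but on the GPU-resident array
    cube = arr_data_gpu[:, :100, rnd_cols]
    return cube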

Now, let’s see how long it takes to generate 10,000 cubes of dimension 100 x 100 x 100 in a simple loop (we no longer need to care about multithreading):

# GENERATING CUBES USING GPU
n = 10000
for _ in range(n):
    c = create_cube_np(100)
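Note that CuPy launches GPU kernels asynchronously, so an accurate measurement should stop the clock only after an explicit synchronization. A minimal timing sketch:

# TIMING THE LOOP (sketch; synchronize before stopping the clock)
import time

start = time.perf_counter()
for _ in range(n):
    c = create_cube_np(100)
cp.cuda.Device().synchronize()  # wait for all queued GPU work to finish
elapsed = time.perf_counter() - start
print(f'{n / elapsed:,.0f} cubes/s')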

The total time is only 2.04 s, hence a rate of ~4,900 cubes/s!

This is quite impressive: it is nearly 2.5x faster than the best solution we previously found in the Cloud (the 'n2-highcpu-80' virtual machine instance, with 80 cores, Intel Cascade Lake @ 2.80 GHz, and 80 GB of RAM). And it runs on a single local GPU (an NVIDIA GeForce GTX 1080 Ti).

CPU vs GPU

Source: https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/

"The Central Processing Unit (CPU) has often been called the brains of the PC. But increasingly, that brain is being enhanced by another part of the PC – the GPU (Graphics Processing Unit), which is its soul."

Here, we will share some simple insights that may explain why GPUs are so fast compared to CPUs. A CPU has a few complex cores that run processes sequentially, backed by a lot of cache memory, and can handle a few software threads at a time. In contrast, a GPU has hundreds of simple cores designed for parallel computing, which can handle thousands of threads simultaneously (Table 2). GPUs were designed with one goal in mind: to process graphics really fast. Memory bandwidth is one of the main reasons why GPUs are faster for computing than CPUs.

Table 2: CPU vs GPU (source: https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/)
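To get a feel for this difference on your own machine, here is a toy benchmark comparing the same reduction on CPU (NumPy) and GPU (CuPy); the array size is arbitrary and the exact speedup will depend on your hardware.

# TOY BENCHMARK - SAME REDUCTION ON CPU (NUMPY) VS GPU (CUPY)
import time
import numpy as np
import cupy as cp

x_cpu = np.random.rand(4096, 4096).astype(np.float32)
x_gpu = cp.asarray(x_cpu)  # one-off transfer to the GPU

t0 = time.perf_counter()
x_cpu.sum()
t_cpu = time.perf_counter() - t0

x_gpu.sum()  # warm-up call, so compilation time is excluded
t0 = time.perf_counter()
x_gpu.sum()
cp.cuda.Device().synchronize()  # wait for the GPU before stopping the clock
t_gpu = time.perf_counter() - t0

print(f'CPU: {t_cpu:.4f}s | GPU: {t_gpu:.4f}s')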

VisPy library

VisPy [4] is a high-performance, interactive 2D/3D data visualization library that leverages the computational power of modern GPUs through OpenGL. You will find the official documentation here.

We can finally visualize a generated cube (Figure 3). Here, we adapted code from the volume rendering example you can find in their GitHub repository.

Figure 3: 3D Visualization of a generated cube with random values (image by author).
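For reference, here is a minimal sketch in the spirit of that example; the stand-in random volume and camera parameters are our own choices, and a cube generated on the GPU would first be brought back to host memory with cp.asnumpy().

# MINIMAL VOLUME RENDERING WITH VISPY (sketch)
import numpy as np
from vispy import app, scene

# Stand-in volume; replace with cp.asnumpy(create_cube_np(100)) for a real cube
vol = np.random.rand(100, 100, 100).astype(np.float32)

canvas = scene.SceneCanvas(keys='interactive', show=True)
view = canvas.central_widget.add_view()
volume = scene.visuals.Volume(vol, parent=view.scene)
view.camera = scene.cameras.TurntableCamera(parent=view.scene, fov=60.0)

if __name__ == '__main__':
    app.run()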

Conclusion

In this episode, we have seen how to take advantage of the GPU to increase the rate of 3D data generation, and how to visualize the generated data. In the last episode of this series, we will tackle the multi-GPU challenge. See you in Episode 3!

References

[1] N. Morizet, "Speed Cubing for Machine Learning – Episode 1" (2020), Towards Data Science.

[2] RAPIDS: open GPU Data Science (2020).

[3] CuPy: a NumPy-compatible array library accelerated by CUDA (2020).

[4] VisPy: a Python library for interactive scientific visualization (2020).

About Us

Advestis is a European Contract Research Organization (CRO) with a deep understanding and practice of statistics and interpretable machine learning techniques. The expertise of Advestis covers the modeling of complex systems and predictive analysis for temporal phenomena. LinkedIn: https://www.linkedin.com/company/advestis/

