GPU-Optional Python

Write code that exploits a GPU when available and desirable, but that runs fine on your CPUs when not

Carl M. Kadie
Towards Data Science

--

Photo by Fredrick Tendong on Unsplash

The wonderful CuPy library allows you to easily run NumPy-compatible code on an NVIDIA GPU. However, sometimes

  • you (or your users) don’t have a compatible GPU, or
  • you don’t have a CuPy-compatible Python environment, or
  • your code runs slower on your GPU than on your multiple CPUs.

By defining three simple utility functions, you can make your code GPU-optional. (Find the utility functions’ definitions in this small GitHub project.)

When we tried to use CuPy to add GPU-support to FaST-LMM — our open-source genomics package — we ran into three problems. We solved each problem with a simple utility function. Let’s look at each problem, the utility function that solves it, and examples.

Problem 1: Control whether NumPy or CuPy is Used

Suppose you want to generate an array of simple random DNA data. Rows represent individuals. Columns represent DNA locations. Values represent an “allele count” that can be 0, 1, 2, or NaN (for missing). Additionally, you want to

  • default to generating a NumPy array
  • generate a CuPy array when requested via a string, array module, or environment variable
  • fall back to NumPy when a request for CuPy fails — for example, because your computer contains no GPU or because CuPy isn’t installed.

The utility function array_module (defined in GitHub) solves the problem. Here is the resulting data generator that uses array_module:

def gen_data(size, seed=1, xp=None):
    xp = array_module(xp)
    rng = xp.random.RandomState(seed=seed)
    a = rng.choice([0.0, 1.0, 2.0, xp.nan], size=size)
    return a

Input:

a = gen_data((1_000, 100_000))  # Python 3.6+ allows _ in numbers
print(type(a))
print(a[:3, :3])  # print 1st 3 rows & cols

Output:

<class 'numpy.ndarray'>
[[ 1. nan  0.]
 [ 0.  2.  2.]
 [nan  1.  0.]]

As desired, the DNA generator defaults to returning a NumPy array.

Notice gen_data’s optional xp parameter. As xp passes through the array_module utility function, here is what happens (see the sketch after this list):

  • If you haven’t installed the CuPy package, xp will be numpy.
  • Otherwise, if you specify the strings 'cupy' or 'numpy', your specification will be respected.
  • Otherwise, if you specify the array modules cupy or numpy, your specification will be respected.
  • Otherwise, if you set the 'ARRAY_MODULE' environment variable to either 'cupy' or 'numpy', your specification will be respected.
  • Otherwise, xp will be numpy.
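The real definition lives in the GitHub project, but a minimal sketch of that resolution logic might look like this (a hypothetical reconstruction, not the project’s exact code):

import logging
import os
from types import ModuleType

import numpy as np

def array_module(xp=None):
    """Return the requested array module, falling back to NumPy."""
    # Priority: explicit argument, then the ARRAY_MODULE environment
    # variable, then the default of "numpy".
    xp = xp or os.environ.get("ARRAY_MODULE", "numpy")
    if isinstance(xp, ModuleType):
        return xp  # a module object (numpy or cupy) was passed directly
    if xp == "numpy":
        return np
    if xp == "cupy":
        try:
            import cupy as cp
            return cp
        except ImportError as e:
            logging.warning(f"Using numpy. ({e})")  # fall back to NumPy
            return np
    raise ValueError(f"Unknown array module '{xp}'")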

Let’s see it work on my machine, which has a GPU and CuPy installed:

Input:

a = gen_data((1_000, 100_000), xp='cupy')
print(type(a))
print(a[:3, :3])  # print 1st 3 rows & cols

Output:

<class 'cupy.core.core.ndarray'>
[[ 0. nan  0.]
 [ 2.  2.  2.]
 [ 0. nan  1.]]

It generates a CuPy array, as requested.

Aside: Notice that NumPy and CuPy generate different random numbers, even when given the same seed.

Next, let’s request CuPy via an environment variable.

Input:

# 'patch', from the standard library's unittest.mock, can temporarily
# add an item to a dictionary, including os.environ.
from unittest.mock import patch

with patch.dict("os.environ", {"ARRAY_MODULE": "cupy"}) as _:
    a = gen_data((5, 5))
    print(type(a))

Output:

<class 'cupy.core.core.ndarray'>

As expected, we can request a cupy array via an environment variable. (Also, using patch, we can set an environment variable temporarily.)
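The remaining option from the list above, passing an array module object directly, also works. Here it is with NumPy (the same applies to the cupy module when it is importable):

import numpy as np

a = gen_data((5, 5), xp=np)  # pass the module object itself
print(type(a))  # <class 'numpy.ndarray'>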

Problem 2: Extract the "xp" Array Module from an Array

Suppose you want to “standardize” an array of DNA data. Here “standardize” means to make the values for each column have mean 0.0 and standard deviation 1.0, and to fill in missing values with 0.0. Additionally, you want this to work

  • for NumPy arrays even when you haven’t installed or can’t install the CuPy package
  • for both NumPy arrays and CuPy arrays when you can

The utility function get_array_module (defined in GitHub) solves the problem. Here is a standardizer that uses get_array_module:

def unit_standardize(a):
    """
    Standardize array to zero-mean and unit standard deviation.
    """
    xp = get_array_module(a)
    assert a.dtype in [
        np.float64,
        np.float32,
    ], "a must be a float in order to standardize in place."
    imissX = xp.isnan(a)
    snp_std = xp.nanstd(a, axis=0)
    snp_mean = xp.nanmean(a, axis=0)
    # avoid div by 0 when standardizing
    snp_std[snp_std == 0.0] = xp.inf
    a -= snp_mean
    a /= snp_std
    a[imissX] = 0

Notice how we use get_array_module to set xp to the array module (either numpy or cupy). Then we use xp to call functions such as xp.isnan.
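Again, the real definition is in the GitHub project. A minimal sketch might simply wrap CuPy’s own cupy.get_array_module dispatcher (a hypothetical reconstruction):

import numpy as np

def get_array_module(a):
    """Return the array module (numpy or cupy) that array 'a' belongs to."""
    try:
        import cupy as cp
        return cp.get_array_module(a)  # CuPy's dispatcher handles both kinds
    except ImportError:
        # Without CuPy installed, 'a' can only be a NumPy array.
        return np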

Let’s standardize a NumPy array:

Input:

a = gen_data((1_000, 100_000))
unit_standardize(a)
print(type(a))
print(a[:3, :3])  # 1st 3 rows and cols

Output:

<class 'numpy.ndarray'>
[[-0.0596511   0.         -1.27903946]
 [-1.32595873  1.25433129  1.21118591]
 [ 0.          0.05417923 -1.27903946]]

On my computer, this runs fine, returning an answer in about 5 seconds.

Next, let’s standardize a CuPy array:

Input:

a = gen_data((1_000, 100_000), xp='cupy')
unit_standardize(a)
print(type(a))
print(a[:3, :3])  # 1st 3 rows and cols

Output:

<class 'cupy.core.core.ndarray'>
[[-1.22196758  0.         -1.23910541]
 [ 1.24508589  1.15983351  1.25242913]
 [-1.22196758  0.          0.00666186]]

On my computer, unit_standardize runs faster on the CuPy array, returning an answer in about 1 second.

Aside: So is the GPU faster? Not necessarily. The CPU run above used only one of my six CPUs. When I use other techniques — for example, Python multiprocessing or C++ code with multithreading — to run on all six CPUs, the runtimes become comparable.
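If you want to reproduce the comparison yourself, a rough timing harness might look like this (assuming the gen_data and unit_standardize functions above are in scope; note that CuPy launches GPU kernels asynchronously, so we synchronize before reading the clock):

import time

for xp_name in ["numpy", "cupy"]:
    xp = array_module(xp_name)
    a = gen_data((1_000, 100_000), xp=xp)
    start = time.perf_counter()
    unit_standardize(a)
    if xp.__name__ == "cupy":
        xp.cuda.Device().synchronize()  # wait for queued GPU work to finish
    print(f"{xp.__name__}: {time.perf_counter() - start:.2f} seconds")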

Problem 3: Convert between NumPy and "xp"

Suppose your data starts as a NumPy array and you need to convert it to whatever your desired xp array module is. Later, suppose your data is an xp array and you need to convert it to a NumPy array. (These situations happen when using packages, such as Pandas, that know about NumPy but not CuPy.)

The array-module function xp.asarray and the utility function asnumpy (defined in GitHub) solve the problem. Here is an example:
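xp.asarray comes with the array module itself; asnumpy’s real definition is in the GitHub project, but a minimal sketch might look like this (a hypothetical reconstruction):

import numpy as np

def asnumpy(a):
    """Convert 'a' to a NumPy array, copying from the GPU if necessary."""
    if hasattr(a, "get"):
        return a.get()  # cupy.ndarray.get() copies device memory to host
    return np.asarray(a)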

Input:

a = gen_data((1_000, 100_000))
print(type(a))  # numpy
xp = array_module(xp='cupy')
a = xp.asarray(a)
print(type(a))  # cupy
unit_standardize(a)
print(type(a))  # still cupy
a = asnumpy(a)
print(type(a))  # numpy
print(a[:3, :3])  # print 1st 3 rows and cols

Output 1:

<class 'numpy.ndarray'>
<class 'cupy.core.core.ndarray'>
<class 'cupy.core.core.ndarray'>
<class 'numpy.ndarray'>
[[-0.0596511   0.         -1.27903946]
 [-1.32595873  1.25433129  1.21118591]
 [ 0.          0.05417923 -1.27903946]]

This example generates a random NumPy array, converts it to CuPy (if possible), standardizes it, and then converts it back (if necessary) to NumPy. On my computer, it runs in about 2 seconds.

If CuPy is not installed, the code still runs fine (in about 5 seconds), producing this output:

Output 2:

WARNING:root:Using numpy. (No module named 'cupy')
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
[[-0.0596511   0.         -1.27903946]
 [-1.32595873  1.25433129  1.21118591]
 [ 0.          0.05417923 -1.27903946]]

Aside: Notice that we now see the same results whether CuPy is installed or not. Why? Because this example always uses NumPy to generate the random data.

Conclusion

We’ve seen how three utility functions enable you to write code that works with or without a GPU.

  • array_module — if possible, sets xp according to a user’s input (including an environment variable). Defaults and falls back to NumPy when necessary.
  • get_array_module — sets xp according to an array. Works even if the cupy package isn’t or can’t be installed.
  • xp.asarray and asnumpy — convert from and to NumPy.

In addition to enabling code that works with or without a GPU, writing your code GPU-optional offers an extra benefit. GPUs will sometimes fail to speed up your work. By making your code GPU-optional, you gain the ability to turn off GPU processing when it doesn’t help.

You can find the definitions of these utility functions in this small GitHub project.
