Faster YOLOv4 Performance with CUDA-enabled OpenCV

Build OpenCV with CUDA 11.2 and cuDNN 8.1.0 for faster YOLOv4 DNN inference.

Published in Towards Data Science, Feb 25, 2021

Photo by Akash Rai on Unsplash | Detections by author

YOLO, short for You Only Look Once, is undoubtedly one of the best object detectors trained on the COCO dataset. YOLOv4, the latest iteration, has a great accuracy-performance trade-off and has established itself as one of the state-of-the-art object detectors. In a typical intelligent video analytics pipeline, model inference is accelerated with a library such as TensorFlow or PyTorch, which can run on an NVIDIA GPU, while OpenCV handles image/video-stream input, pre-processing, and visualization of the post-processed results. What if I told you that OpenCV can now run YOLOv4 natively in its DNN module, accelerated by NVIDIA CUDA? In this post, I will walk you through building OpenCV with CUDA and cuDNN to accelerate YOLOv4 inference with the DNN module.

Introduction

Most enthusiasts I know have GPU-enabled devices, and my goal has always been to make GPU acceleration mainstream. Well, who doesn’t like to go faster? I used OpenCV 4.5.1, CUDA 11.2, and cuDNN 8.1.0 to get the ball rolling and make inference easier! First, set up CUDA, then install cuDNN, and finally build OpenCV. The post is divided into sections to make it easier to follow!

CUDA 11.2 and cuDNN 8.1.0 installation

This is the section with the highest chance of rendering your machine unbootable. Just kidding! Do everything right and this should be a breeze.

Installing CUDA 11.2

Begin by downloading the deb file for your platform from the CUDA repository.

Image by author | CUDA platform selection

Once you have selected your platform, you will be given the installation commands. If your platform matches mine (Ubuntu 20.04, x86_64), you can install CUDA as follows:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.2.1/local_installers/cuda-repo-ubuntu2004-11-2-local_11.2.1-460.32.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-2-local_11.2.1-460.32.03-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-2-local/7fa2af80.pub
sudo apt update
sudo apt -y install cuda
sudo reboot

If done right, running nvidia-smi after the reboot should list your GPU along with the driver version and the supported CUDA version.

Finally, add the following to your .bashrc or .zshrc:

# CUDA
export CUDA=11.2
export PATH=/usr/local/cuda-$CUDA/bin${PATH:+:${PATH}}
export CUDA_PATH=/usr/local/cuda-$CUDA
export CUDA_HOME=/usr/local/cuda-$CUDA
export LIBRARY_PATH=$CUDA_HOME/lib64${LIBRARY_PATH:+:${LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-$CUDA/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-$CUDA/extras/CUPTI/lib64:$LD_LIBRARY_PATH
export NVCC=/usr/local/cuda-$CUDA/bin/nvcc
export CFLAGS="-I$CUDA_HOME/include $CFLAGS"

Don’t forget to follow up with source ~/.bashrc or source ~/.zshrc
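A quick way to confirm that the exports took effect is to check that nvcc is on your PATH and reports release 11.2. The Python sketch below does exactly that; the helper names are hypothetical, and it assumes `nvcc --version` prints a line containing `release X.Y`, as CUDA toolkits do.

```python
import re
import shutil
import subprocess
from typing import Tuple


def parse_nvcc_release(output: str) -> Tuple[int, int]:
    """Extract (major, minor) from the 'release X.Y' text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no 'release X.Y' string found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def installed_cuda_version() -> Tuple[int, int]:
    """Locate nvcc on PATH (set up by the exports above) and return its CUDA release."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        raise RuntimeError("nvcc is not on PATH; did you source your shell config?")
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=True)
    return parse_nvcc_release(result.stdout)
```

After sourcing your shell config, `installed_cuda_version()` should return `(11, 2)`. If it raises the RuntimeError instead, the PATH export has not been picked up by your current shell.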

Installing cuDNN 8.1.0

For this, you need an NVIDIA account, so make sure you sign in. Once you do, head to the cuDNN download page and download the cuDNN 8.1.0 runtime and developer library deb files for CUDA 11.2.

Once the deb files are downloaded, install them:

sudo dpkg -i libcudnn8_8.1.0.77-1+cuda11.2_amd64.deb
sudo dpkg -i libcudnn8-dev_8.1.0.77-1+cuda11.2_amd64.deb
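To double-check which cuDNN version landed on disk, you can read the version macros from its header. This is a sketch that assumes the header is reachable as /usr/include/cudnn_version.h (cuDNN 8 keeps the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines in that file); adjust the path if your system places it elsewhere. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Tuple

# Assumption: the libcudnn8-dev package makes the version header available here.
# On some setups the real file lives under /usr/include/x86_64-linux-gnu/ instead.
DEFAULT_HEADER = Path("/usr/include/cudnn_version.h")


def parse_cudnn_version(header_text: str) -> Tuple[int, int, int]:
    """Read CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL from the header text."""
    values = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Matches lines such as '#define CUDNN_MAJOR 8'
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if match is None:
            raise ValueError(f"{name} not found in header")
        values[name] = int(match.group(1))
    return values["CUDNN_MAJOR"], values["CUDNN_MINOR"], values["CUDNN_PATCHLEVEL"]


def installed_cudnn_version(header: Path = DEFAULT_HEADER) -> Tuple[int, int, int]:
    """Parse the cuDNN version from the header file on disk."""
    return parse_cudnn_version(header.read_text())
```

For the packages above, `installed_cudnn_version()` should return `(8, 1, 0)`.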

This marks the completion of NVIDIA CUDA and cuDNN installation!

Build OpenCV 4.5.1 from source

Here’s the fun bit that gets me excited! This section will help you build OpenCV from source with CUDA, GStreamer, and FFmpeg. There’s a long list of commands to execute, so let’s get started.

First, install the Python development packages:

sudo apt install python3-dev python3-pip python3-testresources

Next, install the dependencies needed to build OpenCV:

sudo apt install build-essential cmake pkg-config unzip yasm git checkinstall
sudo apt install libjpeg-dev libpng-dev libtiff-dev
sudo apt install libavcodec-dev libavformat-dev libswscale-dev libavresample-dev
sudo apt install libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev
sudo apt install libxvidcore-dev x264 libx264-dev libfaac-dev libmp3lame-dev libtheora-dev
sudo apt install libvorbis-dev
sudo apt install libopencore-amrnb-dev libopencore-amrwb-dev
sudo apt-get install libgtk-3-dev
sudo apt-get install libtbb-dev
sudo apt-get install libatlas-base-dev gfortran
sudo apt-get install libprotobuf-dev protobuf-compiler
sudo apt-get install libgoogle-glog-dev libgflags-dev
sudo apt-get install libgphoto2-dev libeigen3-dev libhdf5-dev doxygen

NumPy is a crucial Python package for this build. Install it using pip:

pip3 install numpy

Now, you should have everything ready for the build. Run the following commands to download and extract the source —

mkdir opencvbuild && cd opencvbuild
wget -O opencv.zip https://github.com/opencv/opencv/archive/4.5.1.zip
wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.5.1.zip
unzip opencv.zip
unzip opencv_contrib.zip
mv opencv-4.5.1 opencv
mv opencv_contrib-4.5.1 opencv_contrib

Let’s prepare the recipe!

cd opencv
mkdir build && cd build

Make sure to change CUDA_ARCH_BIN to match your GPU’s compute capability.

cmake \
-D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_C_COMPILER=/usr/bin/gcc-7 \
-D CMAKE_INSTALL_PREFIX=/usr/local -D INSTALL_PYTHON_EXAMPLES=ON \
-D INSTALL_C_EXAMPLES=ON -D WITH_TBB=ON -D WITH_CUDA=ON -D WITH_CUDNN=ON \
-D OPENCV_DNN_CUDA=ON -D CUDA_ARCH_BIN=7.5 -D BUILD_opencv_cudacodec=OFF \
-D ENABLE_FAST_MATH=1 -D CUDA_FAST_MATH=1 -D WITH_CUBLAS=1 \
-D WITH_V4L=ON -D WITH_QT=OFF -D WITH_OPENGL=ON -D WITH_GSTREAMER=ON \
-D WITH_FFMPEG=ON -D OPENCV_GENERATE_PKGCONFIG=ON \
-D OPENCV_PC_FILE_NAME=opencv4.pc -D OPENCV_ENABLE_NONFREE=ON \
-D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
-D PYTHON_DEFAULT_EXECUTABLE=$(which python3) -D BUILD_EXAMPLES=ON ..
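If you are unsure which value to pass as CUDA_ARCH_BIN in the command above, it is your card’s compute capability: 6.1 for the GTX 1050 Ti and 7.5 for the Tesla T4 used in this post. The sketch below asks nvidia-smi for the GPU name and looks it up in a small, hypothetical table. The table is deliberately incomplete, so add your own card from NVIDIA’s CUDA GPUs list if it is missing.

```python
import subprocess
from typing import Dict, List, Optional

# Hypothetical, non-exhaustive lookup of NVIDIA compute capabilities.
COMPUTE_CAPABILITY: Dict[str, str] = {
    "GeForce GTX 1050 Ti": "6.1",
    "Tesla V100": "7.0",
    "Tesla T4": "7.5",
    "GeForce RTX 2080 Ti": "7.5",
    "A100": "8.0",
    "GeForce RTX 3090": "8.6",
}


def cuda_arch_bin(gpu_name: str) -> Optional[str]:
    """Return the CUDA_ARCH_BIN value for a GPU name, or None if it is not in the table.

    Substring matching tolerates prefixes such as 'NVIDIA ' that newer drivers add.
    """
    for model, capability in COMPUTE_CAPABILITY.items():
        if model in gpu_name:
            return capability
    return None


def detected_gpu_names() -> List[str]:
    """Ask nvidia-smi for the names of all visible GPUs, one per line."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```

For example, `[cuda_arch_bin(name) for name in detected_gpu_names()]` on the T4 machine would give `["7.5"]`, which is the value used in the cmake command.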

When CMake finishes, it prints a summary of the build configuration.

Make sure CUDA and cuDNN are detected and the paths are correct. If everything looks good, start the build:

make -j$(nproc)
sudo make install
sudo ldconfig

To check if you built OpenCV successfully, run this command —

pkg-config --libs --cflags opencv4

On a successful installation, it should print OpenCV’s compiler and linker flags, like so:

Image by author | OpenCV successful build
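Besides pkg-config, you can confirm from Python that the cv2 module you import was actually built with CUDA and cuDNN. The hypothetical helper below scans the text returned by `cv2.getBuildInformation()`. It assumes the summary contains lines such as `NVIDIA CUDA: YES (...)` and `cuDNN: YES (...)`, which is how OpenCV 4.5 lays them out, but treat the exact layout as an assumption.

```python
import re
from typing import Dict


def cuda_build_flags(build_info: str) -> Dict[str, bool]:
    """Report whether OpenCV's build summary marks NVIDIA CUDA and cuDNN as YES.

    Assumes summary lines shaped like '  NVIDIA CUDA:   YES (ver 11.2, ...)'
    and '  cuDNN:   YES (ver 8.1.0)'.
    """
    result = {}
    for label in ("NVIDIA CUDA", "cuDNN"):
        match = re.search(rf"^\s*{label}:\s+(\w+)", build_info, flags=re.MULTILINE)
        # A missing line counts as "not built with this component".
        result[label] = match is not None and match.group(1) == "YES"
    return result
```

After `import cv2`, call `cuda_build_flags(cv2.getBuildInformation())`: both entries should be True, and `cv2.cuda.getCudaEnabledDeviceCount()` should return at least 1 on a machine with a working GPU.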

It’s great to see you make it this far! Now you should be all set to run the sample application.

Run the application

Next, clone the sample repository and pull the model weights. Start by installing git-lfs:

sudo apt install git git-lfs

Clone the repository with the model files

# Using HTTPS
git clone https://github.com/aj-ames/YOLOv4-OpenCV-CUDA-DNN.git
# Using SSH
git clone git@github.com:aj-ames/YOLOv4-OpenCV-CUDA-DNN.git

cd YOLOv4-OpenCV-CUDA-DNN/
git lfs install
git lfs pull

You can run the application on an image, a video file, a webcam, or an RTSP stream:

# Image
python3 dnn_inference.py --image images/example.jpg --use_gpu

# Video
python3 dnn_inference.py --stream video.mp4 --use_gpu

# RTSP
python3 dnn_inference.py --stream rtsp://192.168.1.1:554/stream --use_gpu

# Webcam
python3 dnn_inference.py --stream webcam --use_gpu

P.S. Remove the --use_gpu flag to run on the CPU instead. Counter-productive, isn’t it?
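The repository’s dnn_inference.py is the reference implementation. As a rough sketch of how such a script can wire up these flags (not the repository’s actual code, and with hypothetical function names), --use_gpu only needs to switch the DNN backend and target, while --stream values go to `cv2.VideoCapture`, with `webcam` mapped to device 0.

```python
import argparse
from typing import Union


def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring the flags used above: --image or --stream, plus --use_gpu."""
    parser = argparse.ArgumentParser(description="YOLOv4 inference with OpenCV DNN")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--image", help="path to an input image")
    source.add_argument("--stream", help="video file, RTSP URL, or 'webcam'")
    parser.add_argument("--use_gpu", action="store_true", help="run on the CUDA backend")
    return parser


def resolve_stream(stream: str) -> Union[int, str]:
    """Map 'webcam' to device index 0; pass file paths and RTSP URLs through unchanged."""
    return 0 if stream == "webcam" else stream


def configure_backend(net, use_gpu: bool) -> None:
    """Point an OpenCV DNN network at CUDA (GPU) or the default OpenCV CPU backend."""
    import cv2  # imported here so the CLI helpers above work without OpenCV installed

    if use_gpu:
        net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    else:
        net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
        net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
```

With OpenCV installed, you would load the network with `net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")` (file names assumed), call `configure_backend(net, args.use_gpu)`, and open the input with `cv2.VideoCapture(resolve_stream(args.stream))`. Keeping the backend choice in one function makes the CPU and GPU runs identical apart from those two calls, which is what makes the benchmark comparison fair.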

Some benchmarks for the geeks!

We wouldn’t be doing this if the gain weren’t substantial. Trust me, it is! Running on the GPU increased FPS by roughly 10x to 12x!
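FPS here means frames processed divided by elapsed wall-clock time. The post does not say exactly how the numbers were measured, so the following is only a hypothetical helper showing that arithmetic; call `tick()` once per processed frame and read `fps()` at the end.

```python
import time
from typing import Callable


class FpsCounter:
    """Average frames per second since the counter was created (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._start = clock()
        self._frames = 0

    def tick(self) -> None:
        """Record one processed frame."""
        self._frames += 1

    def fps(self) -> float:
        """Frames recorded so far divided by seconds elapsed since creation."""
        elapsed = self._clock() - self._start
        return self._frames / elapsed if elapsed > 0 else 0.0
```

The clock is injectable so the counter can be tested with a fake timer; in real use the default `time.perf_counter` is the right choice because it is monotonic and high-resolution.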

I tested two configurations:

Intel Core i5 7300HQ + NVIDIA GeForce GTX 1050 Ti
Intel Xeon E5-1650 v4 + NVIDIA Tesla T4

I’ll let the numbers do the talking!

|       CPU        | CPU FPS |     GPU      | GPU FPS |
| :--------------: | :-----: | :----------: | :-----: |
|  Core i5 7300HQ  |   2.1   | GTX 1050 Ti  |  20.1   |
| Xeon E5-1650 v4  |   3.5   |   Tesla T4   |  42.3   |
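Dividing GPU FPS by CPU FPS gives the speedup for each configuration. This short script just reproduces that arithmetic from the table.

```python
# FPS figures copied from the benchmark table above: (CPU FPS, GPU FPS).
BENCHMARKS = {
    "Core i5 7300HQ -> GTX 1050 Ti": (2.1, 20.1),
    "Xeon E5-1650 v4 -> Tesla T4": (3.5, 42.3),
}


def speedup(cpu_fps: float, gpu_fps: float) -> float:
    """How many times faster the GPU run is than the CPU run."""
    return gpu_fps / cpu_fps


for setup, (cpu_fps, gpu_fps) in BENCHMARKS.items():
    print(f"{setup}: {speedup(cpu_fps, gpu_fps):.1f}x")
```

It prints 9.6x for the laptop (GTX 1050 Ti) and 12.1x for the server (Tesla T4).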

The takeaway

GPU acceleration is percolating into more and more libraries and applications, enabling users to run heavier workloads faster than ever! Computer vision was once a technology not accessible to all, but with improvements in neural networks and increases in hardware compute capability, the gap has narrowed significantly. With AI booming, we are in for a lot of hardware flex! 💪