
Upgrading and rebuilding the host

Things went south fast after my last all-night hack

DOING DATA SCIENCE FROM SCRATCH TASK BY TASK

Photo by Denny Müller on Unsplash

If you drive and have ever had a front tire blow out, I can comfortably say I know how you feel looking at Denny Müller's picture above. During my driving career, I have had two dangerous blowouts. The first came while entering a severe bend; I lost control, and it was utterly terrifying. More recently, while navigating another severe turn, I hit a rock, which shredded my left front tire. An utterly gut-wrenching experience. Perhaps you will not agree when I say that finding 'no Cuda capable device found' in my log files was a similarly painful and frustrating experience, and one that wasn't as easy to fix as a blowout.

If you have been reading my articles, you will know from my last contribution that I pulled an all-nighter – configuring GPU libraries on Ubuntu 20.04 LTS. It was difficult to contain my sheer excitement at having TensorFlow GPU and OpenCV 4.0 configured on my data science machine. I wasn't all that confident, though; like those flat tires, things can and usually do go wrong. Frankly, I expected the whole thing to break, and it did.

The dependency gods have won another battle. Let us discuss what happened and how I fixed it. This time I feel more comfortable, but only time will tell.

Photo by Hasan Almasi on Unsplash

What happened

What I know, and what appear to be the facts, are as follows. On December 31, I installed TensorFlow 2.0 GPU and compiled OpenCV with the Cuda libraries. I also installed VirtualBox and did some work with some Ubuntu and Windows VMs. For the first week in January 2021, everything was fine. Since everything was working nicely, I created a FastAPI endpoint and wrapped my Yolo class, providing me with an endpoint that I could use. I was delighted with the set-up. Using Curl or Postman, I could send a photo to the endpoint and get back the objects detected as JSON. The response time from the API was pretty good. My next step was to integrate the webhook calls from the Motion Eye devices, moving me towards real-time triggers and alerts. It seemed crazy that I was making so much progress.
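For flavour, here is a hedged sketch of what such an endpoint looks like; the route, helper name, and response shape are illustrative assumptions rather than my exact production code:

import cv2
import numpy as np
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

def run_yolo(frame):
    # Stand-in for my Yolo class's inference call; it would return
    # a list of {label, confidence, box} dictionaries
    return []

@app.post("/detect")
async def detect(image: UploadFile = File(...)):
    # Decode the uploaded bytes into an OpenCV image
    data = np.frombuffer(await image.read(), dtype=np.uint8)
    frame = cv2.imdecode(data, cv2.IMREAD_COLOR)
    # FastAPI serialises the dictionary to JSON automatically
    return {"objects": run_yolo(frame)}

Run under Uvicorn, a Curl or Postman POST of a photo returns the detected objects as JSON.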

I had some professional work that required 120% of my attention, but I remembered some pending updates on Ubuntu and chose to apply them to my build. That seems to have been the wrong move! After the update, I turned to my bird detection camera and wondered if Yolo would correctly detect and classify the local birds. So I sent the first picture to my FastAPI endpoint. In Postman, I got an error message back. Looking at the logs, I found the 'no Cuda capable GPU found' error. Whether you agree with my analogy or not, I felt stabbed in the heart. It seemed like yet another blowout on my car journey through life. In a way, I was saddened at the loss of the configuration and capability. Perhaps that is overly dramatic for a development machine, but this error on a production host would be troublesome.

It wasn't only my deep learning toolchain that was disrupted, though! VirtualBox started giving me errors as well, so I couldn't use my VMs either, and that was also painful. I had done some OSINT (open-source intelligence) configurations and was enjoying that journey as well.

There were some lessons I had to embrace at this point; let me share them so you can reflect on your own practice.

What did I learn?

For me, the biggest thing was accepting that how I went about my all-nighter might have contributed to my problems.

The first mistake is that I had added Ubuntu 18.04 repositories to my Ubuntu 20.04 system. Worse, I may even have added repositories as far back as 16.04. I can say now that this was a horrible idea. Before you issue sudo add-apt-repository, it is wise to consider the impact on the system.

sudo add-apt-repository universe
sudo add-apt-repository multiverse
sudo apt update
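Before adding anything, it is worth a minute to see what is already configured; these are standard commands, not specific to my build:

# List every configured APT source, including files under sources.list.d
grep -rh ^deb /etc/apt/sources.list /etc/apt/sources.list.d/

# Confirm which Ubuntu release the machine is actually running
lsb_release -a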

Next, there were bugs and discrepancies in some binaries. VirtualBox 6 has problems with copy & paste from host to guest and guest to host. To get around that, I installed VirtualBox 5, which left me with a mix of libraries in the APT repositories on the system.

The biggest lesson for me is that age-old adage – "if it ain't broke, don't fix it!" – and it applies to accepting potentially unnecessary updates or upgrades without doing any research. That was the real killer.

The final lesson is that we cannot blindly copy & paste commands from tutorials, hopping from tutorial to tutorial until something works! Searching Google and executing commands in the hope of a fix will only lead to an unstable system and, eventually, 'no Cuda capable device found'.

The root cause of some of my difficulties appeared to be unsigned binaries: Ubuntu, with Secure Boot, requires kernel modules such as the NVIDIA and VirtualBox drivers to be signed. So I am guessing a security update was installed that triggered some of the trouble with the older repositories.
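If that guess is right, it is easy to verify: mokutil, which ships with Ubuntu, reports whether Secure Boot is enforcing signatures, and the kernel log will complain about any unsigned modules it refused to load:

# Shows "SecureBoot enabled" or "SecureBoot disabled"
mokutil --sb-state

# Kernel complaints about unsigned modules, if any
dmesg | grep -i 'module verification'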

Ok, I can hear you say that these things are complicated and many dependencies must be loaded. It certainly isn't easy, and things can go wrong. How did I fix it? Well, I got there by being methodical and careful! That is the bottom line. Let me explain!

How I fixed it

Fixing my issues involved a few different things. First, I had to accept the lessons and reconcile myself to doing things differently. Then there was a period of reflection and a review of the host hardware. Finally, I set a direction and executed a plan to get back on track. Let me explain the steps in a short summary.

Reflecting on the host and defining the hardware required

My Linux data science workstation was initially built as a gaming rig for my daughter. It was configured specifically to play The Sims 4 at maximum performance. Thinking back, I figured that was about six years ago. When I bought the components, I was on a budget, so I added a GPU that was already superseded at the time. I came to the conclusion that the hardware needed to be checked and updated.

Now, you might think this upgrade could be avoided. After all, I had the OpenCV DNN module compiled with CUDA on the GTX 750 Ti, and I was happy with the response time from my endpoint before the error. The reality is I also had compilation errors. Consider the cmake command for building OpenCV with Cuda.

cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D INSTALL_PYTHON_EXAMPLES=ON \
    -D INSTALL_C_EXAMPLES=OFF \
    -D OPENCV_ENABLE_NONFREE=ON \
    -D WITH_CUDA=ON \
    -D WITH_CUDNN=ON \
    -D OPENCV_DNN_CUDA=ON \
    -D ENABLE_FAST_MATH=1 \
    -D CUDA_FAST_MATH=1 \
    -D CUDA_ARCH_BIN=7.0 \
    -D WITH_CUBLAS=1 \
    -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
    -D HAVE_opencv_python3=ON \
    -D PYTHON_EXECUTABLE=~/.virtualenvs/opencv_cuda/bin/python \
    -D BUILD_EXAMPLES=ON ..

CUDA_ARCH_BIN=7.0, in particular, is the troublesome line. Again, you cannot just copy & paste and execute this cmake command. On my system, the correct value was CUDA_ARCH_BIN=5.0.

From Nvidia's guide to GPUs, CUDA libraries and compute capabilities
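Rather than trusting a tutorial's value, you can read the compute capability straight off the card. The deviceQuery sample ships with the CUDA toolkit; the path below assumes a default CUDA 10 install under /usr/local/cuda:

# Build and run NVIDIA's deviceQuery sample
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery | grep 'CUDA Capability'

On the GTX 750 Ti, that reports 5.0.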

Still, compiling Cuda 10 and OpenCV 4 with CUDA_ARCH_BIN=5.0 just produced Cuda compilation errors. So I tried Cuda 9 and OpenCV 3 with CUDA_ARCH_BIN=5.0. I then got an error message telling me that 5.0 wasn't an allowed value and that I needed CUDA_ARCH_BIN > 5.3. So you know where this is all headed. It is difficult to explain, or even recall, how many times I wiped the system, installed a fresh copy of Ubuntu, called CMake, and installed TensorFlow GPU. TensorFlow is very well documented and wasn't really the problem; instead, I struggled with compiling OpenCV to get my CV2 Python bindings with Cuda. In the end, I decided that it really was time for a new graphics card.

My GPU at the time was the GTX 750 Ti which, according to NVIDIA, was "first-generation NVIDIA® Maxwell™ architecture". You can find the full specifications on the NVIDIA product site if you wish. The latest GPU architecture is Ampere – NVIDIA's 2nd gen RTX architecture – and there is a good description over on NVIDIA's product page. Jumping from first-generation Maxwell to second-generation RTX involves a pile of cash, which is also painful: painful in that many beer tokens must be exchanged for another GPU, and those cards are in limited supply, nearly impossible to get in Ireland during the COVID-19 global pandemic.

Cloud versus on-premises

Another argument that we all grapple with is the GPU instances available over on IBM, Google, Oracle, AWS, and Microsoft. Should we rent a GPU instance rather than buy, install, and configure a local workstation? There will never be a straightforward answer to this one. It comes down to the project, the budget available, and the security and privacy required.

I wouldn't feel comfortable connecting my front-door system to the public cloud. Consider this from the Guardian: "Dozens of people who say they were subjected to death threats, racial slurs, and blackmail after their in-home Ring smart cameras were hacked are suing the company over 'horrific' invasions of privacy."

It is crucial to consider the number of systems required to run a service. Those are:

  • The production instance or instances;
  • Pre-production, staging, or User Test environments;
  • Development environments to integrate code for testing;
  • Development machines for individual engineers

When we look at the situation in terms of the number of environments, it becomes easier to see the value of cloud-based services. But when we talk about a personal workstation for doing experiments, designing services, and writing books or papers, the cloud doesn't make that much sense. I do have GPU-based VMs over on AWS, but occasionally I get huge bills because I forget to spin down development machines.
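If you do keep cloud development machines, making the spin-down a habit helps; the AWS CLI reduces it to a couple of commands (the instance ID below is a placeholder):

# List any instances still running
aws ec2 describe-instances --filters Name=instance-state-name,Values=running

# Stop the forgotten development machine
aws ec2 stop-instances --instance-ids i-0123456789abcdef0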

All things considered, I decided I would purchase a new card and install it in my existing system. I know the cloud arguments well, but I also know that things can quickly get out of hand.

Putting a plan in place

Having made the decision to purchase a new card, I set about discovering what I could actually get. The COVID-19 pandemic is worse than ever here in Ireland, and to add to the problems, we are impacted by Brexit. After a bit of hunting around, I managed to get a Quadro P2000 delivered. The product description includes the salient points for me: a "Pascal GPU with 1024 CUDA cores, large 5 GB GDDR5 on-board memory", from the Nvidia product page.

Moving from the first-generation Maxwell architecture to Pascal allows me to jump from compute capability 5.0 to 6.1. To me, that seemed reasonable for the money involved. With the new device available, all that remained was to install the card and make it work! Execute the plan, dude!

Executing the plan

Since I built the workstation myself, I had little difficulty removing the cover and gaining access to the machine's guts. Removing a couple of screws allowed me to wiggle the GPU free from the slot. Please don't forget that the motherboard has a locking mechanism that you must open before pulling on the card; otherwise, you will break things! Installing the new card, replacing the cover, and connecting up the peripherals is easy. I do not plug my monitors into the GPU card directly, as I also have a GPU on board my CPU chip. When training deep neural networks, it is better to remove the screen workloads, or even just boot into the terminal. Driving the UI is a significant workload for the GPU and slows things down. You cannot watch Netflix while your deep neural network is training on the same machine!
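If you do want to banish the desktop for a long training run, booting Ubuntu straight into a text console is a standard systemd switch rather than anything exotic:

# Boot to a plain terminal by default, freeing the GPU from desktop work
sudo systemctl set-default multi-user.target
sudo reboot

# Restore the graphical desktop when training is done
sudo systemctl set-default graphical.target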

I decided that it would be best to wipe the drives and do a fresh install of Ubuntu 18.04 to re-do the configuration. Reading the NVIDIA CUDA documentation showed relatively minimal reference to Ubuntu 20.04, so I figured I'd stay with 18.04.

To configure TensorFlow 2.0, I used the following tutorial:

How to install TensorFlow 2.0 on Ubuntu – PyImageSearch

To configure the OpenCV DNN module with Cuda, I used this tutorial:

How to use OpenCV’s "dnn" module with NVIDIA GPUs, CUDA, and cuDNN – PyImageSearch

For VirtualBox, I am using version 6.0.24, as shown below, but I am already being notified of a newer version. Resistance is not futile! Ignore it!

Image by the author showing the About VirtualBox dialogue
Image by the author showing the upgrade notice for VirtualBox

Bi-directional copy & paste does not work, but I have decided to live without it. Installation of VirtualBox 5.0 failed on my new Ubuntu system.

This time I did not add any new repositories. No sudo add-apt-repository! Yet I have already had two 'system issues' reported to the UI by Ubuntu 18.04. Again, I am nervous that this new build will also run into problems shortly.
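If you see the same pop-ups, the underlying reports are easy to find: Ubuntu's crash reporter writes to /var/crash, and journalctl lists errors from the current boot; both are standard tools rather than anything from my specific build:

# Crash reports behind Ubuntu's "system issue" dialogues
ls /var/crash

# Errors logged since the current boot
journalctl -p err -b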

Bullseye – it all works again. OMG! What a moment when my Yolo class finally worked again. 'module was not compiled with CUDA!' is no more! Below you can see a piece of the code that deals with the devices to be used.

coprocessor = {
    # NVIDIA GPU via the OpenCV dnn CUDA backend
    'cuda': {
        'backend': cv2.dnn.DNN_BACKEND_CUDA,
        'target': cv2.dnn.DNN_TARGET_CUDA
    },
    # Plain CPU fallback
    'cpu': {
        'backend': cv2.dnn.DNN_BACKEND_DEFAULT,
        'target': cv2.dnn.DNN_TARGET_CPU
    },
    # Intel Myriad VPU via the Inference Engine (OpenVINO)
    'myriad': {
        'backend': cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE,
        'target': cv2.dnn.DNN_TARGET_MYRIAD
    }
}
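Putting that dictionary to work takes just two calls on any OpenCV dnn network. A minimal sketch; the Yolo config and weight file names are placeholders, not my actual paths:

device = 'cuda'

# readNetFromDarknet takes the .cfg file first, then the .weights
net = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')
net.setPreferableBackend(coprocessor[device]['backend'])
net.setPreferableTarget(coprocessor[device]['target'])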
Photo by Anne Nygård on Unsplash

With the system working, though OS-level issues clearly remain, I can think about moving forward again.

Moving forward

There remains a lot of work to get back to where I was. Nginx, Gunicorn, and FastAPI must be installed, and I need to re-do my OSINT virtual machines. But it is a good day! I have an upgraded GPU and improved performance. Naturally, installing Visual Studio Code was the absolute first thing I did! To illustrate the new performance, I ran some tests – shown right after the serving sketch below.
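Getting the API layer back is at least routine: a FastAPI app is typically served by Gunicorn managing Uvicorn workers behind Nginx. A representative command, with the module and app names (main:app) assumed rather than taken from my build:

# Serve the FastAPI app with two Uvicorn workers behind Nginx
gunicorn -w 2 -k uvicorn.workers.UvicornWorker main:app --bind 127.0.0.1:8000

Now, back to those tests.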

# myYolo wraps model loading and a timed forward pass over my test images
from myYoloC import myYolo

# Run the same job on the GPU...
my = myYolo(device='cuda')
a = my.run()

# ...and again on the CPU for comparison
myY = myYolo(device='cpu')
a = myY.run()

Previously, I created a class called myYolo, and in the above code, I simply create instances of it with two different devices. The instance 'my' uses the Cuda backend, while 'myY' is assigned to the CPU. Five lines of code could literally break my spirit, but I had to be brave and execute them. One of the features I really like about Visual Studio Code is the virtual environment support.

Image by the author showing the available Python environments

My opencv_cuda environment contains the Cuda-compiled build of OpenCV 4.0. My dl4cv virtual environment has the TensorFlow GPU configuration. This is in keeping with the tutorials from pyimagesearch.com.

Selecting the correct environment allows me to run the code.

Image by the author from Visual Studio Code after code execution

Loading Yolo from disk and doing a forward pass over 192 photos took 7.57 seconds, an average of .039 seconds per image. That is fantastic performance from the new GPU. On the CPU, the same job took 44.75 seconds, an average of .23 seconds per image. That isn't bad either, but users wouldn't be happy with such a delay. In both runs there are around 7 seconds of construction and tear-down overhead – CPU/GPU signalling and, of course, module loading.

When I ran the same code on the GTX 750 Ti, I got a different answer. That was first-generation Maxwell versus Pascal, after all, so I would have been surprised if the answer were the same. As the screenshot below, from my last article, shows, the GPU average was .13 seconds per image for Maxwell compared to .04 for Pascal – roughly a third of the time. Amazing! The same job took 31 seconds on the old GPU but only 14 seconds with the new card.

Image by the author based on the previous article

CPU-based processing will not change regardless of which GPU we install. I did not install OpenVINO on the Linux box, so I do not know how the Intel Myriad device would do there. On the Raspberry Pi board, we know that Myriad is quick.

Images by the author from the previous article

In the images above, the top right shows CPU-based processing on ARM with a Raspberry Pi 4 8GB – the job took 32 minutes, and everything got hot. On Myriad, top left, the same workload took 3 minutes. I guess those statistics demonstrate why the Pascal architecture was so attractive to me. The better-equipped Linux workstation can do forward-pass neural network inference on 192 images in 14 seconds. These facts all leave me super excited to continue my mission of counting passing cars using Yolo, computer vision, and cheap Raspberry Pi-based motion detectors.

The icing on the cake is perhaps best demonstrated with TensorFlow:

import tensorflow as tf

# Print the installed version and whether TensorFlow can see the GPU
print(tf.__version__)
print(tf.test.is_gpu_available())
Image by the author of the result of is_gpu_available()

is_gpu_available() = True
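For what it is worth, tf.test.is_gpu_available() was later deprecated in TensorFlow 2.x; if you are on a newer release, the equivalent check is:

import tensorflow as tf

# Lists the GPUs TensorFlow can see; an empty list means none were found
print(tf.config.list_physical_devices('GPU'))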

Isn't that amazing? There is a sense of having had my tire fixed. All the wheels on my car are right again, and I am out driving on the open road. Enjoy!

Photo by KAUE FONSECA on Unsplash
