Comparing brute force to the image integral using Google Colab
In my quest to learn more about computer vision and Python, I’ve been reading about the Viola-Jones object detection framework, best summarized by the paper here.
In it, they describe the concept of an image integral: a summed-area table built from the array describing an image. Their point is that brute-force calculating the sum is much slower than computing the image integral ahead of time and simply pulling the number out of the array. This calculation is used constantly in computer vision applications.
I was curious: how much faster? The paper came out in 2001; surely processors 20 years later are so much faster that this no longer makes a difference? Well, I was very wrong.
Follow along: Google Colab Notebook
Image Integral

Using the top left as index (0,0) (x=0, y=0), the integral is accumulated both horizontally and vertically.
In the integral image (right), the value at index (x=1, y=0) is 5, which is the sum of the values at (0,0) and (1,0) in the original image (so, 4+1=5).
Similarly, integral image index (2,2) = 9, which is the sum of the top left 2×2 block: 4+4+1+0 = 9.
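To make that concrete, here’s a tiny sketch with a made-up 3×3 array (not the exact figure above) showing how the running sums are built:

```python
# A minimal sketch with a hypothetical 3x3 array (not the figure above),
# just to show how an integral (summed-area) image is built.
import numpy as np

img = np.array([[4, 1, 2],
                [4, 0, 1],
                [1, 3, 2]])

# Entry (row, col) of the integral image is the sum of every original pixel
# at or above that row and at or to the left of that column.
ii = img.cumsum(axis=0).cumsum(axis=1)

print(ii)
# ii[0, 1] = 4 + 1 = 5 and ii[1, 1] = 4 + 1 + 4 + 0 = 9, following the
# pattern described above.
```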
I’ve given a few more simple examples:

Image Setup
Let’s walk through the setup of the image, uploading and converting it, etc. First, import your libraries:
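Something like this (your exact import list may differ slightly):

```python
import time

import cv2
import numpy as np
from PIL import Image
from google.colab import files   # only needed when running in Colab
```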
I’m going to use this picture of a pug:

Great, now that the image is uploaded, time to do a few things: convert it to black and white and then into a NumPy array:
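A minimal sketch, assuming the pug image is already sitting in the Colab session (e.g. via files.upload()) under the hypothetical filename 'pug.jpg':

```python
# Assumes the image was uploaded to the Colab session, e.g. with files.upload(),
# under the hypothetical filename 'pug.jpg'.
img = Image.open('pug.jpg').convert('L')   # 'L' = 8-bit grayscale ("black and white")
img_array = np.array(img)

print(img_array.shape, img_array.dtype)
```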
We will use the image integral function from OpenCV (cv2). It pads the integral with a leading row and column of zeros, so we strip that padding in the snippet below.
We will also define a brute force function for comparison; all it does is sum over the rows and columns up to the chosen position:
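Roughly like this (a sketch, not the notebook cell verbatim):

```python
# Integral image from OpenCV; [1:, 1:] strips the padded row/column of zeros.
ii = cv2.integral(img_array)
ii = ii[1:, 1:]

def brute_force_integral(arr, y, x):
    """Sum every pixel from (0, 0) through (y, x), inclusive."""
    total = 0
    for row in range(y + 1):
        for col in range(x + 1):
            total += int(arr[row, col])
    return total

# Sanity check: both approaches should agree.
assert brute_force_integral(img_array, 10, 10) == ii[10, 10]
```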
Multiple CPU Iterations
I’m going to do three types of iterations and record the time it takes to perform each one of these calculations. We will run each iteration 500 times to try to minimize variability.
When doing deep learning on images, 500 repeats is actually a very low number; in practice these calculations happen on the order of millions to billions of times.
We will also select position (1000,1000) in the image above for our calculations.
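The shared settings for the timing runs look something like this (the variable names here are my own):

```python
REPEATS = 500      # how many times each approach is repeated
Y, X = 1000, 1000  # pixel position used for every lookup
```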
Iteration 1 – Brute Force the Image Integral
This is the ‘control’ time:
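A sketch of the timing loop, using time.time() (the notebook may use a different timer):

```python
start = time.time()
for _ in range(REPEATS):
    brute_force_integral(img_array, Y, X)
brute_time = time.time() - start
print(f'Brute force: {brute_time:.3f} s for {REPEATS} repeats')
```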
Iteration 2 – Repeat Image Integral Calculation
Here we will recompute cv2.integral on every repeat, instead of calculating it once:
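Roughly like this, rebuilding the whole integral image inside the loop on every pass:

```python
start = time.time()
for _ in range(REPEATS):
    ii_fresh = cv2.integral(img_array)[1:, 1:]   # recomputed every repeat
    value = ii_fresh[Y, X]
recompute_time = time.time() - start
print(f'Image Integral is: {brute_time / recompute_time:.1f} times faster')
```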
The result? 121.8x faster, even though we recalculate the integral on every repeat.
Image Integral is: 121.8 times faster
Iteration 3 – Use the Pre-calculated Image Integral
This is analogous to calculating the integral once for an image and then doing hundreds of calculations on the same image.
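Here the integral image ii computed earlier is reused, so each repeat is just an array lookup (again a sketch):

```python
start = time.time()
for _ in range(REPEATS):
    value = ii[Y, X]                 # just an array lookup
lookup_time = time.time() - start
print(f'Image Integral is: {brute_time / lookup_time:.2f} times faster')
```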
Result = 33073.65x. Thirty-three thousand times faster.
Image Integral is: 33073.65 times faster
GPU Calculation
Of course, all deep learning students are big fans of the GPU, but in this case it does not pan out.
There is GPU overhead, such as CUDA initialization, kernel invocation, and memory allocation, that makes this process much slower. Doing 500 repeats was very slow, so I dropped it to 50 repeats so I wouldn’t be sitting around forever.
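The GPU version uses CuPy; a sketch of what that looks like (the notebook’s exact code may differ):

```python
import cupy as cp

img_gpu = cp.asarray(img_array)       # copy the image to the GPU once

start = time.time()
for _ in range(50):                   # only 50 repeats because of the overhead
    ii_gpu = img_gpu.cumsum(axis=0).cumsum(axis=1)   # integral image on the GPU
    value = int(ii_gpu[Y, X])         # pulling the value back forces a sync
gpu_time = time.time() - start
print(f'GPU (50 repeats): {gpu_time:.3f} s')
```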
Result? The CPU image integral calculated once (Iteration 3 above) is almost 37,000x faster than the GPU.
The GPU calculations were actually slower than all of the CPU iterations.
If you missed it above, here is the Colab Notebook
References
[1] Viola-Jones Object Detection Framework, Wikipedia, Accessed October 2020
[2] OpenCV-Python, Accessed October 2020
[3] CuPy, Accessed October 2020