
We live in the era of video calls. Conducted over the internet and using whatever camera comes with your laptop or desktop, we broadcast our lives to our classmates, coworkers, and families.
Sometimes, though, we don’t want to broadcast our space. My office, like many others, has a few perennial pieces of clutter. I also have a guitar on the wall behind me, which doesn’t always scream professionalism.
As a result, Zoom and other video calling software includes a feature to hide your background, usually behind an image of your choice. While most of us don’t give it much thought, the actual task of separating the foreground from the background in an image is hardly trivial.
Foreground Detection
Foreground detection is one of the most prominent tasks in computer vision. Beyond video calls, foreground detection is used for finding and reading text in an image, detecting obstacles for autonomous vehicles, and many other applications.
As a result, many sophisticated methods have been developed to distinguish the foreground from the background.
OpenCV provides a couple of "out-of-the-box" solutions; however, without any other context, these are black boxes that don’t present much opportunity to learn. Instead, I’ll use a custom-built algorithm that takes advantage of several OpenCV modules to achieve a similar result.
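For reference only, here is a minimal sketch of what one of those built-in options, the MOG2 background subtractor, might look like (assuming a webcam at index 0). Note that this class models the background statistically over time, so it behaves differently from the contour-based approach developed below:
import cv2

# One of OpenCV's built-in options; cv2.createBackgroundSubtractorKNN is another
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)

video = cv2.VideoCapture(0)
while True:
    ret, frame = video.read()
    if not ret:
        break
    # Returns a single-channel mask: 255 = foreground, 0 = background (127 marks shadows)
    fg_mask = subtractor.apply(frame)
    cv2.imshow("MOG2 mask", fg_mask)
    if cv2.waitKey(60) & 0xff == ord('q'):
        break
video.release()
cv2.destroyAllWindows()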
Edge Detection and Contours
The method I’ll demonstrate is built on two concepts: edge detection and contours.
Edge detection, as the name implies, attempts to find the lines of contrast, or edges, in an image. This key first step pre-processes the image to help differentiate any objects. Several methods of edge detection exist, but the Canny method is both immensely popular and packaged with OpenCV.
Once the edges are found, finding contours becomes much easier and more accurate. In computer vision, contours are simply the continuous boundary lines between areas of contrasting color or intensity. Where edge detection highlights individual boundary pixels, finding contours traces the prominent shapes within the image.
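To make these two ideas concrete before building the full algorithm, here is a minimal sketch that runs Canny edge detection on a single image and draws the contours it finds back onto it (the file name still_image.jpg is only a placeholder):
import cv2

# Load an image and convert it to grayscale (placeholder file name)
image = cv2.imread("still_image.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Canny marks pixels where the intensity changes sharply
edges = cv2.Canny(gray, 15, 150)

# findContours traces continuous boundary lines in the edge map;
# [-2] picks out the contour list regardless of OpenCV version
contours = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)[-2]

# Draw the detected boundaries onto the original image in green
cv2.drawContours(image, contours, -1, (0, 255, 0), 2)
cv2.imshow("Contours", image)
cv2.waitKey(0)
cv2.destroyAllWindows()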
The Algorithm
As previously mentioned, the pre-packaged background removers in OpenCV will not be used. Instead, the below flow-chart outlines the method I’ll use:

First, we’ll take the image and convert it to black and white. Next, edge detection will be applied and the contours in the image will be found. Any contours that are either too big or too small to be the foreground will be removed.
The remaining contours will be considered the foreground. This makes some intuitive sense: especially small details in a busy background will generate very small contours, while very large contours that take up most of the screen probably aren’t the foreground but some visual artifact of the background.
Finally, a mask is generated from the remaining contours and is blended into the original image.
Implementation
import numpy as np
import cv2
Before doing much, two libraries need to be imported. NumPy makes some of the number-crunching more efficient. OpenCV handles the image manipulation.
# Parameters
blur = 21
canny_low = 15
canny_high = 150
min_area = 0.0005
max_area = 0.95
dilate_iter = 10
erode_iter = 10
mask_color = (0.0,0.0,0.0)
Next, a set of variables is assigned that will influence how the background is removed. Each variable has a unique effect, which may need to be fine-tuned based on the subject of the video. In short:
- blur: affects the "smoothness" of the dividing line between the background and foreground
- canny_low: the minimum intensity value along which edges will be drawn
- canny_high: the maximum intensity value along which edges will be drawn
- min_area: the minimum area a contour in the foreground may occupy. Taken as a value between 0 and 1.
- max_area: the maximum area a contour in the foreground may occupy. Taken as a value between 0 and 1.
- dilate_iter: the number of iterations of dilation that will be applied to the mask.
- erode_iter: the number of iterations of erosion that will be applied to the mask.
- mask_color: the color of the background once it is removed.
Some of these explanations may not make sense yet, but they’ll be explained further as they appear in the code. In the meantime, feel free to take the values provided as defaults to get started.
# initialize video from the webcam
video = cv2.VideoCapture(0)
Next, the web camera is initialized, if available. The 0 may be substituted with the path to a video file if you don’t have a webcam.
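If you do go the file route, a small variation like the one below works; the file name is only a placeholder, and isOpened confirms the source actually opened:
# Alternative: read from a video file instead of the webcam (placeholder file name)
video = cv2.VideoCapture("my_clip.mp4")
if not video.isOpened():
    raise RuntimeError("Could not open the video source")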
while True:
    ret, frame = video.read()
An infinite loop is started, and frames are read from the camera inside it. The read method returns two values:
- A boolean to tell if the camera worked properly, stored in the ret variable
- An actual frame from the video feed, recorded in the frame variable.
    if ret == True:
        # Convert image to grayscale
        image_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Apply Canny Edge Detection
        edges = cv2.Canny(image_gray, canny_low, canny_high)
The if-clause allows the code to proceed only if the camera correctly captured video. The frame is rendered into grayscale so the next step, edge detection, may take place.
The lower threshold (the canny_low variable) dictates how much contrast there must be before an edge is even considered. Setting it too low may result in more edges being detected than necessary.
The upper threshold (the canny_high variable) dictates that any contrast above its value is immediately classified as a strong edge. Setting it too high may miss important edges entirely, while setting it too low will let noise through as edges.
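If hand-tuning those two values proves fiddly, one common heuristic (an optional extra, not part of this walkthrough) derives both thresholds from the median brightness of the grayscale frame, building on the imports above:
def auto_canny_thresholds(gray, sigma=0.33):
    # Choose low/high thresholds as a band around the median intensity
    median = np.median(gray)
    low = int(max(0, (1.0 - sigma) * median))
    high = int(min(255, (1.0 + sigma) * median))
    return low, high

# Example use, replacing the fixed canny_low and canny_high:
# low, high = auto_canny_thresholds(image_gray)
# edges = cv2.Canny(image_gray, low, high)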
        edges = cv2.dilate(edges, None)
        edges = cv2.erode(edges, None)
This step is strictly optional, but dilating and eroding the edges makes them more pronounced and produces a nicer final product.
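Passing None here means OpenCV falls back to a default 3×3 kernel; if more control is wanted, an explicit structuring element can be supplied instead, as in this small variation on the two lines above:
# Optional variation: an explicit elliptical kernel instead of the default 3x3 square
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
edges = cv2.dilate(edges, kernel)
edges = cv2.erode(edges, kernel)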
        # get the contours and their areas
        contours = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)[-2]
        contour_info = [(c, cv2.contourArea(c)) for c in contours]
There’s a fair amount packed into these lines. Essentially, the OpenCV function findContours returns a tuple of values whose layout changed between versions: the contour list sits at index 0 in OpenCV 4.x and index 1 in 3.x, so indexing with [-2] picks it out in either case.
For each contour found, a tuple of the contour itself and its area is stored in a list.
        # Get the area of the image as a comparison
        image_area = frame.shape[0] * frame.shape[1]
        # calculate max and min contour areas in terms of pixels
        max_area_px = max_area * image_area
        min_area_px = min_area * image_area
The area of the image is calculated, and the maximum and minimum contour areas are converted from fractions into pixel counts. They are stored under new names so the original fractions aren’t overwritten on the next pass through the loop.
Ideally, this calculation would happen once, outside of the loop, since repeating it every frame hinders performance. Taking an initial frame before streaming, or simply knowing your camera’s dimensions beforehand, would be more performant. With that said, keeping it here makes the demonstration a little simpler; a sketch of the more efficient variant follows.
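For reference, a possible refactor (not part of the main walkthrough) queries the camera’s reported resolution once, before the while loop, and computes the pixel thresholds there:
# Possible refactor: compute the pixel thresholds once, before the loop.
# Assumes the capture device reports its resolution correctly.
frame_width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
image_area = frame_width * frame_height
max_area_px = max_area * image_area
min_area_px = min_area * image_area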
        # Set up mask with a matrix of 0's
        mask = np.zeros(edges.shape, dtype=np.uint8)
Next, a mask is created, which at this point is a matrix of 0’s.
        # Go through the contours and apply the relevant ones to the mask
        for contour in contour_info:
            # Keep only contours whose area falls between the min and max thresholds
            if contour[1] > min_area_px and contour[1] < max_area_px:
                # Add contour to mask
                cv2.fillConvexPoly(mask, contour[0], (255))
For each contour found, its area is compared against the minimum and maximum values. If a contour is bigger than the minimum and smaller than the maximum, it is added to the mask.
If the contour is either smaller than the minimum or bigger than the maximum, it is not considered part of the foreground.
        # use dilate, erode, and blur to smooth out the mask
        mask = cv2.dilate(mask, None, iterations=dilate_iter)
        mask = cv2.erode(mask, None, iterations=erode_iter)
        mask = cv2.GaussianBlur(mask, (blur, blur), 0)
Like before, dilating and eroding the mask are technically optional, but they create a more aesthetically pleasing effect. The same principle applies to the Gaussian blur.
        # Stack the mask into 3 channels and ensure data types match up
        mask_stack = np.dstack([mask] * 3)
        mask_stack = mask_stack.astype('float32') / 255.0
        frame = frame.astype('float32') / 255.0
These lines stack the single-channel mask into three channels so it lines up with the color frame, then convert both to floating-point values between 0 and 1 so they can be blended together. It’s a mundane, but important, pre-processing step.
        # Blend the image and the mask
        masked = (mask_stack * frame) + ((1 - mask_stack) * mask_color)
        masked = (masked * 255).astype('uint8')
        cv2.imshow("Foreground", masked)
Finally, the mask and the frame are blended together so that the background is blacked out. The last line then displays the result.
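Since the whole point of the exercise is usually to put something nicer behind you, the same blend can drop in a replacement picture instead of a flat color. A rough sketch of that variation, assuming a file called background.jpg exists and that mask_stack and frame have already been prepared as above, would replace the blend lines inside the loop:
# Hypothetical variation: blend in a replacement image rather than a solid color.
# Assumes "background.jpg" exists; resize it to match the frame.
background = cv2.imread("background.jpg")
background = cv2.resize(background, (frame.shape[1], frame.shape[0]))
background = background.astype('float32') / 255.0

# Foreground keeps the live frame, background pixels come from the picture
masked = (mask_stack * frame) + ((1 - mask_stack) * background)
masked = (masked * 255).astype('uint8')
cv2.imshow("Foreground", masked)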
        # Use the q button to quit the operation
        if cv2.waitKey(60) & 0xff == ord('q'):
            break
    else:
        break
cv2.destroyAllWindows()
video.release()
As some last-minute clean-up, the first few lines create an exit condition. If the "q" key is pressed, it will break the loop and terminate the program.
The else connects back to the if-statement made earlier about the camera correctly capturing a frame. If the camera fails, it will also break the loop.
At last, once the loop is broken, the window displaying the resulting image is closed and the camera is shut down.
Results
If all goes well, an output window should be created displaying real-time background removal. While the algorithm here works well enough for very simple backgrounds, it may have trouble distinguishing the foreground when the background is "busy" or cluttered. Overall, however, it works well enough to demonstrate the concept.
The below gives the ideal case, where I stand against a plain white wall:

The algorithm is easily able to distinguish me from the wall. There’s some stuttering which may need to be smoothed, but for a first attempt, it does well.
Conversely, here’s the result for a worst case scenario where I leaned up against a bookcase:

Very busy backgrounds, such as bookcases filled with books and other accessories, will confuse the algorithm and lead to less-than-perfect results. It struggles to distinguish the foreground from the background as large swaths of my arm and face flicker into the background.
I did exaggerate this result a little. I put my back against the bookcase, which amplifies the effect. If I stood further in front of the bookcase, the results wouldn’t have been so bad; however, it illustrates the difficulty of background subtraction under less than ideal circumstances.
In reality, most attempts will produce something in between the best and worst case scenarios.
Conclusions
The intertwined concepts of foreground detection and background subtraction are among the most studied aspects of computer vision. While many methods exist, a simple application of edge detection and finding contours within an image provides a good basis.
Using OpenCV’s built-in functions, the approach used here was able to render background removal in real time. Under ideal conditions, the algorithm worked nearly flawlessly, but some additional tweaking may be needed for complex or busy backgrounds.