
The notion that machines or computers could perceive images in the real world and interpret them accordingly was once deemed impossible. In the modern era, rapid advancements have led to swift progress in the field of computer or machine vision, and numerous further developments in the coming years will make it even more capable.
Computer vision has risen to popularity especially over the last decade. While the theoretical concepts have existed for quite some time, it is modern technology that has propelled the subject to a whole new level. Computer vision is the branch of Artificial Intelligence that enables systems to gain a high-level understanding of images, videos, and other real-time entities and activities.
While the objective of this article is to focus on some significant terms in computer vision, I have previously covered OpenCV and the basics of computer vision in great detail. If you are new to the field, it is recommended that you check out the article linked below first to gain a deeper understanding of the subject and get the most out of this article.
OpenCV: Complete Beginners Guide To Master the Basics Of Computer Vision With Code!
In this article, we will cover ten of the most essential terminologies required for numerous computer vision applications. With the right information and knowledge about these concepts, you can construct any type of computer vision project with a mixture of machine learning or deep learning algorithms. Let us start exploring these ten concepts and learn more about each of them.
1. Image Formatting:
Image formatting and image manipulation are two of the most common terms you will hear in the world of computer vision. Let us understand these key concepts intuitively to avoid any future confusion. The images we perceive in the natural world have a certain width and height, and normally consist of three channels, because they usually contain an RGB color composition. Below is a typical representation of the image parameters.
Height of the Image = 512
Width of the Image = 512
Number of channels = 3
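These parameters map directly to the shape of the array you get when loading an image. Below is a minimal sketch using a hypothetical 512×512 RGB image represented as a NumPy array; OpenCV's `cv2.imread` returns exactly this kind of array, in height × width × channels order.

```python
import numpy as np

# A hypothetical 512x512 color image: height x width x 3 channels,
# the same layout cv2.imread would return
image = np.zeros((512, 512, 3), dtype=np.uint8)

height, width, channels = image.shape
print(height, width, channels)  # 512 512 3
```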
With image manipulation, you can perform a variety of useful operations that help reduce the computational requirements of machine learning or deep learning algorithms. Some methods of image manipulation include resizing, cropping, or converting images into grayscale (this point will be discussed further in the next section). Image manipulation plays a crucial role in computer vision applications.
With image formatting, our task is usually to achieve the best representation of the image, making it suitable for our particular task. This step could involve certain image manipulations, as discussed previously, and storing it in the desired format. The usual storage formats for images include PNG, JPG, JPEG, TIF, and other similar formats.
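As a quick illustration of such manipulations, cropping and naive downsampling can both be expressed as array slicing. This is a sketch on a hypothetical NumPy image; real pipelines would typically use `cv2.resize` for proper interpolation.

```python
import numpy as np

# Hypothetical 512x512 RGB image
image = np.zeros((512, 512, 3), dtype=np.uint8)

crop = image[100:300, 150:350]   # Crop a 200x200 region (rows first, then columns)
small = image[::2, ::2]          # Naive 2x downsample by keeping every other pixel

print(crop.shape)   # (200, 200, 3)
print(small.shape)  # (256, 256, 3)
```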
2. Grayscale:

In computer vision, grayscale images play a crucial role in a majority of practical operations. A pixel value of 0 represents black, and as the value increases we reach progressively lighter shades, until we finally encounter white at 255. Hence, a grayscale image can be thought of as an image composed entirely of different shades of gray.
There are many benefits to converting your colorful RGB images into other formats, especially grayscale, to reduce the computational load as well as ease CPU (or sometimes GPU) limitations. With datasets converted to grayscale, you can run your computations faster and more efficiently. You can also use it to simplify algorithms and extract descriptors without consuming too many additional resources. The simple code for conversion to grayscale is as follows.
import cv2

# `image` is a previously loaded BGR image (OpenCV loads images in BGR order)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # Convert the color image to grayscale
cv2.imshow("Gray Picture", gray)  # Display the grayscale image
cv2.waitKey(0)
Another essential concept to understand is masking. A mask is a binary image consisting of zero and non-zero values. When a mask is applied to another binary or grayscale image of the same size, all pixels that are zero in the mask are set to zero in the output image; all others remain unchanged.
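This masking rule can be sketched in a few lines of NumPy; the same effect is available in OpenCV via `cv2.bitwise_and(img, img, mask=mask)`.

```python
import numpy as np

gray = np.full((4, 4), 200, dtype=np.uint8)   # A small hypothetical grayscale image
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255                          # Non-zero only in the central 2x2 region

# Zero out every pixel where the mask is zero; keep the rest unchanged
out = np.where(mask > 0, gray, 0).astype(np.uint8)
print(out)
```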
3. IoU and Dice Loss:
While working with images and most computer vision tasks, we make use of certain computational metrics to analyze the performance of our models. The two main metrics for ensuring that the model performs effectively during the training and evaluation stages are the Intersection over Union (IoU) score and the Dice loss.
The Intersection over Union (IoU), as the name suggests, is the ratio of the overlap (intersection) between the ground truth of a particular image and the prediction made by the model to their combined area (union). It is often considered a better measure than accuracy and is used extensively in computer vision applications. Oftentimes, an IoU score above 0.5 can be considered a good score for practical applications.
Dice loss, on the other hand, is a measure used in image processing to quantify the similarity between two desired entities. Considering the true positive (TP), false positive (FP), and false negative (FN) counts, the Dice coefficient may be written as 2TP / (2TP + FP + FN), and the Dice loss is one minus this coefficient.
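Both metrics can be computed directly from binary masks. The sketch below uses IoU = TP / (TP + FP + FN) and Dice = 2·TP / (2·TP + FP + FN) on two tiny hypothetical masks.

```python
import numpy as np

def iou_and_dice(pred, target):
    """Compute IoU and the Dice coefficient for two boolean masks."""
    tp = np.logical_and(pred, target).sum()    # Overlap: predicted and true
    fp = np.logical_and(pred, ~target).sum()   # Predicted but not in ground truth
    fn = np.logical_and(~pred, target).sum()   # In ground truth but missed
    iou = tp / (tp + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    return iou, dice

pred = np.array([[1, 1, 0, 0]], dtype=bool)
target = np.array([[1, 0, 1, 0]], dtype=bool)
iou, dice = iou_and_dice(pred, target)
print(iou, dice)  # 0.333... 0.5
```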
4. Anchor Boxes and NMS:

The next two terms, anchor boxes and non-maximum suppression, are common in computer vision, especially in the implementation of object detection algorithms. Hence, it is important to understand what they do and why they are used for these tasks.
Anchor boxes are bounding boxes of predefined heights and widths placed over an image. They capture the various scales and aspect ratios of the different objects you are looking to detect with your model. Numerous anchor boxes are typically used for this purpose.
When many anchor boxes are used for a specific purpose, problems can arise, especially due to overlapping predictions. Hence, non-maximum suppression (NMS) is used to filter out excessively overlapping boxes, keeping only the highest-scoring ones, before the processed result is passed on for further computation and processing.
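A greedy NMS pass can be sketched in plain NumPy. This is an illustrative implementation under simple assumptions (axis-aligned boxes in [x1, y1, x2, y2] form), not the exact routine any particular detector uses; OpenCV ships a ready-made version as `cv2.dnn.NMSBoxes`.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression. boxes: (N, 4) array of [x1, y1, x2, y2]."""
    order = np.argsort(scores)[::-1]   # Process boxes from highest score down
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the top box with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_threshold]  # Drop boxes that overlap too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2] — the second box overlaps the first and is suppressed
```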
5. Noise:

Noise in an image is basically any kind of disturbance, often caused by the interference of external factors, that degrades the quality of the image. Noise frequently appears as undesirable interference in your pictures. While you may sometimes add noise of your own, for example for image generation (as in GANs), it is usually not something you want in your images.
When you deem the noise in a specific image undesirable, you can try to filter it out. The OpenCV module offers various techniques for removing existing noise. Below is an example that demonstrates one of them.
import cv2
import numpy as np

def custom_blur_demo(image):
    kernel = np.ones([5, 5], np.float32) / 25  # 5x5 averaging (box) kernel
    dst = cv2.filter2D(image, -1, kernel)      # Convolve the image with the kernel
    cv2.imshow("custom_blur_demo", dst)

src = cv2.imread("lena_noise.png")
img = cv2.resize(src, None, fx=0.8, fy=0.8, interpolation=cv2.INTER_CUBIC)
cv2.imshow('input_image', img)
custom_blur_demo(img)
cv2.waitKey(0)
6. Blur Techniques:
blur = cv2.GaussianBlur(gray, (19,19), 0)
cv2.imshow("Blur Picture", blur) # Display Blur image
cv2.waitKey(0)
The blurring or smoothing process is a crucial technique in computer vision for reducing noise and outlier pixels in an image. Since we read about noise in the previous section, we have a brief understanding of its side-effects. One method to filter out this noise is to use blurring techniques.
In the above code block, we used a popular blurring technique called Gaussian blur, which makes use of the Gaussian function to smooth the given image. In OpenCV's GaussianBlur function, you provide the source image and the kernel size, which must always be a tuple of odd numbers, to obtain the smoothed result.
7. Edge Detection:

Edge detection is another important objective accomplished with computer vision. Edge detection helps identify the portions of an image with sharp changes in brightness or discontinuities in patterns, through which object boundaries can be found. It is also used to extract the structure of particular objects from an image and finds utility in other computer vision applications like object detection.
You can vary the detection capabilities by adjusting the thresholding factors. While there are many methods for performing edge detection and many unique applications for it, below is a sample code block demonstrating the use of the Canny edge detector in OpenCV.
# Canny edge detection (img_blur is a grayscale, Gaussian-blurred input image)
edges = cv2.Canny(image=img_blur, threshold1=100, threshold2=200)
# Display the Canny edge detection result
cv2.imshow('Canny Edge Detection', edges)
cv2.waitKey(0)
8. FPS:
Frames Per Second (FPS) is another essential concept in computer vision, usually used when working with video footage or real-time streams. To understand the concept simply, let us consider a couple of examples. Assume you are watching a video or movie on a platform such as YouTube or your local software. You have the option to see how many frames per second the video is playing at and at what quality.
Another example: when you play a video game, your device renders a certain number of frames per second, which reflects how much content it can process as the game progresses. For such tasks, 60 frames per second is usually considered a good figure. Similarly, for projects such as real-time object detection, FPS is paramount for measuring the performance of your model.
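Measuring FPS in your own pipeline is straightforward with the standard library's clock: time a batch of frames and divide the frame count by the elapsed time. The sketch below uses a dummy per-frame workload in place of real frame capture and inference.

```python
import time

def measure_fps(process_frame, num_frames=50):
    """Time `num_frames` calls to a per-frame function and return frames per second."""
    start = time.perf_counter()
    for _ in range(num_frames):
        process_frame()
    elapsed = time.perf_counter() - start
    return num_frames / elapsed

# Dummy workload standing in for frame capture + model inference
fps = measure_fps(lambda: sum(range(1000)))
print(f"{fps:.1f} FPS")
```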
To build an understanding of many concepts related to computer vision and deep learning, I am personally of the opinion that developers should try to construct at least one game during their learning stages. Check out one of my previous articles, which covers five reasons you should consider constructing a game with Python, from the link provided below.
9. Segmentation:

One of the most significant operations you can perform with computer vision and deep learning is the task of segmentation. Segmentation means segregating the most essential elements, or a specific desired element, from an image so that you can perform further computation on it, such as classification or localization. Regions of a particular image are taken into consideration and segmented accordingly, distinguishing them from the other elements.
The normal process of segmentation involves a dataset of images along with their respective masks. With each image and its ground-truth mask, we can train a model using architectures such as U-Net. Once you successfully construct the model, you can run segmentation on other images and isolate the essential regions.
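While trained models like U-Net produce learned masks, the core idea of segmentation, separating a region of interest from everything else as a binary mask, can be illustrated with a simple intensity threshold. This is a deliberately simplified sketch on a hypothetical image, not a substitute for a learned model.

```python
import numpy as np

def threshold_segment(gray, thresh=128):
    """Return a binary mask (0 or 255) marking pixels brighter than `thresh`."""
    return (gray > thresh).astype(np.uint8) * 255

# Hypothetical grayscale image: a bright 2x2 "object" on a dark background
gray = np.zeros((4, 4), dtype=np.uint8)
gray[1:3, 1:3] = 200

mask = threshold_segment(gray)
print(mask)  # 255 in the central 2x2 region, 0 elsewhere
```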
10. Object Detection:

Another essential contribution of computer vision, paramount to solving several real-life problems, is the task of object detection. Object detection is one of the most crucial applications in today's world, with use cases in object tracking (such as tracking a cricket ball or baseball), facial recognition, robotics, self-driving cars, and much more, ultimately benefiting humans on a large scale.
Since object detection is one of the most essential computer vision tasks, there are several methodologies, tools, algorithms, and libraries for approaching it. Some of the best-known algorithms include Histogram of Oriented Gradients (HOG), Faster R-CNN, YOLO, and many other similar techniques. There are also several libraries, like ImageAI and Detectron2, that perform these tasks effectively. Check out the following article on object detection algorithms and libraries to gain further information on this topic.
Conclusion:

"There is great potential to use computer vision technology in a constructive and benevolent way." – Fei-Fei Li
Computer vision is one of the greatest achievements humans have accomplished. With the rapid progress in this field, it is now possible to achieve almost any kind of machine vision task related to images, videos, or real-time captures with ease. With many future operations and projects yet to be uncovered, this field holds massive potential.
In this article, we have explored ten of the most significant terms related to computer vision. With a solid understanding of these concepts, most developers can accomplish any specific computer or machine vision project with the right tools and technologies. Ensure that you remember or keep a note of these terms for future pursuits.
If you have any queries related to the various points stated in this article, then feel free to let me know in the comments below. I will try to get back to you with a response as soon as possible.
Check out some of my other articles that you might enjoy reading!
5 Best Python Projects With Codes That You Can Complete Within An Hour!
14 Pandas Operations That Every Data Scientist Must Know!
7 Best UI Graphics Tools For Python Developers With Starter Codes
15 Numpy Functionalities That Every Data Scientist Must Know
Thank you all for sticking on till the end. I hope all of you enjoyed reading the article. Wish you all a wonderful day!