Real-time eye tracking using OpenCV and Dlib
Learn to create a real-time gaze detector through the webcam in Python with this tutorial.
The first step is to download the required packages. Installation via pip:
pip install opencv-python
pip install dlib
Or if you are using Anaconda then using conda:
conda install -c conda-forge opencv
conda install -c menpo dlib
Other than this, we will need a facial keypoint detector that can detect eyes in real-time. For this we will use a pre-trained network from the dlib library that can detect 68 key points, presented in this paper. The required pre-trained model can be downloaded from here. Dlib is used because it gives predictions in real-time, unlike a CNN model, which was very important for me as I was making an AI for online proctoring.
Eye detection Using Dlib
The first thing to do is to find the eyes before we can move on to image processing, and to find the eyes we first need to find a face. The facial keypoint detector takes a rectangle object of the dlib module as input, which is simply the coordinates of a face. To find faces we can use dlib's built-in frontal face detector. You can use any classifier for this task. If you want high accuracy and speed is not an issue for you, I would suggest a CNN, as it gives much better accuracy, especially for non-frontal and partially occluded faces, as shown in the article linked below.
import cv2
import dlib

img = cv2.imread('image.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
detector = dlib.get_frontal_face_detector()
rects = detector(gray, 1)  # rects contains all the faces detected
Now that we have the rectangle objects for the faces, we can pass them to our keypoint detector.
import numpy as np

def shape_to_np(shape, dtype="int"):
    coords = np.zeros((68, 2), dtype=dtype)
    for i in range(0, 68):
        coords[i] = (shape.part(i).x, shape.part(i).y)
    return coords

predictor = dlib.shape_predictor('shape_68.dat')
for (i, rect) in enumerate(rects):
    shape = predictor(gray, rect)
    shape = shape_to_np(shape)
    for (x, y) in shape:
        cv2.circle(img, (x, y), 2, (0, 0, 255), -1)  # draw each keypoint
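The 68 landmarks follow a fixed ordering (the standard iBUG 300-W convention), so once shape is a NumPy array, the eye points can be pulled out with a plain slice. A quick sketch using a dummy landmark array (the values are placeholders, not real detections):

```python
import numpy as np

# dummy stand-in for the 68x2 landmark array returned by shape_to_np
shape = np.arange(136).reshape(68, 2)

left_eye = shape[36:42]   # points 36-41 outline the left eye
right_eye = shape[42:48]  # points 42-47 outline the right eye

print(left_eye.shape, right_eye.shape)  # (6, 2) (6, 2)
```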
A more extensive description of dlib can be found in this awesome article, which I referred to while making this project.
Finding the center of eyeballs using OpenCV
We will be working with the live feed obtained through the webcam. To open the webcam, read frames, and display them, you can use the code shown below:
import cv2

cap = cv2.VideoCapture(0)
while True:
    ret, img = cap.read()
    cv2.imshow("Output", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # exit when q is pressed
        break
cap.release()
cv2.destroyAllWindows()
So now we know how to read frames from the webcam, and it's time to work on them to reach our goal. We create a new black mask using NumPy with the same dimensions as our webcam frame. Then we store the (x, y) coordinates of the left- and right-eye points from the keypoint array shape and draw them on the mask using cv2.fillConvexPoly. It takes an image, points as a NumPy array with dtype np.int32, and a color as arguments, and returns an image with the area between those points filled with that color.
def eye_on_mask(mask, side):
    points = [shape[i] for i in side]
    points = np.array(points, dtype=np.int32)
    mask = cv2.fillConvexPoly(mask, points, 255)
    return mask

left = [36, 37, 38, 39, 40, 41]   # keypoint indices for left eye
right = [42, 43, 44, 45, 46, 47]  # keypoint indices for right eye
mask = np.zeros(img.shape[:2], dtype=np.uint8)
mask = eye_on_mask(mask, left)
mask = eye_on_mask(mask, right)
After doing this we have a black mask where the eye area is drawn in white. This white area is expanded a little using the morphological operation cv2.dilate. Using cv2.bitwise_and with our mask as the mask argument on our image, we can segment out the eyes. Then we convert all the (0, 0, 0) pixels to (255, 255, 255) so that the eyeball is the only dark part left, and convert the result to grayscale to make the image ready for thresholding.
kernel = np.ones((9, 9), np.uint8)
mask = cv2.dilate(mask, kernel, iterations=5)  # pass iterations by name; the third positional argument is dst
eyes = cv2.bitwise_and(img, img, mask=mask)
mask = (eyes == [0, 0, 0]).all(axis=2)  # True wherever a pixel is pure black
eyes[mask] = [255, 255, 255]
eyes_gray = cv2.cvtColor(eyes, cv2.COLOR_BGR2GRAY)
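The (eyes == [0, 0, 0]).all(axis=2) line is doing the heavy lifting here: it builds a boolean mask that is True only where all three channels are zero. A minimal NumPy-only sanity check (the 2x2 image is made up):

```python
import numpy as np

eyes = np.array([[[0, 0, 0], [10, 20, 30]],
                 [[0, 0, 255], [0, 0, 0]]], dtype=np.uint8)

black = (eyes == [0, 0, 0]).all(axis=2)  # True only for pure-black pixels
eyes[black] = [255, 255, 255]

print(eyes[0, 0])  # [255 255 255] -- was pure black, now white
print(eyes[1, 0])  # [  0   0 255] -- untouched, not pure black
```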
Thresholding is used to create a binary mask. So our task is to find an optimal threshold value against which we can segment the eyeballs from the rest of the eye, and then we need to find their centers. But the threshold value will differ with lighting conditions, so we can make an adjustable trackbar for controlling it. In all fairness, I got this idea from Stepan Filonov, who also tried to solve this problem of gaze detection in this article, using Haar cascades along with blob detection. The post-threshold processing steps, namely erosion, dilation, and median blur, are also taken from him, but his final results were not convincing, so I made this solution.
def nothing(x):
    pass

cv2.namedWindow('image')
cv2.createTrackbar('threshold', 'image', 0, 255, nothing)

# the lines below run inside the frame loop so the slider takes effect live
threshold = cv2.getTrackbarPos('threshold', 'image')
_, thresh = cv2.threshold(eyes_gray, threshold, 255, cv2.THRESH_BINARY)
thresh = cv2.erode(thresh, None, iterations=2)
thresh = cv2.dilate(thresh, None, iterations=4)
thresh = cv2.medianBlur(thresh, 3)
We have reached the final step of our project. The eyeballs are segmented out and we can utilize cv2.findContours to find them. Right now our background is white and our eyeballs are black. However, cv2.findContours() expects the objects to find in white on a black background, so we need to invert our thresh using cv2.bitwise_not. Now we can find contours. In theory, all we need to do is find the two largest contours, and those should be our eyeballs. However, this leaves a little room for false positives, which can be tackled by finding the midpoint between the eyes and dividing the image there. The largest contour in each half should then be an eyeball. The keypoints 40 and 43 (39 and 42 in Python, since indexing starts at zero) are used to find the midpoint. Find the largest contour on each side of the midpoint by sorting with cv2.contourArea, then utilize cv2.moments to find the centers of the eyeballs.
def contouring(thresh, mid, img, right=False):
    cnts, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    cnt = max(cnts, key=cv2.contourArea)  # contour with maximum area
    M = cv2.moments(cnt)
    cx = int(M['m10'] / M['m00'])
    cy = int(M['m01'] / M['m00'])
    if right:
        cx += mid  # the right half was cropped at mid, so shift x back
    cv2.circle(img, (cx, cy), 4, (0, 0, 255), 2)  # draw over the eyeball in red

mid = (shape[39][0] + shape[42][0]) // 2
contouring(thresh[:, 0:mid], mid, img)
contouring(thresh[:, mid:], mid, img, True)
On running this, our code throws errors like max() arg is an empty sequence or division by zero, which occur when no contour is found or when M['m00'] is zero, respectively. To solve this, enclose the body of the contouring function in a try block. We don't need to do anything if an error is thrown, since that only happens when the eyes are not detected. The new contouring function will look like:
def contouring(thresh, mid, img, right=False):
    cnts, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    try:
        cnt = max(cnts, key=cv2.contourArea)
        M = cv2.moments(cnt)
        cx = int(M['m10'] / M['m00'])
        cy = int(M['m01'] / M['m00'])
        if right:
            cx += mid
        cv2.circle(img, (cx, cy), 4, (0, 0, 255), 2)
    except (ValueError, ZeroDivisionError):  # no contour found / zero-area contour
        pass
Everything's done. Just display img and thresh, set the threshold trackbar accordingly, and enjoy. The complete code:
The complete code for online proctoring using AI can be found here on my Github.