Real-time eye tracking using OpenCV and Dlib

Learn to create a real-time gaze detector through the webcam in Python with this tutorial.

Vardan Agarwal
Towards Data Science


Result of eye tracking

The first step is to download the required packages. Installation via pip:

pip install opencv-python
pip install dlib

Or, if you are using Anaconda, install via conda:

conda install -c conda-forge opencv
conda install -c menpo dlib
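To quickly confirm that both packages installed correctly, you can import them and print their versions; a minimal sanity check:

import cv2
import dlib

# quick check that both libraries import and report their versions
print("OpenCV:", cv2.__version__)
print("dlib:", dlib.__version__)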

Other than this we will need a facial keypoint detector that can detect eyes in real time. For this we will use a pre-trained network from the dlib library that detects 68 key points, presented in this paper. The required pre-trained model can be downloaded from here. Dlib is used because it can give predictions in real time, unlike a CNN model; this was very important for me as I was making an AI for online proctoring.

Dlib facial keypoints. The image is taken from here.

Eye detection using Dlib

The first thing to do is find the eyes before we can move on to image processing, and to find the eyes we first need to find a face. The facial keypoint detector takes as input a rectangle object of the dlib module, which is simply the coordinates of a face. To find faces we can use dlib's built-in frontal face detector. You can use any classifier for this task. If you want high accuracy and speed is not an issue for you, then I would suggest using a CNN, as it gives much better accuracy, especially for non-frontal and partially occluded faces, as shown in the article linked below.
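For reference, dlib also ships a CNN-based face detector; a minimal sketch of using it in place of the frontal face detector, assuming the model file mmod_human_face_detector.dat has been downloaded separately from the dlib model files:

import cv2
import dlib

# CNN face detector (slower, but handles non-frontal and occluded faces better)
cnn_detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')

img = cv2.imread('image.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# the CNN detector returns mmod_rectangle objects; the plain rectangle
# needed by the keypoint predictor is available as .rect
detections = cnn_detector(gray, 1)
rects = [d.rect for d in detections]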

import cv2
import dlib
img = cv2.imread('image.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # convert to grayscale
detector = dlib.get_frontal_face_detector()
rects = detector(gray, 1) # rects contains all the faces detected

Now that we have the rectangle objects of the faces, we can pass them to our keypoint detector.

import numpy as np

def shape_to_np(shape, dtype="int"):
    # convert dlib's full_object_detection into a 68x2 NumPy array
    coords = np.zeros((68, 2), dtype=dtype)
    for i in range(0, 68):
        coords[i] = (shape.part(i).x, shape.part(i).y)
    return coords

predictor = dlib.shape_predictor('shape_68.dat')
for (i, rect) in enumerate(rects):
    shape = predictor(gray, rect)
    shape = shape_to_np(shape)
    for (x, y) in shape:
        cv2.circle(img, (x, y), 2, (0, 0, 255), -1)
Output obtained after applying dlib facial keypoints.
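To actually see the keypoints drawn by the loop above, the annotated image can be shown with OpenCV's window functions; a minimal snippet (the window name is arbitrary):

cv2.imshow("Facial keypoints", img)
cv2.waitKey(0)           # wait for any key press
cv2.destroyAllWindows()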

A more extensive description of dlib can be found in this awesome article which I referred to while making this project.

Finding the center of eyeballs using OpenCV

Dart hitting bull’s eye
Photo by Anastase Maragos on Unsplash

We will be working with the live feed obtained through the webcam. To open the webcam, read frames, and display them, you can use the code shown below:

import cv2

cap = cv2.VideoCapture(0)
while(True):
    ret, img = cap.read()
    cv2.imshow("Output", img)
    if cv2.waitKey(1) & 0xFF == ord('q'): # exit when q is pressed
        break
cap.release()
cv2.destroyAllWindows()

So now we know how to read frames from the webcam; it's time to work on them to reach our goal. We create a new black mask using NumPy with the same dimensions as our webcam frame. Then we take the (x, y) coordinates of the left-eye and right-eye points from the keypoint array shape and draw them on the mask using cv2.fillConvexPoly. It takes an image, points as a NumPy array with dtype np.int32, and a color as arguments, and returns an image with the area between those points filled with that color.

def eye_on_mask(mask, side):
    points = [shape[i] for i in side]
    points = np.array(points, dtype=np.int32)
    mask = cv2.fillConvexPoly(mask, points, 255)
    return mask

left = [36, 37, 38, 39, 40, 41]  # keypoint indices for left eye
right = [42, 43, 44, 45, 46, 47] # keypoint indices for right eye
mask = np.zeros(img.shape[:2], dtype=np.uint8)
mask = eye_on_mask(mask, left)
mask = eye_on_mask(mask, right)

After doing this we have a black mask where the eye area is drawn in white. This white area is expanded a little using the morphological operation cv2.dilate. Using cv2.bitwise_and with our mask on our image, we can segment out the eyes. Then we convert all the (0, 0, 0) pixels to (255, 255, 255) so that the eyeball is the only dark part left, and convert the result to grayscale to make the image ready for thresholding.

kernel = np.ones((9, 9), np.uint8)
mask = cv2.dilate(mask, kernel, iterations=5)
eyes = cv2.bitwise_and(img, img, mask=mask)
mask = (eyes == [0, 0, 0]).all(axis=2)
eyes[mask] = [255, 255, 255]
eyes_gray = cv2.cvtColor(eyes, cv2.COLOR_BGR2GRAY)

Thresholding is used to create a binary mask. So our task is to find an optimal threshold value against which we can segment out the eyeballs from the rest of the eye, and then we need to find their centers. The threshold value will differ with lighting conditions, so we can make an adjustable trackbar for controlling it. In all fairness, I got this idea from Stepan Filonov, who also tried to solve this problem of gaze detection in this article using a Haar cascade along with blob detection. The post-thresholding processing steps, namely erosion, dilation, and median blur, are also taken from him, but his final results were not convincing, so I made this solution.

def nothing(x):
    pass

cv2.namedWindow('image')
cv2.createTrackbar('threshold', 'image', 0, 255, nothing)
threshold = cv2.getTrackbarPos('threshold', 'image')
_, thresh = cv2.threshold(eyes_gray, threshold, 255, cv2.THRESH_BINARY)
thresh = cv2.erode(thresh, None, iterations=2)
thresh = cv2.dilate(thresh, None, iterations=4)
thresh = cv2.medianBlur(thresh, 3)
Displaying thresh with different thresholds controlled by trackbar
Displaying thresh after inverting with trackbar
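Note that cv2.getTrackbarPos only reads the slider position at the moment it is called, so in the live-feed version it has to be called inside the while loop, once per frame, for slider changes to take effect. A standalone sketch of that placement, thresholding the full grayscale frame just to illustrate where the call goes:

import cv2

def nothing(x):
    pass

cap = cv2.VideoCapture(0)
cv2.namedWindow('image')
cv2.createTrackbar('threshold', 'image', 0, 255, nothing)

while(True):
    ret, img = cap.read()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # read the slider every frame so adjustments take effect immediately
    threshold = cv2.getTrackbarPos('threshold', 'image')
    _, thresh = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    cv2.imshow('image', thresh)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()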

We have reached the final step of our project. The eyeballs are segmented out and we can use cv2.findContours to find them. Right now our background is white and our eyeballs are black. However, in OpenCV's cv2.findContours() method, the object to find should be white and the background black, so we need to invert thresh using cv2.bitwise_not. Now we can find contours. Theoretically, all we need to do now is find the two largest contours, and those should be our eyeballs. However, that leaves a little room for false positives, which can be tackled by finding the midpoint between the eyes and splitting the image there. Then the largest contour in each half should be an eyeball. Keypoints 40 and 43 (39 and 42 in Python, because indexing starts from zero) are used to find the midpoint. Find the largest contour on each side of the midpoint by sorting with cv2.contourArea, and use cv2.moments to find the centers of the eyeballs.

def contouring(thresh, mid, img, right=False):
    cnts, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    cnt = max(cnts, key=cv2.contourArea) # finding contour with maximum area
    M = cv2.moments(cnt)
    cx = int(M['m10']/M['m00'])
    cy = int(M['m01']/M['m00'])
    if right:
        cx += mid # add mid to the x coordinate of the right eye's centre to adjust for splitting the image into two parts
    cv2.circle(img, (cx, cy), 4, (0, 0, 255), 2) # drawing over the eyeball with red

thresh = cv2.bitwise_not(thresh) # invert so the eyeballs are white on black
mid = (shape[39][0] + shape[42][0]) // 2
contouring(thresh[:, 0:mid], mid, img)
contouring(thresh[:, mid:], mid, img, True)

On running this, our code throws errors like "max() arg is an empty sequence" or division by zero, which occur when no contour is found or when M['m00'] is zero, respectively. To solve this, enclose the body of the contouring function in a try block. We don't need to do anything if an error is thrown, which will only happen when the eyes are not detected. The new contouring function will look like:

def contouring(thresh, mid, img, right=False):
    cnts, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    try:
        cnt = max(cnts, key=cv2.contourArea)
        M = cv2.moments(cnt)
        cx = int(M['m10']/M['m00'])
        cy = int(M['m01']/M['m00'])
        if right:
            cx += mid
        cv2.circle(img, (cx, cy), 4, (0, 0, 255), 2)
    except:
        pass

Everything's done. Just display img and thresh, set the threshold trackbar accordingly, and enjoy. The complete code:
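The original post embeds the full script as a gist; a sketch that assembles the pieces above into one loop (assuming the 68-point landmark model is saved as 'shape_68.dat', as earlier in the article) looks roughly like this:

import cv2
import dlib
import numpy as np

def shape_to_np(shape, dtype="int"):
    coords = np.zeros((68, 2), dtype=dtype)
    for i in range(0, 68):
        coords[i] = (shape.part(i).x, shape.part(i).y)
    return coords

def eye_on_mask(mask, side):
    points = [shape[i] for i in side]
    points = np.array(points, dtype=np.int32)
    mask = cv2.fillConvexPoly(mask, points, 255)
    return mask

def contouring(thresh, mid, img, right=False):
    cnts, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    try:
        cnt = max(cnts, key=cv2.contourArea)
        M = cv2.moments(cnt)
        cx = int(M['m10']/M['m00'])
        cy = int(M['m01']/M['m00'])
        if right:
            cx += mid
        cv2.circle(img, (cx, cy), 4, (0, 0, 255), 2)
    except:
        pass

def nothing(x):
    pass

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_68.dat')

left = [36, 37, 38, 39, 40, 41]
right = [42, 43, 44, 45, 46, 47]
kernel = np.ones((9, 9), np.uint8)

cap = cv2.VideoCapture(0)
cv2.namedWindow('image')
cv2.createTrackbar('threshold', 'image', 0, 255, nothing)

while(True):
    ret, img = cap.read()
    if not ret:           # stop if the webcam frame could not be read
        break
    thresh = np.zeros(img.shape[:2], dtype=np.uint8) # placeholder if no face is found
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 1)
    for rect in rects:
        shape = predictor(gray, rect)
        shape = shape_to_np(shape)
        mask = np.zeros(img.shape[:2], dtype=np.uint8)
        mask = eye_on_mask(mask, left)
        mask = eye_on_mask(mask, right)
        mask = cv2.dilate(mask, kernel, iterations=5)
        eyes = cv2.bitwise_and(img, img, mask=mask)
        mask = (eyes == [0, 0, 0]).all(axis=2)
        eyes[mask] = [255, 255, 255]
        eyes_gray = cv2.cvtColor(eyes, cv2.COLOR_BGR2GRAY)
        threshold = cv2.getTrackbarPos('threshold', 'image')
        _, thresh = cv2.threshold(eyes_gray, threshold, 255, cv2.THRESH_BINARY)
        thresh = cv2.erode(thresh, None, iterations=2)
        thresh = cv2.dilate(thresh, None, iterations=4)
        thresh = cv2.medianBlur(thresh, 3)
        thresh = cv2.bitwise_not(thresh)
        mid = (shape[39][0] + shape[42][0]) // 2
        contouring(thresh[:, 0:mid], mid, img)
        contouring(thresh[:, mid:], mid, img, True)
    cv2.imshow('eyes', img)
    cv2.imshow('image', thresh)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()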

The complete code for online proctoring using AI can be found here on my GitHub.
