
Drowsy Driving: A Serious Problem
The National Highway Traffic Safety Administration estimates that drowsy driving is involved in 91,000 crashes a year, leading to an estimated 50,000 injuries and nearly 800 deaths. Additionally, 1 in 24 adult drivers report having fallen asleep at the wheel in the past 30 days. Research has even found that going more than 20 hours without sleep is the equivalent of having a blood-alcohol concentration of 0.08%, the U.S. legal limit.
Because of this serious problem, a group of fellow data scientists and I set out to develop a neural network that can detect whether eyes are closed and, applied in tandem with computer vision, detect whether a live human has had their eyes closed for more than a second. This sort of technology is useful for anybody interested in increased driving safety, including commercial and everyday drivers, car companies, and vehicle-insurance companies.
Table of Contents:
- Building a Convolutional Neural Network
- Creating the Webcam App
- The Final Product
Building a Convolutional Neural Network
Data Collection
We used full facial data from a number of sources: open-eye face data from UMass Amherst and closed-eye face data from Nanjing University.
We then used a simple Python function to crop the eyes out of this dataset, leaving us with just over 30,000 cropped eye images. We added a buffer to each crop to capture not only the eye but the area around it as well. This cropping function will be repurposed later for the webcam section, as sketched below.
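The exact function we used isn't shown here, but a minimal sketch using OpenCV's bundled Haar cascade for eye detection might look like this (the buffer size and output shape are assumptions):
import cv2
# OpenCV ships Haar cascade files; this one detects eyes
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')
def eye_cropper(image):
    # detect eyes on a grayscale copy of the image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    eyes = eye_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(eyes) == 0:
        return None
    # take the first detection and pad it with a buffer around the eye
    x, y, w, h = eyes[0]
    buffer = int(0.2 * w)
    crop = image[max(y - buffer, 0):y + h + buffer,
                 max(x - buffer, 0):x + w + buffer]
    # resize to the model's input size; when building the dataset you would
    # save this crop to disk, while the webcam section uses it batched
    crop = cv2.resize(crop, (80, 80))
    return crop.reshape(-1, 80, 80, 3)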
A warning: manually scrub this dataset unless you want people mid-blink or 200 photos of Bill Clinton training your model. Here’s a sample of the data we used to train the model:

Creating a Convolutional Neural Network
Decide on a Metric
Because predicting the positive class (a sleeping driver) is more important to us than predicting the negative class (an awake driver), our most important metric will be recall (sensitivity). The higher the recall, the fewer sleeping drivers the model mistakenly predicts are awake (false negatives).
The only problem here is that our positive class is significantly outnumbered by our negative class. Because of this, it’s better to use the F1 score or the Precision-Recall AUC score, since they also account for the number of times we guess a driver is asleep when they are really awake (precision). Otherwise, a model that always predicts "asleep" would achieve perfect recall while being unusable. Another method for dealing with imbalanced image data, which we didn’t use here, is image augmentation; Jason Brownlee does a great job explaining how you can apply it here.
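For reference, once the model is trained (below), scikit-learn can compute both scores directly from validation predictions; this is a sketch assuming the model, X_test, and y_test names used later in this post:
from sklearn.metrics import f1_score, average_precision_score
# predicted probabilities from the fitted model
y_prob = model.predict(X_test).ravel()
# threshold at 0.5 to get hard labels for the F1 score
y_pred = (y_prob > 0.5).astype(int)
print('F1 score:', f1_score(y_test, y_pred))
print('PR AUC (average precision):', average_precision_score(y_test, y_prob))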
Prepare Image Data
The next step is to import the images and preprocess them for the model.
Imports needed for this section:
import os
import cv2
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout
Import the images we created earlier and resize them so they all match. For this project, we resized to 80×80 pixels. This is a simple import function using the OS library:
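The original import function isn't embedded here; a sketch, under the assumption that the cropped images live in one folder per class (the folder names are hypothetical), could be:
def load_images(folder, label):
    # read every image in the folder, resize it, and pair it with its label
    data = []
    for filename in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, filename))
        if img is None:
            continue
        data.append((cv2.resize(img, (80, 80)), label))
    return data
# 1 = closed eye, 0 = open eye
eyes = load_images('data/closed_eyes', 1) + load_images('data/open_eyes', 0)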
Set up variables with the independent X being the images, and dependent y being the corresponding labels (1 for closed eye, 0 for open eye):
X = []
y = []
for features, label in eyes:
    X.append(features)
    y.append(label)
Convert the images to an array so they can enter the model. Also, scale the data by dividing by 255.
X = np.array(X).reshape(-1, 80, 80, 3)
y = np.array(y)
X = X/255.0
Split the data into a training set and a validation set using scikit-learn’s train_test_split. Important: make sure to stratify on y because we have imbalanced classes.
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify = y)
Creating a Model Architecture

Convolutional Layers:
These layers slide small filters over subsets of pixels rather than processing the full image at once, which allows for faster models. Depending on the number of filters you set, the resulting feature maps can be more or less dense than the original images, but they enable the model to learn more complex relationships using fewer resources. I used 32 filters. Use at least one convolutional layer, and usually you’ll want two or more. The optimal setup for me was two 3×3’s pooled together, followed by three 3×3’s pooled together. The general trend in CNNs is toward smaller filter sizes; in fact, two stacked 3×3 layers cover essentially the same receptive field as a 5×5 layer but are faster and often score better, as explained in this brilliant article by Arnault Chazareix. Pooling afterwards is not always necessary or better, so try your model with and without it if possible.
Flatten
Make sure to flatten the image array so it can enter the dense layers.
Dense Layers
The more dense layers there are, the longer your model will take to train. As the number of neurons in these layers increases, the complexity of the relationships learned by the network increases as well. Generally, the idea of convolutional layers is to avoid having to make an overly deep dense-layer scheme. For our model we used three layers with relu activation and a decreasing number of neurons (256, 128, 64). We also used a 30% dropout after each layer.
Output Layer
Finally, because this is a binary classification problem, make sure to use the sigmoid activation for your output layer.
Compile Model
In model.compile(), you’ll want to set the metric to PR AUC (tf.keras.metrics.AUC(curve='PR') in TensorFlow) or recall (tf.keras.metrics.Recall() in TensorFlow). Set the loss to binary_crossentropy because this is a binary classification model, and a good optimizer is generally adam.
Fitting the Model
Set your batch size generally as high as possible, but don’t blow up your machine in the process! I ran a grid search on Google Colab’s 32 GB TPU and it handled batch sizes of 1,000+ with ease. When in doubt, try a batch size of 32 and increase it if it doesn’t overload your memory. In terms of epochs, there were diminishing returns after 20 epochs, so I wouldn’t go much higher than that for this specific CNN.
Here’s the full setup with Tensorflow Keras:
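The original gist isn't embedded here, so this is a sketch that follows the architecture described above (the filter counts, pooling, dense sizes, and dropout come from the text; the remaining details are assumptions):
model = Sequential([
    # two 3x3 convolutional layers pooled together
    Conv2D(32, (3, 3), activation='relu', input_shape=(80, 80, 3)),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    # three 3x3 convolutional layers pooled together
    Conv2D(32, (3, 3), activation='relu'),
    Conv2D(32, (3, 3), activation='relu'),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    # flatten before the dense layers
    Flatten(),
    # dense layers with a decreasing number of neurons and 30% dropout
    Dense(256, activation='relu'),
    Dropout(0.3),
    Dense(128, activation='relu'),
    Dropout(0.3),
    Dense(64, activation='relu'),
    Dropout(0.3),
    # sigmoid output for binary classification
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[tf.keras.metrics.AUC(curve='PR')])
model.fit(X_train, y_train,
          batch_size=32,
          epochs=20,
          validation_data=(X_test, y_test))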
Final Precision-Recall Area Under the Curve Score:
0.981033
Let me know if you can better this!
Creating the Webcam App
Once you have a model you’re happy with, save it using model.save('yourmodelname.h5'). Make sure to fit the production model without the validation data before saving it; otherwise, it will cause problems down the road when importing it.
Installations and Imports:
These instructions are Mac-optimized, although it’s possible to use this same script on Windows as well. Check out this link for troubleshooting dlib on Windows.
# installations needed for webcam application
# pip install opencv-python
# if you want to play a sound for the alert:
# pip install -U PyObjC
# pip install playsound
# imports for webcam application
import cv2
from tensorflow import keras
from playsound import playsound
# import model saved above
eye_model = keras.models.load_model('best_model.h5')
Using OpenCV to Access the Webcam
Use cv2.VideoCapture(0) to start the webcam capture. If you’d like to place text based on relative frame size rather than absolute coordinates, make sure to save the width and height of the webcam using cap.get(cv2.CAP_PROP_FRAME_WIDTH) and cap.get(cv2.CAP_PROP_FRAME_HEIGHT). You can also view frames per second. The full list of capture properties for OpenCV can be found here.
cap = cv2.VideoCapture(0)
w = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
h = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
print(cap.get(cv2.CAP_PROP_FPS))
if not cap.isOpened():
    raise IOError('Cannot open webcam')
Capturing the Frames with OpenCV and Cropping Them
Make sure to set a counter if you plan on counting consecutive frames with closed eyes. A while True: loop will keep the camera on until you’re finished with the script. Inside that loop, use the ret, frame = cap.read() format to capture a frame of the webcam video. Lastly, call the cropping function on the frame. It should return a cropped eye from the frame; if it can’t find an eye, the function returns None, which can’t be divided by 255, so the try/except skips to the next frame.
counter = 0
# create a while loop that runs while webcam is in use
while True:
    # capture frames being outputted by webcam
    ret, frame = cap.read()
    # function called on the frame
    image_for_prediction = eye_cropper(frame)
    try:
        image_for_prediction = image_for_prediction / 255.0
    except:
        continue
Running the Frame through the Model
We can then run the image through the model and get a prediction. If the prediction is closer to 0, we display "Open" on the screen. Otherwise (i.e. it’s closer to 1), we display "Closed". Notice that the counter is reset to 0 if the model detects open eyes and increased by 1 if the eyes are closed. We can display some basic text to indicate whether the eyes are closed or open using cv2.putText().
prediction = eye_model.predict(image_for_prediction)
if prediction < 0.5:
    counter = 0
    status = 'Open'
    cv2.putText(frame, status, (round(w/2)-80, 70),
                cv2.FONT_HERSHEY_SIMPLEX, 2, (0,255,0), 2, cv2.LINE_4)
else:
    counter = counter + 1
    status = 'Closed'
    cv2.putText(frame, status, (round(w/2)-104, 70),
                cv2.FONT_HERSHEY_SIMPLEX, 2, (0,0,255), 2, cv2.LINE_4)
We also want to display an alert if there are six frames in a row with closed eyes ("sleeping"). This can be done with a simple if statement:
if counter > 5:
    cv2.putText(frame, 'DRIVER SLEEPING', (round(w/2)-136, round(h)-146),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0,255), 2, cv2.LINE_4)
    counter = 5

And lastly, we need to display the frame and provide an exit key for the while loop. cv2.waitKey(1) determines how long the frame will be shown; the number in parentheses is the number of milliseconds the frame is displayed unless a key is pressed. The pressed key’s code is stored in k, and if it equals 27, the escape key, we break out of the loop:
cv2.imshow('Drowsiness Detection', frame)
k = cv2.waitKey(1)
if k == 27:
    break
Outside of the loop, release the webcam and close the application:
cap.release()
cv2.destroyAllWindows()
The Final Product
With some stylistic additions, here’s the final product. You can also include sounds, which I’ve done in the full script below. I used the "Wake Up" lyric from System of a Down’s "Chop Suey" (if you know, you know).
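A minimal way to wire in the sound is to extend the counter check from earlier; the audio file name here is hypothetical, and note that playsound blocks the frame loop unless you pass block=False:
if counter > 5:
    cv2.putText(frame, 'DRIVER SLEEPING', (round(w/2)-136, round(h)-146),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0,255), 2, cv2.LINE_4)
    # play the alert without freezing the frame loop
    playsound('wake_up.mp3', block=False)
    counter = 5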

As you can see, the model is incredibly effective and, despite a long training time, returns predictions in a matter of milliseconds. With some further improvements and export to an external machine, this program could easily be applied in practical situations and perhaps save lives.
Thanks for reading. Feel free to contact me on LinkedIn if you have any questions or improvements.