
Hands-on Machine Learning demo:

Real-time object detection with YOLO V2

Özgür Genç
Towards Data Science
12 min read · Jan 4, 2019


Intro

This blog post shares my experience with a specific Machine Learning object detection case, written with beginners in mind.

Deep learning is an advanced sub-field of Artificial Intelligence (AI) and Machine Learning (ML) that remained a largely academic discipline for a long time. With the abundance of data and the exponential increase in computing power, we have been seeing a proliferation of applied deep learning business cases across disciplines. Many smart people now choose to study AI/ML, and many large high-tech companies (the leading cloud ML platforms include AWS SageMaker, Microsoft Azure AI, Google Cloud Platform ML & TensorFlow, etc.) as well as start-ups invest heavily in this most thriving domain of our time.

There is already an abundance of public online training and resources available to every AI/ML enthusiast with an internet connection, in any corner of the world. So there is no excuse to stay behind… (If you are a really serious ML enthusiast, I think the best hard-core place to start is Andrew Ng's Coursera specializations.)

Watch my simple DIY demo videos to see the outcome of this piece after a few hours' effort: Video 1 and Video 2.

Convolutional Neural Networks (CNN)

Source: Wikipedia

Convolutional neural networks (CNN) are deep artificial neural networks that are used primarily to classify images (i.e. label what they see), cluster them by similarity (i.e. photo search), and perform object recognition within scenes. They are algorithms that can identify faces, individuals, street signs, cars, animals, anomalies, tumors and many other aspects of visual data.

Convolution layers are used to extract features from the input training samples. Each convolution layer has a set of filters that helps with feature extraction. In general, as the depth of a CNN model increases, the complexity of the features learned by its convolution layers increases. You can learn more about CNNs here, here or here. Andrej Karpathy has written a great write-up at this link for his earlier Stanford CNN course, if you would like to go deeper academically.
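
To make this concrete, below is a minimal sketch of a small CNN image classifier in Keras (which we install in Step 1 below). The layer sizes and the 80-class output are illustrative assumptions, not taken from any model in this post:

from tensorflow.keras import layers, models

# Stacked convolution layers extract visual features; deeper layers learn more complex ones
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(80, activation='softmax'),  # one output per image class (e.g. 80, like COCO)
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()  # prints the layer-by-layer architecture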

CNNs are such a fascinating and disruptive domain, opening up possibilities for face recognition, self-driving cars, optical character recognition, automated diagnosis of diseases, image-to-text conversion, neural art, etc…

There are many further innovation opportunities out there, from helping millions of people with visual impairments to further advances in preventive medical diagnosis, drug discovery, video games, omni-channel retailers' shelf/product recognition… The sky is the limit :) These frameworks have started running on the edge as well; edges could be iPhones, Android gadgets, Amazon DeepLens, etc. Do you get my drift?

Real-time object detection with YOLO

You Only Look Once (YOLO) — what a cool name, huh? This object detection algorithm is currently the state of the art and outperforms earlier region-based detection pipelines such as R-CNN and its variants. Maybe the founder was inspired by the human eye/brain: YOLO looks at the whole image at test time, so its predictions are informed by the global context of the image. It also makes predictions with a single network evaluation, unlike systems like R-CNN, which require thousands of evaluations for a single image. YOLO V2 and V3 can detect a wide variety of object classes in real time. The latest YOLO V3 is more than 1000x faster than R-CNN and 100x faster than Fast R-CNN (Reference).

You can feed it any major image/video format, or a real-time video feed from a webcam. YOLO is a convolutional network as well, but it behaves in an ingenious way. (Applause to the original YOLO V2 article here.)

YOLO takes an efficient approach: instead of first proposing candidate regions and then running a separate (CNN) classifier over each one, it predicts where objects are and what they are in a single pass over the image. Simply put, YOLO divides the image into a grid of 13 by 13 cells, and each cell predicts 5 "bounding boxes". A bounding box is a rectangle that encloses an object. For each bounding box, the network predicts, in parallel, which image class the box belongs to, along with a confidence (probability) score that tells us how certain it is that the predicted bounding box actually encloses an object. Each input image is taken through this special CNN very quickly and results in a (13 x 13 x 125) output tensor, where each grid cell carries the information for its 5 boxes: the x, y position and width/height of the bounding box rectangle, the confidence score, and the probability distribution over the trained image classes. (The depth of 125 works out to 5 boxes x (4 box coordinates + 1 confidence + 20 class probabilities) for a 20-class model; with the 80 COCO classes the depth would be 425.)

Our provided parameters ignore the low-scoring boxes and pick, for each remaining box, the highest-probability object from our trained image class library. Thus we end up seeing very few bounding boxes, around the dogs, people, flowers, cars, etc. If I may oversimplify, the YOLO process visually looks like the picture below, at flashing speed for each image. The rightmost picture shows the identified image & label combinations with the highest box confidence scores.

The input image is divided into a 7 x 7 grid. Then bounding boxes are predicted, and a class is predicted among the most confident ones. Source: J. Redmon et al. (2016)
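
To make the post-processing above concrete, here is a minimal decoding sketch for a YOLO-style output tensor. It is an illustration under simplifying assumptions (random stand-in data, no activation functions or anchor-box offsets, no non-maximum suppression), not Darkflow's actual code:

import numpy as np

GRID, BOXES, CLASSES = 13, 5, 20          # 5 * (4 + 1 + 20) = 125 channels
output = np.random.rand(GRID, GRID, BOXES * (5 + CLASSES))  # stand-in for the network output
output = output.reshape(GRID, GRID, BOXES, 5 + CLASSES)

THRESHOLD = 0.5                           # our provided parameter: drop low-scoring boxes
detections = []
for row in range(GRID):
    for col in range(GRID):
        for b in range(BOXES):
            # x, y, w, h describe the box rectangle (unused in this sketch)
            x, y, w, h, confidence = output[row, col, b, :5]
            class_probs = output[row, col, b, 5:]
            scores = confidence * class_probs    # box confidence times class probability
            best = int(np.argmax(scores))        # highest-probability class for this box
            if scores[best] > THRESHOLD:
                detections.append((row, col, b, best, float(scores[best])))

print(len(detections), "boxes kept out of", GRID * GRID * BOXES)
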
YOLO V2 is trained on the COCO (Common Objects in Context) data set, which has around 100k images across 80 common classes, complemented with a subset of ImageNet; so the model recognizes 80 image classes. For reference, ImageNet has 14 million images and 22k classes. You can leverage other data sets, or use crowd-sourcing services like Amazon Mechanical Turk for the manual labeling work, and come up with your own data set!

All the code and the environment are open source, so anyone can clone and modify the work. I also made a small Python code modification to be able to use an external webcam for convenient real-time recording while making my test YouTube videos (so as not to carry my laptop around). You can do any editing you like, as long as you know some Python and understand the theory to some degree…

Most of the examples on the web are based on Linux or macOS, but I had to build this demo on my Lenovo Windows 10 machine with an Intel Core i7. Thus all the personal experience & guidance below is geared towards a Windows environment.

The original YOLO authors created DarkNet, an open source library written in C and CUDA. I chose to use Darkflow instead, which is basically a translation of DarkNet to the more convenient TensorFlow framework. Thus we will download and use the Darkflow version for this demo.

OK, let's get started…

Dependencies

You need Python 3.5 or 3.6, TensorFlow, NumPy and OpenCV in your (laptop) environment to get started. Below is some guidance that has worked for me on Windows.

Step 1 — Install the dependencies for Windows

For beginners, you can install the following to get a clean slate of a ready-to-go personal computing environment for your future ML experiments.

  1. Download & install the Anaconda package (64-bit version) and choose the Python 3.6 version (link to video tutorial). This automatically installs Python and many popular data science/ML libraries (NumPy, Scikit-Learn, Pandas, R, Matplotlib…), tools (Jupyter Notebook, RStudio) and hundreds of other open source packages for your future projects. When you start out, it feels like the closest thing to the holy grail of ML packages… For example, I still use the Anaconda Jupyter Notebook for almost all my ML experiments, mostly out of convenience. The OpenCV library is not included though, and we will install it separately, as it is needed for real-time computer vision tasks. (Hint for Anaconda folks!)
  2. Install TensorFlow and Keras (optional). TensorFlow is the most popular AI software library and is created/maintained by Google. Keras is another highly popular, high-level neural networks API, written in Python and capable of running on top of TensorFlow. It was developed with a focus on enabling fast experimentation. When you feel like "quitting" after going through Andrew Ng's rather low-level material, Keras feels like a piece of cake, because it is a high-level API built on Python: someone else has done the hard work for you!

It is common to have issues while trying to install all these open source packages, especially if you are on a Windows machine. It takes a while to make everything work and to resolve all the version conflicts. My best practice is basically to Google such issues and find solutions online. Websites like Stack Overflow are super helpful and save your time & sanity!

In general, I also find it helpful to create a separate new conda virtual environment to mitigate the Windows installation issues. More on that here.

Step 2 — Install the DarkNet/YOLO, Darkflow stuff

DarkNet: Originally, the YOLO algorithm was implemented in the DarkNet framework by Joseph Redmon. Darknet is an open source custom neural network framework written in C and CUDA. It is fast, easy to install, and supports both CPU and GPU computation. You can find the open source code on GitHub.

Darkflow: This is the nickname of an implementation of YOLO on TensorFlow. Thanks to Trinh Hoang Trieu, Darknet models are converted to TensorFlow and can be installed on both Linux and Windows environments. Let's do it!

# Open your Anaconda prompt and clone the darkflow GitHub repository. (You may need to install Git Bash for Windows for the git command to work.)

git clone https://github.com/thtrieu/darkflow

# Alternatively, go to the DarkFlow GitHub page and download the master repository to your local machine (e.g. C:\users\user_name\darkflow)

# If you have not already created a new virtual environment in Step 1, then create a conda environment for darkflow installation.

conda create -n your_env_name python=3.6

# Activate the new environment using the Anaconda prompt.

activate your_env_name

# You can install the needed OpenCV from the conda-forge repository. conda-forge is a GitHub organization that hosts community-maintained conda packages.

conda config --add channels conda-forge

conda install opencv

# Build the Cython extensions in place. Cython is a widely used Python-to-C compiler and wrapper that lets us call the DarkNet C code from Python.

python setup.py build_ext --inplace

or try the following as an alternative:

pip install -e .

If you get an error, try changing the working directory to darkflow (cd darkflow) first and re-running one of the above commands.

Cool. The above steps will hopefully set up a local environment to run darkflow and perform object detection tasks on images or videos.

Lastly, we need to download the CFG and WEIGHTS files. The pre-trained model is YOLOv2, trained on the COCO image data set containing 80 classes (image types like car, dog, person, aeroplane, etc.).

WEIGHTS file: Please download the yolov2.weights file from here. Please create a darkflow/bin directory for keeping this weights file.

CFG file: Create a yolo.cfg text file for the corresponding model in the existing darkflow/cfg directory under your local darkflow folder. Check here for the source file. You can copy-paste the raw GitHub content with Notepad if you want. Also, do not forget to have a look at Darkflow's command-line help options for future reference:
python flow --h
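
At this point your local folder should look roughly like this (a sketch of the expected layout; your clone will contain many more files):

darkflow/
    bin/
        yolov2.weights      (the downloaded WEIGHTS file)
    cfg/
        yolo.cfg            (the copied CFG file)
    flow                    (the command-line entry point used below)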

PS: I found this blog (Abhijeet Kumar) very helpful while I was figuring out the needed installations.

We are all set.

Let's run the Darkflow YOLO command line to render some video!

I fancy using the Anaconda command prompt to execute the following commands. You can find it from the Windows Start menu by searching for "Anaconda Prompt". In the prompt window, activate your new TensorFlow virtual environment via the "activate your_environ_name" command. Then execute the "cd darkflow" command to change the current working directory to your local Darkflow repository. Then you can try the following commands to start running Darkflow on images & videos.

  1. For processing existing images, you can run the following command:
    python flow --model cfg/yolo.cfg --load bin/yolov2.weights --imgdir sample_img

Please note that darkflow/sample_img is a directory with sample photos.

  2. For processing a video file, you can move the to-be-rendered video file under the master darkflow folder and then use the following command:
    python flow --model cfg/yolo.cfg --load bin/yolov2.weights --demo samplename.mp4

Hint: If you append "--saveVideo" at the end, you can save the processed video under the master folder as well.
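
For example, combining that hint with the video command above (only the --saveVideo flag is new):

    python flow --model cfg/yolo.cfg --load bin/yolov2.weights --demo samplename.mp4 --saveVideo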

  3. For rendering a real-time streaming video via your laptop camera:
    python flow --model cfg/yolo.cfg --load bin/yolov2.weights --demo camera

More examples are available here or here.

My basic examples

Because Medium does not allow direct streaming, I uploaded two example videos to YouTube, recorded with an external webcam.

Video 1 with Google search

Video 2 in our living room :)

Use a personal device vs. a cloud ML service?

Both. If you are a beginner, I suggest you first set up a personal laptop with the needed Anaconda, Python, Keras/TensorFlow and all the other relevant popular packages. This is key to learning, experimenting with MVPs, and failing/pivoting fast in your ML journey. You can also easily set up an AWS SageMaker account/environment that does not require any of these individual TensorFlow/Keras installations. I will not go into details, but check this link for more info. You can easily try MS Azure or Google Cloud as well. As you get more advanced, the cloud will suit you better, with bigger data storage (i.e. AWS S3 buckets), computing power (i.e. more CPUs or GPUs) and fast deployment/scaling. Thus they are not competing but complementary.

Conclusion

There is a lot of buzz around AI and ML. I tried to showcase that anyone can get started and create real products with the available open source frameworks and libraries, at your own risk.

For example, you can further create and train your own classes (images or image types) and customize the training for your unique needs. Both YOLO and Darkflow are open source, and you can clone & modify them. YOLO v3 is also available, and this visual recognition domain will keep exploding. I am looking forward to playing with it.

I am by no means an expert on ML or CNNs, but I hope this blog can serve as an inspiration to AI/ML enthusiasts for your future DIY projects. Comments and feedback are always welcome. Do not stay behind!

Credits

All credit goes to the people below. All I did was use their excellent work.

  1. The original paper on YOLO, and the follow-up paper on YOLOv2 and YOLO9000. If you don't fancy reading academic papers, you can check the presentation summary by the founding fathers at this link.
  2. Joseph Redmon then created the DarkNet libraries below, which inspired many to implement YOLO. He gave two TED talks about it: the first mostly about the technology, and the second more about the moral implications.

3. Mr. Trieu then translated Darknet-YOLO v2 to the TensorFlow framework.

Further reading and watching…

There is more info on YOLO here, here or here for further insights and inspiration. Mark Jay also has a great 8-part YouTube series showing how to customize DarkFlow. You can check the original founder's YOLO pages in the credits section. (Shortcuts: DarkNet YOLO, Darkflow.)

Updates

  1. Error Handling: If you receive an error like "… AssertionError: labels.txt and cfg/yolov2.cfg indicate inconsistent class numbers …", then check this solution out. You can only use the reserved CFG file names (the weights file name should be OK as long as your command prompt shows the downloaded weights file name under the new "bin" folder). To get started, you can download the yolov2 CFG from here, but rename it yolo.cfg, as that is the name the code recognizes, as per the above. You also have the option to modify the open source code.
  2. If you would like to use an external webcam, here are some simple tips for adjusting the open source code. God bless open source software!

How to use an external webcam for real-time streaming:
You have to make a small modification to the open source Darkflow code that you cloned & downloaded. First, you will need a webcam connected to the computer that OpenCV can connect to, or it won't work. If you have multiple webcams connected and want to select which one to use, you can pass "file = 1" to pick it (OpenCV uses the built-in webcam as 0 by default). Open the following file and make the changes with Jupyter or any other Python editor: http://localhost:8888/edit/darkflow/darkflow/net/help.py
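
As a minimal standalone sketch of the idea (this is not Darkflow's actual help.py code; it only shows how OpenCV's camera index selects the device):

import cv2

CAMERA_INDEX = 1   # OpenCV uses the built-in webcam as 0 by default; 1 picks the external one

cap = cv2.VideoCapture(CAMERA_INDEX)
if not cap.isOpened():
    raise RuntimeError("Could not open camera %d; is it connected?" % CAMERA_INDEX)

while True:
    ok, frame = cap.read()                  # grab a frame from the selected webcam
    if not ok:
        break
    cv2.imshow('webcam', frame)             # display it (detection would happen here)
    if cv2.waitKey(1) & 0xFF == ord('q'):   # press q to quit
        break

cap.release()
cv2.destroyAllWindows()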

Alternatively, you can copy my forked help.py code at the link below.

Then you can use the Anaconda prompt as before to stream with the external webcam and save the video under the darkflow root folder.
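
For example, combining the camera demo with the --saveVideo flag shown earlier:

python flow --model cfg/yolo.cfg --load bin/yolov2.weights --demo camera --saveVideo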

THE END

Happy “deep learning” in 2019!!
