An exploration of object detection with deep learning
Noah Jaffe
Last week, I completed my last day of the Metis data science bootcamp. Finally, after 12 grueling weeks of lectures, pair programming, quizzes, and projects, we could relax. We spent the festive day chit-chatting on Zoom, playing Among Us, and doing Wikipedia races. Despite the high of making it through the bootcamp, the realization that this would be the last day spent with my cohort made it bittersweet. I was surprised by how close I became to my Metis colleagues, and I feel as though I’ve made some true friends despite the live online format. After I logged off of Zoom, I took some personal reflection time in the form of a brisk run in the cold winter rain. In addition to the social ramifications of completing the bootcamp, I thought about the academic strides I had made, culminating in my final project. In this blog post, I’ll walk you through my process and thoughts on that final data science project.
The prompt for Project 5 was entirely open-ended. We were free to conduct whatever sort of project we wished with any of the tools we had learned. As I pondered my final project, I thought deeply about my passions and what I had been working on before Metis. In January, I completed a Master’s in marine biology, which culminated in a thesis investigating sea star wasting disease in a species of intertidal sea star. I spent many hours romping around rocky intertidal habitats up and down the West Coast, staring at the small invertebrates in the tidepools and collecting sea star DNA. Exploring these tidepools became one of my favorite pastimes, and I was enthusiastic about incorporating my love of sea creatures into my passion project, which I wanted to live up to its name!
Our last curricular material at Metis focused on neural networks, powerful models trained using large amounts of data. Neural networks tend to be a bit of a black box to those not intimately familiar with their architecture and functionality. To summarize, these models take in a huge amount of input data and create latent features to predict the target: unseen features of the data that the model ultimately uses to discriminate among specified output classes. The output of training is a file containing the "weights" of the model: the learned connections from one "layer" of discriminatory features to the next. If this seems abstract, it is, but the important takeaway is that a neural network consists of layers that each look through the dataset to iteratively narrow down the features within and ultimately separate output classes from one another. Naturally, many of us in the bootcamp wanted to utilize this exciting new tool, and as I was scanning data science blogs looking for neural-network related inspiration, I came across one of the most exciting and cutting-edge applications of this technology: object detection. Suddenly, a lightbulb went off in my head: what if I could use object-detecting neural networks to classify the intertidal invertebrates I spent so many hours staring at during grad school? And so that is what I have done! I stayed at a high level for this project, training my model to recognize large groups of organisms rather than individual species, given the time and complexity constraints. Nevertheless, I built a functioning image classifier for these groups, which I view as a proof of concept for the application of neural networks to the task of classifying small intertidal invertebrates. Let’s jump into my workflow!
Object detection is a cutting-edge application of neural networks (NNs). As such, there is a growing number of models and tools geared towards this task, and knowing which approach to use could be an article in and of itself. Rather than reinventing the wheel and training a NN from scratch, an incredibly resource-intensive process, I looked to the many articles and tutorials online to find a pre-trained model that I could "teach" to recognize my sea creatures. At a high level, my goal was to gather training images to feed into the model so that I could build on the already-trained convolutional and pooling layers and simply retrain the "top" of the model to recognize my desired classes. I ended up choosing the YOLO v3 convolutional neural network, which was shown to me by one of my Metis colleagues and which seemed perfectly suited to my task. I closely followed this tutorial here, as it included a great step-by-step walkthrough of the process and an introduction to utilizing the GPU functionality of Google Colab, which greatly reduced the time required to train this large model. Rather than go through my entire code, I will summarize the process and tools I used and refer you to the tutorial for the specifics.
The first step was to gather my training data, the images I would show my model. I needed a few hundred images for each of my classes, which were: anemones, barnacles, bivalves, crabs, nudibranchs, and sea stars. I chose these classes as they are frequent denizens of the tidepools of the West Coast of North America, where I live. To get my training images, I took advantage of a handy Google Chrome plugin called Download All Images which, as the name suggests, allows the user to download all images from a particular page – in my case, a Google search for each animal group. After I had my images downloaded to my computer, I needed to label them. These labels are fed into the model along with the images so that the model knows what it is looking at during training. To label my images, I used LabelImg, a tool available on Github. I now had my image files and their corresponding labels on my computer, so the next stage was to hop onto Google Colab and connect to a GPU so I could start training.
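If you want to poke at the labels programmatically, here is a minimal sketch for sanity-checking the YOLO-format .txt files that LabelImg produces, where each line is a class index followed by a bounding box normalized to the image size. The class names and folder path below are placeholders, not my exact setup:

```python
# Minimal sketch: validate YOLO-format label files produced by LabelImg.
# The class list and folder path are placeholders, not my exact setup.
from pathlib import Path

CLASSES = ["anemone", "barnacle", "bivalve", "crab", "nudibranch", "sea star"]
LABEL_DIR = Path("data/labels")  # one .txt per image, sharing the image's basename

for label_file in sorted(LABEL_DIR.glob("*.txt")):
    for line in label_file.read_text().splitlines():
        class_id, x_center, y_center, width, height = line.split()
        class_id = int(class_id)
        box = [float(v) for v in (x_center, y_center, width, height)]
        # YOLO labels store the box center and size as fractions of the image dimensions
        assert 0 <= class_id < len(CLASSES), f"bad class id in {label_file}"
        assert all(0.0 <= v <= 1.0 for v in box), f"bad coordinates in {label_file}"

print("All label files look valid.")
```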
To train my model, I uploaded my files to Google Drive and mounted my drive to my Colab notebook so I could reference it. I utilized the Darknet NN framework (see this repo), which is written in C and can be driven from within a notebook. After constructing the necessary config, object, and training text files, and moving them to all the right places (the tutorial lays this out very clearly), I was ready to begin training. As is common in convolutional neural network training, I chose a batch size of 64 and a maximum number of batches (iterations) of 12,000, for a total of approximately 700 epochs (here is a reference if you’re confused as to the difference). Then it was just a matter of training! Even with my trusty GPU in hand (or, I guess, in the cloud), this process took a while – approximately three days. One challenge to be aware of when utilizing Colab for a long period of time is that Google will sometimes kick you off and your processes will be interrupted. Fortunately, the tutorial has a couple of workarounds for this, most importantly code to save your model output every 100 iterations.
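To give a flavor of what this looks like in practice, here is a rough sketch of the kind of Colab cells involved. The repository URL, file names, and flags below follow common YOLO v3 custom-training walkthroughs and are illustrative rather than my exact notebook:

```python
# Illustrative Colab cells -- repo, file names, and flags follow common
# YOLO v3 custom-training walkthroughs, not necessarily my exact notebook.
from google.colab import drive
drive.mount('/content/drive')  # exposes the Drive folder holding images, labels, and configs

# Clone and build the Darknet framework with GPU support enabled
!git clone https://github.com/AlexeyAB/darknet
%cd darknet
!sed -i 's/GPU=0/GPU=1/' Makefile
!make

# Key edits in the custom .cfg file for 6 classes:
#   batch=64 and max_batches=12000
#   classes=6 in each [yolo] layer
#   filters=33 (= (classes + 5) * 3) in the [convolutional] layer just before each [yolo] layer

# Train from the pre-trained convolutional weights; Darknet checkpoints the model
# to the backup folder every 100 iterations, which is what saves you when Colab disconnects.
!./darknet detector train data/obj.data cfg/yolov3_custom.cfg darknet53.conv.74 -dont_show
```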
After my model was trained, I needed to test it! As with many tasks in Python, there are multiple ways to go about this, and the tutorial has a testing section. However, my final deliverable for this project was going to be a Streamlit app, and the code in the tutorial, which used bash commands, was harder for me to work with in my app. So instead, I tested my model using another useful library called OpenCV, roughly following this tutorial (Google "OpenCV image detection" for more useful tutorials!). As described above, the output of my neural network training was a file of the model weights: the connections from one discriminatory layer to the next. (These weights files can be massive – mine was approximately 250 MB.) After defining some functions to test my model (see the "Testing_notebook.ipynb" notebook in the repo), I was ready to see if my model could classify images of my classes!
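For a sense of what that looks like, here is a rough sketch of YOLO inference with OpenCV's dnn module. The file paths, thresholds, and 416x416 input size are illustrative rather than my exact testing code:

```python
# Rough sketch of YOLO v3 inference with OpenCV's dnn module.
# Paths, thresholds, and input size are illustrative, not my exact testing code.
import cv2
import numpy as np

CLASSES = ["anemone", "barnacle", "bivalve", "crab", "nudibranch", "sea star"]

net = cv2.dnn.readNetFromDarknet("yolov3_custom.cfg", "yolov3_custom_final.weights")
image = cv2.imread("test_image.jpg")
h, w = image.shape[:2]

# YOLO expects a square, normalized blob with RGB channel order
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

boxes, confidences, class_ids = [], [], []
for output in outputs:
    for detection in output:
        scores = detection[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:
            # box coordinates come back as fractions of the image; scale to pixels
            cx, cy, bw, bh = detection[:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)
            class_ids.append(class_id)

# Non-maximum suppression drops overlapping duplicate boxes
keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in np.array(keep).flatten():
    x, y, bw, bh = boxes[i]
    label = f"{CLASSES[class_ids[i]]}: {confidences[i]:.2f}"
    cv2.rectangle(image, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
    cv2.putText(image, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

cv2.imwrite("detected.jpg", image)
```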
Success! I had made a working classifier. I am including only a single image here for copyright reasons (this is a picture I took during grad school), but check out some other examples of classifications in the "Project_5_final_pres.pdf" file in the repo.
Of course, my classifier did not work perfectly, and some classes were more difficult than others; crabs were especially difficult because they can appear in many different poses, as opposed to my other classes, which are much more sessile. All in all, though, this classifier represents a successful first step in constructing a tool to identify intertidal invertebrates, which was my goal. More training would no doubt improve the model, and I would also consider tinkering with the hyperparameters of my NN, a process I avoided for this project.
I hope this blog post was informative and entertaining. I recommend you try out my app and see for yourself how the model works! Simply drag and drop an image of an intertidal invertebrate belonging to one of the classes listed above and click "Identify"! Or, check out the Github repo here! And check out the other projects on my Github as well!
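If you're curious what the app boils down to, the heart of a Streamlit front-end like this is just a file uploader and a button. Here is a simplified sketch rather than the app's exact code; detect_creatures is a stand-in for the OpenCV detection function sketched earlier:

```python
# Simplified sketch of the Streamlit front-end, not the app's exact code.
import cv2
import numpy as np
import streamlit as st

def detect_creatures(image):
    # Stand-in: plug in the OpenCV YOLO detection logic sketched earlier,
    # which returns the image with labeled bounding boxes drawn on it.
    return image

st.title("Intertidal Invertebrate Identifier")
uploaded = st.file_uploader("Drop in an image of an intertidal invertebrate",
                            type=["jpg", "jpeg", "png"])

if uploaded is not None and st.button("Identify"):
    # Decode the uploaded bytes into an OpenCV (BGR) image
    image = cv2.imdecode(np.frombuffer(uploaded.read(), np.uint8), cv2.IMREAD_COLOR)
    annotated = detect_creatures(image)
    # OpenCV images are BGR; Streamlit expects RGB
    st.image(cv2.cvtColor(annotated, cv2.COLOR_BGR2RGB), use_column_width=True)
```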
Thanks so much for reading.