Phytoplankton Species Image Classification using Neural Networks

Identifying Phytoplankton Species in Microscopic Ocean Images to Learn about Underwater Ecosystems

Mihir Garimella
Towards Data Science


Photo by CDC on Unsplash

The Earth's oceans are among the most interesting and biologically active parts of the environment. Unfortunately, conditions such as extreme pressure and temperature in many of these regions make it difficult to gather information about their ecosystems. This has made it challenging to collect data and draw conclusions about the biology and aquatic life of the oceans.

Recent research, however, has shown that the presence of microorganisms in the water, specifically the single-celled organisms known as phytoplankton, can help us learn about marine activity and the ecosystems growing in parts of the ocean. Because phytoplankton sit at the base of the aquatic food chain, different species can have profound impacts on underwater systems and may allow us to investigate areas of the ocean that humans cannot reach.

Image of Chaetoceros debilis, a diatom, from the dataset

The data used in this project is a collection of images of over 100 different phytoplankton species, collected by the Woods Hole Oceanographic Institution. The data is publicly available alongside real-time images collected by the Martha’s Vineyard Coastal Observatory. In total, a reported 3.5 million raw images are available for analysis, with the most recent labeled data from 2014. This model uses the “2014 labeled IFCB images” dataset (IFCB stands for Imaging FlowCytobot) for classification. The data is available for commercial use under the MIT License, which can be found in more detail here.

To start, we use OpenCV together with Python's os module to walk through the image folders and read the pixel data of every image. This lets us encode each image in a format that a CNN model can work with. OpenCV is used to resize or crop each image to 28x28 pixels and to load it in grayscale, so that every pixel is represented by a single numerical value for its light intensity.

After exporting NumPy arrays of the species labels and the pixel values, we use Keras to build a sequential CNN model that learns to classify each image from its grayscale values. A convolutional layer does not cut the 28x28 image into fixed tiles. Instead, it slides small learned filters (for example, 3x3 pixels) across the whole image. At each position, it computes a weighted sum of the pixels under the filter plus a bias. The resulting feature maps highlight local patterns such as edges, and pooling layers then downsample them. Repeating this convolution and pooling lets the model combine small patterns into larger structures. A final dense layer then turns the result into a score for each species, so that the original image can be related to a single phytoplankton species.
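The convolution and pooling steps described above can be sketched in plain NumPy. This illustrates the technique and is not the author's code. The 3x3 kernel is a hand-written, hypothetical vertical-edge detector, whereas a trained CNN learns many such kernels from the data. The function names conv2d_valid, relu and max_pool_2x2 are also illustrative.

```python
import numpy as np


def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` one pixel at a time (stride 1, no padding).

    Like a Keras Conv2D layer, this computes a cross-correlation: each output
    value is the weighted sum of the window under the kernel, plus a bias.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]  # neighbouring windows overlap
            out[i, j] = np.sum(window * kernel) + bias
    return out


def relu(x):
    """Zero out negative responses, as activation="relu" does."""
    return np.maximum(x, 0.0)


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block (halves height and width)."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2  # drop an odd trailing row/column, like 'valid' pooling
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


# A 28x28 grayscale image with a bright vertical stripe in columns 10-13
image = np.zeros((28, 28))
image[:, 10:14] = 1.0

# Hypothetical kernel that responds where the right side is brighter than the left
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = relu(conv2d_valid(image, kernel))  # 26x26 map of where the pattern appears
pooled = max_pool_2x2(feature_map)               # 13x13 downsampled map
print(feature_map.shape, pooled.shape)
```

In Keras, the equivalent building blocks are layers such as Conv2D(32, (3, 3), activation="relu") and MaxPooling2D((2, 2)) stacked inside a Sequential model. Such a model typically ends with Flatten() and Dense(num_classes, activation="softmax"), where num_classes is the number of species. These layer sizes are illustrative.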

Through this process, the model estimates how likely a given image is to belong to each phytoplankton species. Such a model can automate the classification of phytoplankton and help us learn about parts of the ocean that are hard to reach. Phytoplankton form the base of the marine food chain. Knowing which species are present, along with their characteristics and biological traits, can therefore inform findings on how other aquatic creatures behave and have evolved in the ocean.
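The sketch below shows how final-layer scores can become a species prediction. It applies a softmax to one image's raw scores and then ranks the species. The scores and every class name other than Chaetoceros_debilis are hypothetical placeholders. If the model's last Dense layer already uses a softmax activation, model.predict returns these probabilities directly, and only the ranking step is needed. A high probability reflects the model's confidence, not a guarantee that the label is correct.

```python
import numpy as np


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1 (numerically stable)."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def top_k_species(probabilities, class_names, k=3):
    """Return the k most likely (species, probability) pairs for one image."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(class_names[i], float(probabilities[i])) for i in order]


# Hypothetical final-layer scores for one image over four placeholder classes
class_names = ["Chaetoceros_debilis", "species_B", "species_C", "species_D"]
logits = np.array([2.0, 0.5, 0.1, -1.0])

probs = softmax(logits)
print(top_k_species(probs, class_names, k=2))
```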

Exploring the vast reaches of Earth helps us learn about natural phenomena and advance society through scientific research. The oceans and other bodies of water are a key area of study, and learning about their events and ecosystems can lead to discoveries that open up room for innovation and improvement in everyday life. The model developed in this project supports this exploration by classifying the phytoplankton at the base of ocean food chains. This could make it possible to learn more about parts of the ocean that are otherwise hard to study.

Photo by Yannis Papanastasopoulos on Unsplash

This program classifies phytoplankton species using a neural network. The code for this project can be found on my GitHub profile, linked below:

mg343/Phytoplankton-Detection (github.com)
