Oil Storage Tank’s Volume Occupancy On Satellite Imagery Using YoloV3

Recognition of oil storage tanks in satellite images using the YOLOv3 object detection model, built from scratch with TensorFlow 2.x, and estimation of the volume occupied by floating head tanks from the shadows they cast.

Md. Mubasir
Towards Data Science


Before 1957, our planet Earth had only one natural satellite: the Moon. On October 4, 1957, the Soviet Union launched the world's first artificial satellite. Since then, about 8,900 satellites from more than 40 countries have been launched.

Photo by NASA on Unsplash

These satellites help us with monitoring, communication, navigation, and much more. Nations also use satellites to keep an eye on other nations' land and movements, and to estimate their economy and power. However, all nations hide such information from one another.

Likewise, the global oil market is not entirely transparent. Almost all oil-producing nations make an effort to hide their total production, consumption, and storage. Nations do this to indirectly conceal their actual economy from the outside world and to strengthen their defense systems. This practice can become a threat to other nations.

For this reason, many startup companies like Planet and Orbital Insight have emerged to keep an eye on such national activities through satellite imagery. They collect satellite images of oil storage tanks and estimate reserve volumes.

But the question is, how can one estimate the volume of a tank from just a satellite image? Well, this is only possible when oil is stored in a floating roof/head tank. This particular type of tank is specially designed to store large quantities of petroleum products such as crude oil or condensate. It has a roof that sits directly on top of the oil, rising and falling with the volume of oil in the tank, and it casts two shadows. As you can see in the image below, the shadow on the north side

source

(the exterior shadow) of the tank reflects the total height of the tank, while the shadow within the tank (the interior shadow) shows the depth of the floating head/roof (i.e., how empty the tank is). The occupied volume is then estimated as 1 - (area of interior shadow / area of exterior shadow).
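As a quick illustration, here is a minimal sketch of this formula in Python; the function name and the pixel counts are hypothetical, and the shadow areas are assumed to already be measured in pixels:

```python
def estimate_occupancy(interior_shadow_area, exterior_shadow_area):
    """Estimate tank occupancy (0 = empty, 1 = full) from shadow areas in pixels."""
    # A larger interior shadow means the floating head has sunk further,
    # so the interior/exterior area ratio measures how empty the tank is.
    return 1.0 - (interior_shadow_area / exterior_shadow_area)

# Hypothetical example: interior shadow of 120 px, exterior shadow of 480 px
print(estimate_occupancy(120, 480))  # 0.75 -> the tank is about 75% full
```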

In this blog, we are going to implement a complete model that estimates the occupied volume of tanks from satellite images, in Python, using the TensorFlow 2.x framework, from scratch.

GitHub Repo

Everything in this article, along with the entire code, is available in this GitHub repository.

Here is the list of contents covered in this blog. We will explore each one by one.

Table of Contents

  1. Problem Statement, Dataset, and Evaluation metric
  2. Existing Approaches
  3. Related Research Works
  4. Useful Blogs and Research Papers
  5. Our Contribution
  6. Exploratory Data Analysis (EDA)
  7. First Cut Approach
  8. Data Preprocessing, Augmentation and TFRecords
  9. Object Detection using YoloV3
  10. Reserved Volume Estimation
  11. Result
  12. Conclusion
  13. Future Work
  14. References

1. Problem Statement, Dataset, and Evaluation Metric

Problem Statement:

Detect floating head tanks and estimate the reserved/occupied volume of oil present in them, then reassemble the image patches into the full-size image with the volume estimations added.

Dataset:

dataset link: https://www.kaggle.com/towardsentropy/oil-storage-tanks?

The dataset contains bounding-box-annotated satellite images, taken from Google Earth, of tank-containing industrial areas around the world. There are 2 folders and 3 files in the dataset. Let's look at each of them one by one.

  • large_images: This is a folder/directory that contains 100 raw satellite images of size 4800x4800 each. All the images are named in id_large.jpg format.
  • image_patches: The image_patches directory contains 512x512 patches generated from the large images. Each large image is split into 100 patches of size 512x512, with an overlap of 37 pixels between patches on both axes. Image patches are named following an id_row_column.jpg format.
  • labels.json: It contains labels for all images. Labels are stored as a list of dictionaries, one for each image. Images that do not contain any floating head tanks are given a label of ‘skip’. Bounding box labels are in the format of (x, y) coordinate pairs of the four corners of the bounding box.
  • labels_coco.json: It contains the same labels as the previous file, converted into COCO label format. Here bounding boxes are formatted as [x_min, y_min, width, height].
  • large_image_data.csv: It contains metadata about the large image files, including coordinates of the center of each image and the altitude.

Evaluation Metric:

For tank detection, we will use the Average Precision (AP) for each type of tank and the mAP (Mean Average Precision) over all types of tanks. There is no standard metric for the estimated volume of a floating head tank.

mAP is the standard evaluation metric for object detection models. A detailed explanation of mAP can be found in the following YouTube playlist.

source

2. Existing Approaches

Karl Heyer [1] used RetinaNet for the tank detection task in his repository. He built the model from scratch and applied anchor boxes generated from this dataset, which led to a score of 76.3% Average Precision (AP) for the floating head tank. He then applied shadow enhancement and pixel thresholding to calculate the volume.

To the best of my knowledge, this is the only approach available on the internet.

3. Related Research Works

Estimating the Volume of Oil Tanks Based on High-Resolution Remote Sensing Images [2]:

This paper proposes a solution for estimating the capacity/volume of an oil tank based on satellite imagery. To calculate the total volume of a tank, the authors need its height and radius. To calculate the height, they use its geometrical relationship with the length of the shadow the tank projects. However, calculating the shadow length is not easy. To highlight the shadow, they use the HSV (Hue, Saturation, Value) color space, because a shadow usually has higher saturation and hue in HSV. Then a median method based on sub-pixel subdivision positioning is used to calculate the shadow length. Finally, they obtain the radius of the oil tank with the Hough transform algorithm.

The related work section of this paper also mentions solutions for calculating the height of buildings from satellite images.

4. Useful Blogs and Research Papers

A Beginner’s Guide To Calculating Oil Storage Tank Occupancy With Help Of Satellite Imagery [3]:

This blog is written by TankerTrackers.com, one of whose services is tracking crude oil storage at several geographical and geopolitical points of interest using satellite imagery. In this blog, they describe in detail how the exterior and interior shadows cast by the tanks help in estimating the volume of oil present. They also compare images taken by the satellite at a particular time and a month later, showing the changes in oil storage tanks over that month. This blog gave us an intuitive understanding of how the volume estimation is done.

A Gentle Introduction to Object Recognition With Deep Learning [4] :

This article covers the concepts that most often confuse beginners in object detection. It first describes the differences between object classification, object localization, object recognition, and object detection, and then discusses some of the major state-of-the-art deep learning algorithms for object recognition tasks.

Object classification refers to assigning a label to an image that contains a single object, whereas object localization means drawing a bounding box around one or more objects in an image. The object detection task combines both: it is a more challenging task that first draws a bounding box around each object of interest (OI) via localization, and then assigns each OI a label via classification. Object recognition is an umbrella term for all of the above tasks (i.e., classification, localization, and detection).

source

Lastly, two major families of object detection algorithms/models are discussed: Region-Based Convolutional Neural Networks (R-CNN) and You Only Look Once (YOLO).

Selective Search for Object Recognition [5]:

In the object detection task, the most crucial part is object localization, because object classification comes after it: classification depends on the regions of interest proposed by localization (in short, region proposals). Better localization leads to better object detection. Selective Search is one of the state-of-the-art algorithms used for object localization in object recognition models such as R-CNN and Fast R-CNN.

This algorithm first generates sub-segments of an input image using Efficient Graph-Based Image Segmentation, then combines smaller similar regions into larger ones using a greedy algorithm. Segment similarity is based on four properties: color, texture, size, and fill.

source

Region Proposal Network — A detailed view [6]:

The RPN (Region Proposal Network) is widely used for object localization because it is faster than traditional algorithms such as Selective Search. It learns the best locations of objects of interest from the feature map, just as a CNN learns classification from the feature map. It is responsible for three major tasks: first, generating anchor boxes (9 differently shaped anchor boxes per feature-map point); second, classifying each anchor box as foreground or background (i.e., whether it contains an object or not); and lastly, learning shape offsets for the anchor boxes to fit them to the objects.
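For intuition, here is a minimal sketch of how the 9 anchor shapes (3 scales x 3 aspect ratios) could be generated for a single feature-map point; the specific scales and ratios below are illustrative assumptions, not values from any particular paper:

```python
import numpy as np

def generate_anchors(center_x, center_y, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return 9 anchors as (x_min, y_min, x_max, y_max), centered on one point."""
    anchors = []
    for scale in scales:
        for ratio in ratios:
            # Keep the anchor area near scale^2 while varying the aspect ratio.
            w = scale * np.sqrt(ratio)
            h = scale / np.sqrt(ratio)
            anchors.append((center_x - w / 2, center_y - h / 2,
                            center_x + w / 2, center_y + h / 2))
    return np.array(anchors)

print(generate_anchors(256, 256).shape)  # (9, 4)
```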

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [7]:

The Faster R-CNN model addresses the issues of its two predecessors (R-CNN and Fast R-CNN) by using an RPN as the region proposal generator. Its architecture is exactly the same as Fast R-CNN, except that it uses an RPN instead of Selective Search, which makes it 34 times faster than Fast R-CNN.

source

Real-time Object Detection with YOLO, YOLOv2, and now YOLOv3 [8]:

Before introducing the YOLO (You Only Look Once) series of models, let's watch the talk given by its lead researcher, Joseph Redmon, at TED.

There are many reasons why this model sits at the top of the list of object detection models, but the main one is its speed. Its inference time is low enough to match the normal frame rate of video (i.e., 25 FPS), so it can be applied to real-time data. Here is the accuracy and speed comparison of YOLOv3 on the COCO dataset, as provided by the YOLO website.

source

Unlike other object detection models, Yolo models have the following features.

  • Single neural network model (i.e., both the classification and localization tasks are performed by the same model): it takes a photograph as input and directly predicts bounding boxes and a class label for each box, which means it looks at the image only once.
  • Since it performs convolutions on the whole image rather than on sections of it, it makes far fewer background mistakes.
  • YOLO learns generalizable representations of objects. When trained on natural images and tested on artwork, YOLO outperforms top detection methods like DPM and R-CNN by a wide margin. Because it is highly generalizable, it is less likely to break down when applied to new domains or unexpected inputs.

What makes YOLOv3 better than YOLOv2?

  • If you look closely at the title of the YOLOv2 paper, it is “YOLO9000: Better, Faster, Stronger”. Is YOLOv3 much better than YOLOv2? The answer is yes: it is better, but not faster or stronger, because of the increased complexity of the Darknet architecture.
  • YOLOv2 used a 19-layer DarkNet architecture without any residual blocks, skip connections, or upsampling, due to which it struggled to detect small objects. In YOLOv3 these features are added, and a 53-layer DarkNet network trained on ImageNet is used. On top of that, 53 more convolutional layers are stacked, resulting in a 106-layer fully convolutional architecture.
  • YOLOv3 makes predictions at three different scales: first on a 13x13 grid for large objects, second on a 26x26 grid for medium objects, and lastly on a 52x52 grid for small objects.
  • YOLOv3 uses 9 anchor boxes in total, three for each scale. The best anchor boxes are selected using K-means clustering (a minimal sketch follows this list).
  • YOLOv3 now performs multilabel classification for objects detected in images. Object confidence and class predictions are made through logistic regression.
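Here is a minimal sketch of anchor selection with K-means, assuming a (N, 2) array of ground-truth box widths and heights. For simplicity it uses scikit-learn's Euclidean K-means, whereas the YOLO papers cluster with a 1 - IoU distance:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_anchors(box_dims, n_anchors=9):
    """Cluster (width, height) pairs of training boxes into n_anchors anchor shapes."""
    kmeans = KMeans(n_clusters=n_anchors, random_state=0).fit(box_dims)
    anchors = kmeans.cluster_centers_
    # Sort anchors by area so they can be split across the three scales.
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]

# Hypothetical example: 500 random box sizes between 10 and 300 pixels
rng = np.random.default_rng(0)
print(select_anchors(rng.uniform(10, 300, size=(500, 2))))
```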

5. Our Contribution

Our problem statement comprises two tasks: the first is detecting floating head tanks, and the second is extracting shadows and estimating the volume of the identified tanks. The first task is based on object detection, and the second on computer vision techniques. Let's describe the approach taken for each task.

Detection of tanks:

Our goal is to estimate the occupied volume of floating head tanks. We could build a single-class object detection model but, to keep the model from confusing floating head tanks with the other kinds of tank (i.e., Tank/Fixed Head Tank and Tank Cluster) and to make it robust, we built a three-class object detection model. YOLOv3 with transfer learning is used for object detection, as it is easy to train on less powerful machines. To increase the metric score, data augmentation is also applied.

Shadow Extraction and Volume Estimation:

Shadow extraction involves several computer vision techniques. Since the RGB color space is not sensitive to shadows, we first convert the image into the HSV and LAB color spaces. We use the -(l1 + l3)/(V + 1) ratio image (where l1 is the first channel of the LAB color space) to enhance the shadow regions. The enhanced image is then filtered by thresholding at 0.5*t1 + 0.4*t2 (where t1 is the minimum pixel value and t2 is the mean). The thresholded image is then processed with morphological operations (i.e., noise removal, contour cleaning, etc.). Finally, the two tank shadow contours are extracted, and the occupied volume is estimated by the formula stated above. These ideas are taken from the following notebook.
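Below is a minimal sketch of this shadow-enhancement step with OpenCV, under the assumptions stated above; reading l3 as the third LAB channel and V as the HSV value channel is my interpretation of the notebook, as is the direction of the final comparison:

```python
import cv2
import numpy as np

def shadow_mask(bgr_image):
    """Return a binary shadow mask following the ratio-image idea above."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB).astype(np.float32)
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV).astype(np.float32)
    l1, l3 = lab[..., 0], lab[..., 2]  # first and third LAB channels (assumption)
    v = hsv[..., 2]                    # value channel of HSV
    ratio = -(l1 + l3) / (v + 1.0)     # shadow-enhanced ratio image
    t1, t2 = ratio.min(), ratio.mean()
    threshold = 0.5 * t1 + 0.4 * t2    # thresholding rule from the text
    # Comparison direction is an assumption; flip it if shadows come out inverted.
    return (ratio < threshold).astype(np.uint8) * 255
```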

The entire pipeline followed to solve this case study is shown below.

Let’s get started with Exploratory Data Analysis of our dataset!!

6. Exploratory Data Analysis (EDA)

Exploring Labels.json File:

Code by author
Image by author

All the labels are stored in a list of dictionaries. There are 10K images in total. Images that do not contain any tank are labeled Skip, and those that contain tanks are labeled Tank, Tank Cluster, or Floating Head Tank. Each tank object comes with the bounding box coordinates of its four corner points in dictionary format.
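Here is a minimal sketch of loading the file and counting labeled versus skipped images; the key name ('label') and its values are assumptions based on the structure described above, so adjust them to the actual file:

```python
import json
from collections import Counter

with open("labels.json") as f:
    labels = json.load(f)  # one dictionary per image

print(f"Total images: {len(labels)}")

counts = Counter()
for item in labels:
    annotation = item.get("label")  # key name is an assumption
    counts["skip" if annotation in (None, "skip") else "labeled"] += 1
print(counts)
```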

Counting Objects:

Image by author

Among the 10K images, 8187 images have no label (i.e., they do not contain any tank objects). Besides this, there are 81 images that contain at least one Tank Cluster object, and 1595 images that contain at least one Floating Head Tank.

In the bar graph, it can be observed that 26.45% of the 1595 images containing floating head tanks contain only one floating head tank object. The highest number of floating head tank objects in a single image is 34.

Exploring labels_coco.json File:

Code by author
Image by author

This file contains the bounding boxes of floating head tanks only, along with their image_id, in a list-of-dictionaries format.

Plotting Bounding Boxes:

Image by author

There are three kinds of tanks:

  1. Tank (T)
  2. Tank Cluster (TC)
  3. Floating Head Tank (FHT)

7. First Cut Approach

In the EDA, it was observed that 8171 out of 10000 images are of no use, as they do not contain any objects, while only 1595 images contain at least one floating head tank object. As we know, deep learning models are hungry for data, and not feeding them enough results in poor performance.

Thus, our first cut approach would be data augmentation followed by fitting of the obtained augmented data into a Yolov3 object detection model.

8. Data Preprocessing, Augmentation and TFRecords

Data Preprocessing:

The object annotations are given in JSON format as 4 corner points. First, the top-left and bottom-right points are extracted from these corner points. Then all the annotations and their corresponding labels belonging to a single image are kept as a list of lists in a single row of a CSV file.

Code to extract the top-left and bottom-right points from the corner points:

Code by author
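In case it helps, here is a minimal equivalent sketch, assuming each bounding box is stored as four (x, y) corner pairs:

```python
def corners_to_box(corners):
    """Convert four (x, y) corner pairs to (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)

# Example: an axis-aligned box given corner by corner
print(corners_to_box([(10, 20), (110, 20), (110, 80), (10, 80)]))  # (10, 20, 110, 80)
```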

The CSV file will look like this:

Image by author

To evaluate the model, we will keep 10% of the images aside as a test set.

Code by author
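A minimal sketch of such a split, assuming the preprocessed annotations were saved to a CSV file (the file name is hypothetical):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("annotations.csv")  # hypothetical file produced above
train_df, test_df = train_test_split(df, test_size=0.1, random_state=42)
print(len(train_df), len(test_df))
```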

Data Augmentation:

As we know, object detection needs a lot of data, but we have only 1645 images for training, which is very few. To increase the amount of data, we perform data augmentation. In this process, new images are generated by flipping and rotating the original images. All credit goes to the following GitHub repository, from which the code is adapted.

7 new images from a single original image are generated by performing the following actions:

  1. Horizontal flipping
  2. Rotation by 90 degrees
  3. Rotation by 180 degrees
  4. Rotation by 270 degrees
  5. Horizontal flipping and 90-degree rotation
  6. Horizontal flipping and 180-degree rotation
  7. Horizontal flipping and 270-degree rotation

An example is shown below

Image by author
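As a minimal sketch of one of these augmentations, here is a 90-degree rotation applied to an image and its boxes with NumPy, assuming boxes in (x_min, y_min, x_max, y_max) format:

```python
import numpy as np

def rotate90(image, boxes):
    """Rotate an image 90 degrees counter-clockwise and transform its boxes."""
    h, w = image.shape[:2]
    rotated = np.rot90(image)
    # Under a 90-degree CCW rotation a point (x, y) maps to (y, w - x),
    # so the box corners are remapped and re-ordered accordingly.
    x_min, y_min, x_max, y_max = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    new_boxes = np.stack([y_min, w - x_max, y_max, w - x_min], axis=1)
    return rotated, new_boxes

image = np.zeros((512, 512, 3), dtype=np.uint8)
boxes = np.array([[100, 50, 200, 150]])
_, new_boxes = rotate90(image, boxes)
print(new_boxes)  # [[ 50 312 150 412]]
```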

TFRecords File:

TFRecords is the TensorFlow’s own binary storage format. It is generally useful when the dataset is too large. It stores the data in binary format and can have a significant impact on the performance of the training model. Binary data takes less time to copy, and also use up less space as only a batch of data gets loaded at the time of training. You can find a detailed description of it in the following blog.

You can also check the TensorFlow documentation on it below.

Our dataset has been converted into TFRecords format. This step was not strictly necessary, because our dataset is not vast, but it is done for learning purposes. You may find the code in my GitHub repository if you are interested.
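For reference, here is a minimal sketch of serializing one image and its boxes into a TFRecord file; the feature names are my own choices, not a fixed schema:

```python
import tensorflow as tf

def make_example(image_bytes, boxes, labels):
    """Serialize one encoded image with its flattened boxes and class labels."""
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "boxes": tf.train.Feature(float_list=tf.train.FloatList(value=boxes)),
        "labels": tf.train.Feature(int64_list=tf.train.Int64List(value=labels)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

with tf.io.TFRecordWriter("train.tfrecord") as writer:
    with open("01_2_3.jpg", "rb") as f:  # hypothetical patch file
        example = make_example(f.read(), boxes=[100.0, 50.0, 200.0, 150.0], labels=[2])
    writer.write(example.SerializeToString())
```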

9. Object Detection using YoloV3

Training:

To train the YOLOv3 model, transfer learning is used. The first step is to load the pretrained DarkNet weights and freeze the DarkNet layers so that they remain constant during training.

We used the Adam optimizer (initial learning rate = 0.001) to train the model and applied cosine decay to reduce the learning rate over the epochs. The best weights are saved with a model checkpoint during training, and the final weights are saved after training completes.
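A minimal sketch of this training setup in TensorFlow 2.x; the model object, dataset objects, and step counts are placeholders:

```python
import tensorflow as tf

EPOCHS, STEPS_PER_EPOCH = 100, 200  # placeholder values

# Cosine decay schedule starting from the initial learning rate of 0.001
schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=0.001, decay_steps=EPOCHS * STEPS_PER_EPOCH)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)

# Save only the best weights (lowest validation loss) during training
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_weights.h5", monitor="val_loss",
    save_best_only=True, save_weights_only=True)

# model.compile(optimizer=optimizer, loss=yolo_loss)
# model.fit(train_ds, validation_data=val_ds, epochs=EPOCHS, callbacks=[checkpoint])
```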

Loss Graph:

Image by author

Yolo Loss Function:

The loss function used to train the YOLOv3 model is quite complicated. YOLO calculates three different losses at the three different scales and sums them up for backpropagation (the final loss is a list of three different losses). Each of them computes both localization and classification loss with the help of 4 sub-functions:

  1. Mean squared error (MSE) of the center coordinates (x, y).
  2. Mean squared error (MSE) of the width and height of the bounding box.
  3. Binary cross-entropy of the objectness and no-objectness scores of a bounding box.
  4. Binary cross-entropy or sparse categorical cross-entropy of the multi-class predictions of a bounding box.

Let's look at the loss formula used in YOLOv2 and inspect where YOLOv3 differs.

source

The last three terms in YOLOv2 are squared errors, whereas in YOLOv3 they have been replaced by cross-entropy terms. In other words, object confidence and class predictions in YOLOv3 are now made through logistic regression.

Have a look at the YOLOv3 loss function implementation.
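As a rough, heavily simplified single-scale sketch (not the exact implementation from the repository), the four sub-losses can be combined like this, assuming y_true/y_pred tensors of shape (batch, grid, grid, anchors, 5 + num_classes):

```python
import tensorflow as tf

def yolo_loss_sketch(y_true, y_pred):
    """Simplified single-scale YOLO-style loss: MSE for box geometry,
    binary cross-entropy for objectness and (multi-label) class scores."""
    true_xy, pred_xy = y_true[..., 0:2], y_pred[..., 0:2]
    true_wh, pred_wh = y_true[..., 2:4], y_pred[..., 2:4]
    true_obj, pred_obj = y_true[..., 4:5], y_pred[..., 4:5]
    true_cls, pred_cls = y_true[..., 5:], y_pred[..., 5:]

    obj_mask = true_obj  # 1 where a ground-truth box is assigned, else 0
    bce = tf.keras.losses.binary_crossentropy

    xy_loss = tf.reduce_sum(obj_mask * tf.square(true_xy - pred_xy))      # sub-loss 1
    wh_loss = tf.reduce_sum(obj_mask * tf.square(true_wh - pred_wh))      # sub-loss 2
    obj_loss = tf.reduce_sum(bce(true_obj, pred_obj))                     # sub-loss 3
    cls_loss = tf.reduce_sum(obj_mask[..., 0] * bce(true_cls, pred_cls))  # sub-loss 4
    return xy_loss + wh_loss + obj_loss + cls_loss
```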

Scores:

To evaluate our model, we used the AP and mAP on the train and test data.

Test Score

Code by author
Image by author

Train Score

Code by author
Image by author

Inference:

Let's see how the model performs!

Image by author
Image by author

10. Reserved Volume Estimation

The volume estimation is the final output of this case study. There is no metric to evaluate the estimated tank volumes. Still, we tried to find the best threshold pixel value for the image so that the shadow area (measured by counting pixels) is detected as accurately as possible.

We take a large 4800x4800 satellite image and split it into one hundred 512x512 patches with an overlap of 37 pixels between patches on both axes. The image patches are named following the id_row_column.jpg format.

The predictions for each generated patch are stored in a CSV file; we store the bounding boxes of floating head tanks only. After this, the volume of each floating head tank is estimated (the code, along with an explanation, is available in notebook format in my GitHub repository). Lastly, all the image patches, along with bounding boxes labeled with the estimated volumes, are merged back into the large image. You can have a look at the example given below.

Image by author
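A minimal sketch of mapping a detected box from patch coordinates back into the 4800x4800 image, using the id_row_column.jpg naming scheme and the 37-pixel overlap described above:

```python
PATCH_SIZE, OVERLAP = 512, 37
STRIDE = PATCH_SIZE - OVERLAP  # 475 pixels between patch origins

def patch_box_to_global(row, col, box):
    """Shift an (x_min, y_min, x_max, y_max) box from patch to large-image coords."""
    x_min, y_min, x_max, y_max = box
    dx, dy = col * STRIDE, row * STRIDE
    return x_min + dx, y_min + dy, x_max + dx, y_max + dy

# A box detected in patch 01_2_3.jpg (row 2, column 3)
print(patch_box_to_global(2, 3, (100, 50, 200, 150)))  # (1525, 1000, 1625, 1100)
```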

11. Result

The AP score for the floating head tank on the test set is 0.874 and on the train set is 0.942.

12. Conclusion

  • Fairly good results were obtained with just a limited number of images.
  • Data augmentation worked quite well.
  • In this case, YOLOv3 performed better than the existing RetinaNet-based approach.

13. Future Work

  • An AP of 87.4% for floating head tanks was obtained, which is a good score, yet we may try to push it further.
  • We could train this model with more data generated by augmentation.
  • We could train newer, more accurate models such as YOLOv4 or YOLOv5 (unofficial).

14. References

[1] Oil-Tank-Volume-Estimation, by Karl Heyer, Nov 2019.

[2] Estimating the Volume of Oil Tanks Based on High-Resolution Remote Sensing Images by Tong Wang, Ying Li, Shengtao Yu, and Yu Liu, April 2019.

[3] A Beginner’s Guide To Calculating Oil Storage Tank Occupancy With Help Of Satellite Imagery by TankerTrackers.com, Sep 2017.

[4] A Gentle Introduction to Object Recognition With Deep Learning, by MachineLearningMastery.com, May 2019.

[5] Selective Search for Object Recognition, by J.R.R. Uijlings et al., 2012.

[6] Region Proposal Network — A detailed view, by Sambasivarao K, Dec 2019.

[7] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, by Shaoqing Ren et al., Jan 2016.

[8] Real-time Object Detection with YOLO, YOLOv2 and now YOLOv3 by Joseph Redmon, 2015–2018

Applied Course: https://www.appliedaicourse.com/

Thank you for reading! Here are my LinkedIn and Kaggle profiles.



I am currently pursuing my B.Tech degree in Computer Science. I have a strong interest in competitive coding and ML/AI.