Building A Deep Learning-Based Object Detection App Using R Shiny and Tensorflow

Learn how to fine-tune a customized deep learning model for object detection based on your own dataset

Liu Peng
Towards Data Science


Training a decent deep learning model for object detection takes a lot of effort, and that effort only compounds when the model also needs to be deployed and embedded in a web application for end users. In this tutorial, we tackle this seemingly daunting task with a practical example: developing an accurate deep learning model using Python and the Tensorflow framework, and building a working web application that supports object detection on the fly using the R Shiny framework. By the end of the tutorial, you will be able to build a full-scale object recognition app for grocery items like this:

Object detection app. Image by the author.

Training a deep learning model for object detection

Training a well-performing deep learning model for object detection takes a lot of data and computing power. To facilitate development, we can use transfer learning and fine-tune a model pre-trained on other relevant datasets. In this example, we choose the SSD MobileNet model provided by the Tensorflow 2 Detection Model Zoo, which offers a good balance between speed and accuracy.

Training a full-scale deep learning model involves a fair amount of bookkeeping, such as file paths and hyperparameters. We can centralize these configuration parameters in a few dictionaries and then use them to set up the folder structure, install the relevant libraries, and download the pre-trained model.

# Configuration parameters
import os

CUSTOM_MODEL_NAME = 'my_ssd_mobnet'

# SSD offers a good tradeoff between speed and accuracy; can switch to other pre-trained models
PRETRAINED_MODEL_NAME = 'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8'
PRETRAINED_MODEL_URL = 'http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz'

# TF official script to encode training data in TFRecord format
TF_RECORD_SCRIPT_NAME = 'generate_tfrecord.py'

# Mapping between labels and integer ids
LABEL_MAP_NAME = 'label_map.pbtxt'

# Define a list of folder paths to be created (if needed) and used later
paths = {
    'WORKSPACE_PATH': os.path.join('Tensorflow', 'workspace'),
    'SCRIPTS_PATH': os.path.join('Tensorflow', 'scripts'),
    'APIMODEL_PATH': os.path.join('Tensorflow', 'models'),
    # bounding box annotations
    'ANNOTATION_PATH': os.path.join('Tensorflow', 'workspace', 'annotations'),
    'IMAGE_PATH': os.path.join('Tensorflow', 'workspace', 'images'),
    'MODEL_PATH': os.path.join('Tensorflow', 'workspace', 'models'),
    'PRETRAINED_MODEL_PATH': os.path.join('Tensorflow', 'workspace', 'pre-trained-models'),
    'CHECKPOINT_PATH': os.path.join('Tensorflow', 'workspace', 'models', CUSTOM_MODEL_NAME),
    'OUTPUT_PATH': os.path.join('Tensorflow', 'workspace', 'models', CUSTOM_MODEL_NAME, 'export'),
    'PROTOC_PATH': os.path.join('Tensorflow', 'protoc')
}

files = {
    'PIPELINE_CONFIG': os.path.join('Tensorflow', 'workspace', 'models', CUSTOM_MODEL_NAME, 'pipeline.config'),
    'TF_RECORD_SCRIPT': os.path.join(paths['SCRIPTS_PATH'], TF_RECORD_SCRIPT_NAME),
    'LABELMAP': os.path.join(paths['ANNOTATION_PATH'], LABEL_MAP_NAME)
}

# Download TF model training utility scripts from the TF model zoo
if not os.path.exists(os.path.join(paths['APIMODEL_PATH'], 'research', 'object_detection')):
    !git clone https://github.com/tensorflow/models {paths['APIMODEL_PATH']}

# Install the TF object detection library
if os.name == 'posix':
    !apt-get install protobuf-compiler
    !cd Tensorflow/models/research && protoc object_detection/protos/*.proto --python_out=. && cp object_detection/packages/tf2/setup.py . && python -m pip install .

We will also need to provide the training images along with their bounding boxes for our specific task of grocery item recognition. This data will be used to fine-tune the pre-trained model, replacing its default output with one that recognizes the six grocery items (apple, avocado, banana, cabbage, carrot, and potato) defined in a label map.

# Download training images
import shutil

# Remove any previous copy of the repository before cloning
if os.path.exists('object_detection_using_tensorflow'):
    shutil.rmtree('object_detection_using_tensorflow')

!git clone https://github.com/jackliu333/object_detection_using_tensorflow.git

# Create label map
labels = [{'name': 'Apple', 'id': 1},
          {'name': 'Avocado', 'id': 2},
          {'name': 'Banana', 'id': 3},
          {'name': 'Cabbage', 'id': 4},
          {'name': 'Carrot', 'id': 5},
          {'name': 'Potato', 'id': 6}]

# Write the label map in protobuf text format
with open(files['LABELMAP'], 'w') as f:
    for label in labels:
        f.write('item { \n')
        f.write('\tname:\'{}\'\n'.format(label['name']))
        f.write('\tid:{}\n'.format(label['id']))
        f.write('}\n')

We will also split the data into training and test sets. Note that the split is performed per category, so that no single category ends up entirely in the training set or entirely in the test set.

# Split into train and test folders
import random
import shutil

tmp_folders = ['train', 'test']

# Recreate the train/test folders from scratch
for i in tmp_folders:
    if os.path.exists(os.path.join(paths['IMAGE_PATH'], i)):
        shutil.rmtree(os.path.join(paths['IMAGE_PATH'], i))
    !mkdir -p {os.path.join(paths['IMAGE_PATH'], i)}

for i in range(len(labels)):
    from_path = os.path.join('object_detection_using_tensorflow', 'images', labels[i]['name'])

    # Get unique file names and file extensions
    tmp_files = os.listdir(from_path)
    tmp_names = []
    tmp_file_types = []
    for tmp_file in tmp_files:
        tmp_name = os.path.splitext(tmp_file)[0]
        tmp_file_type = os.path.splitext(tmp_file)[1]
        tmp_names.append(tmp_name)
        tmp_file_types.append(tmp_file_type)
    tmp_names = list(set(tmp_names))
    tmp_names = [i for i in tmp_names if i != '.DS_Store']
    tmp_file_types = list(set(tmp_file_types))
    tmp_file_types = [i for i in tmp_file_types if len(i) != 0]

    # Randomly shuffle the files
    random.shuffle(tmp_names)

    # 90/10 split into training and test files
    tmp_names_train = tmp_names[0:int(len(tmp_names)*0.9)]
    tmp_names_test = [i for i in tmp_names if i not in tmp_names_train]

    # Copy the image and annotation files into the respective target folders
    for tmp_name in tmp_names_train:
        for tmp_file_type in tmp_file_types:
            tmp_name_full = tmp_name + tmp_file_type
            shutil.copy(os.path.join(from_path, tmp_name_full),
                        os.path.join(paths['IMAGE_PATH'], "train"))

    for tmp_name in tmp_names_test:
        for tmp_file_type in tmp_file_types:
            tmp_name_full = tmp_name + tmp_file_type
            shutil.copy(os.path.join(from_path, tmp_name_full),
                        os.path.join(paths['IMAGE_PATH'], "test"))

The resulting image data is then converted into the TFRecord format for faster processing.

# Create TFRecord files
# Download the conversion script if it is not already present
if not os.path.exists(files['TF_RECORD_SCRIPT']):
    !git clone https://github.com/nicknochnack/GenerateTFRecord {paths['SCRIPTS_PATH']}

!python {files['TF_RECORD_SCRIPT']} -x {os.path.join(paths['IMAGE_PATH'], 'train')} -l {files['LABELMAP']} -o {os.path.join(paths['ANNOTATION_PATH'], 'train.record')}
!python {files['TF_RECORD_SCRIPT']} -x {os.path.join(paths['IMAGE_PATH'], 'test')} -l {files['LABELMAP']} -o {os.path.join(paths['ANNOTATION_PATH'], 'test.record')}

Before training the model, we need to update a few configuration parameters that deviate from the default settings, so that the training pipeline knows we are detecting objects from six categories.

# Update configuration file for transfer learning
import tensorflow as tf
from object_detection.utils import config_util
from object_detection.protos import pipeline_pb2
from google.protobuf import text_format

# Read the current configuration file
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
with tf.io.gfile.GFile(files['PIPELINE_CONFIG'], "r") as f:
    proto_str = f.read()
    text_format.Merge(proto_str, pipeline_config)

# Update based on the new labels
pipeline_config.model.ssd.num_classes = len(labels)
pipeline_config.train_config.batch_size = 4
pipeline_config.train_config.fine_tune_checkpoint = os.path.join(paths['PRETRAINED_MODEL_PATH'], PRETRAINED_MODEL_NAME, 'checkpoint', 'ckpt-0')
pipeline_config.train_config.fine_tune_checkpoint_type = "detection"
pipeline_config.train_input_reader.label_map_path = files['LABELMAP']
pipeline_config.train_input_reader.tf_record_input_reader.input_path[:] = [os.path.join(paths['ANNOTATION_PATH'], 'train.record')]
pipeline_config.eval_input_reader[0].label_map_path = files['LABELMAP']
pipeline_config.eval_input_reader[0].tf_record_input_reader.input_path[:] = [os.path.join(paths['ANNOTATION_PATH'], 'test.record')]

# Write the updated configuration back to file
config_text = text_format.MessageToString(pipeline_config)
with tf.io.gfile.GFile(files['PIPELINE_CONFIG'], "wb") as f:
    f.write(config_text)

Model training is made simple using the training script provided by Tensorflow.

TRAINING_SCRIPT = os.path.join(paths['APIMODEL_PATH'], 'research', 'object_detection', 'model_main_tf2.py')
command = "python {} --model_dir={} --pipeline_config_path={} --num_train_steps=2000".format(TRAINING_SCRIPT, paths['CHECKPOINT_PATH'],files['PIPELINE_CONFIG'])
!{command}

Since the training procedure will save intermediate checkpoints, i.e., model weights, we can choose which model checkpoint to load and use for detection.

# Load the trained model from a checkpoint
import os
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
from object_detection.builders import model_builder
from object_detection.utils import config_util

# Load the pipeline config and build a detection model
configs = config_util.get_configs_from_pipeline_file(files['PIPELINE_CONFIG'])
detection_model = model_builder.build(model_config=configs['model'], is_training=False)

# Restore the checkpoint
ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
ckpt.restore(os.path.join(paths['CHECKPOINT_PATH'], 'ckpt-3')).expect_partial()

# @tf.function
def detect_fn(image):
    # Preprocess, predict, and post-process a batch of images
    image, shapes = detection_model.preprocess(image)
    prediction_dict = detection_model.predict(image, shapes)
    detections = detection_model.postprocess(prediction_dict, shapes)
    return detections

Now we can test the fine-tuned model by passing in an image randomly selected from the image folder defined earlier.

import cv2
import random
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

# Randomly select a test image to run detection on
tmp_img = random.choice([file for file in os.listdir(os.path.join(paths['IMAGE_PATH'], 'test'))
                         if file.endswith(".jpg")])
IMAGE_PATH = os.path.join(paths['IMAGE_PATH'], 'test', tmp_img)

category_index = label_map_util.create_category_index_from_labelmap(files['LABELMAP'])

img = cv2.imread(IMAGE_PATH)
image_np = np.array(img)

# Run the detection function on the image
input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.float32)
detections = detect_fn(input_tensor)

num_detections = int(detections.pop('num_detections'))
detections = {key: value[0, :num_detections].numpy()
              for key, value in detections.items()}
detections['num_detections'] = num_detections

# detection_classes should be ints
detections['detection_classes'] = detections['detection_classes'].astype(np.int64)

label_id_offset = 1
image_np_with_detections = image_np.copy()

# Overlay bounding boxes and labels on the image
viz_utils.visualize_boxes_and_labels_on_image_array(
    image_np_with_detections,
    detections['detection_boxes'],
    detections['detection_classes'] + label_id_offset,
    detections['detection_scores'],
    category_index,
    use_normalized_coordinates=True,
    max_boxes_to_draw=5,
    min_score_thresh=.5,
    agnostic_mode=False)

plt.imshow(cv2.cvtColor(image_np_with_detections, cv2.COLOR_BGR2RGB))
plt.show()
Detect objects in the image. Image by the author.

Building a Web Application Using R Shiny

R Shiny is a great tool for building modern web applications without deep knowledge of HTML, CSS, or JavaScript. We can quickly spin up an app by choosing a layout framework and filling in the front-end (ui.R) and back-end (server.R) scripts, along with an optional global file (global.R) that sets up the environment shared across the app.
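
For orientation, here is a minimal sketch of how the three files fit together. It reuses a couple of the input and output IDs from the full UI shown further below; everything else is placeholder content, not the app's actual code.

# global.R — loaded first; holds libraries and objects shared across the app
library(shiny)
library(shinydashboard)

# ui.R — front end: the file ends with the UI object
ui <- dashboardPage(
  dashboardHeader(title = "Object Recognition App"),
  dashboardSidebar(fileInput("input_image_upload", "Upload image")),
  dashboardBody(imageOutput("output_image"))
)

# server.R — back end: the file ends with the server function
server <- function(input, output, session) {
  output$output_image <- renderImage({
    req(input$input_image_upload)
    list(src = input$input_image_upload$datapath, width = "100%")
  }, deleteFile = FALSE)
}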

The design of the user interface in R Shiny can follow a grid system as follows, which makes it easy to decide what components to add in a specific row or column.

ui <- dashboardPage(
  skin = "blue",
  #(1) Header
  dashboardHeader(title = "Object Recognition App", #, style="font-size: 120%; font-weight: bold; color: white"),
                  titleWidth = 250,
                  tags$li(class = "dropdown"),
                  dropdownMenu(
                    type = "notifications",
                    icon = icon("question-circle"),
                    badgeStatus = NULL,
                    headerText = "Feedback",
                    notificationItem("Send email to developer", icon = icon("file"),
                                     href = "liu.peng@u.nus.edu")
                  )),
  #(2) Sidebar
  dashboardSidebar(
    width = 250,
    fileInput("input_image_upload", "Upload image", accept = c('.jpg', '.jpeg')),
    tags$br(),
    sliderInput("min_score_threshold", "Confidence threshold", 0, 1, 0.5),
    # tags$p("Upload the image here.")
    selectInput(inputId = "product_type", label = "Choose product",
                choices = c("Flour", "Baby Food"),
                selected = NA),
    selectInput(inputId = "halal_status", label = "Halal status",
                choices = c("H", "NH"),
                selected = NA),
    selectInput(inputId = "weight", label = "Choose weight",
                choices = c("50g", "100g"),
                selected = NA),
    actionButton("submit", "Submit", icon("paper-plane"),
                 style = "color: #fff; background-color: #337ab7; border-color: #2e6da4")
  ),

  #(3) Body
  dashboardBody(
    box(
      title = "Object Recognition", width = 12, solidHeader = TRUE, status = "primary",
      collapsible = T, collapsed = F,
      fluidRow(
        column(6,
               h4("Instruction:"),
               # tags$br(),
               tags$p("1. Upload image to be classified and set confidence threshold."),
               tags$p("2. Check prediction results."),
               tags$p("3. Select specific product category."),
               tags$p("4. Click submit to record in the system.")
        ),
        column(6,
               h4("Predicted Category:"),
               tableOutput("text")
        )
      ),

      fluidRow(
        column(h4("Image:"), imageOutput("output_image"), width = 6),
        column(h4("Predicted Image:"), imageOutput("output_image2"), width = 6)
      )
    ),
    box(
      title = "Image Gallery", width = 12, solidHeader = TRUE, status = "success",
      collapsible = T, collapsed = F,
      fluidRow(
        column(3,
               h3("All categories"),
               verbatimTextOutput("all_cats")
        ),
        column(3,
               selectInput("input_image_select", "Select image", c("", ALL_IMGS), selected = "")
        ),
        column(6,
               column(h4("Image:"), imageOutput("output_image_selected"), width = 6)
        )
      )
    )

    # box(
    #   title = "Product Recording", width = 12, solidHeader = TRUE, status = "success",
    #   collapsible = T, collapsed = T,
    #   "test"
    # )

  ))

The server handles all the backend processing, such as running the fine-tuned model on the uploaded image and returning the prediction output to the front-end UI.

The server file. Image by the author.
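
Since the server logic is shown only as a screenshot, the following is a minimal sketch of what such a server.R might look like. The input and output IDs (`input_image_upload`, `min_score_threshold`, `output_image`, `output_image2`, `text`) come from the UI above; `detect_objects()` and the fields it returns are hypothetical placeholders for the Python detection pipeline exposed through reticulate in the global file.

server <- function(input, output, session) {

  # Display the uploaded image as-is
  output$output_image <- renderImage({
    req(input$input_image_upload)
    list(src = input$input_image_upload$datapath, width = "100%")
  }, deleteFile = FALSE)

  # Run detection once per upload; detect_objects() stands in for the
  # Python detection function sourced in global.R via reticulate
  prediction <- reactive({
    req(input$input_image_upload)
    detect_objects(input$input_image_upload$datapath,
                   min_score = input$min_score_threshold)
  })

  # Show the image annotated with predicted bounding boxes
  output$output_image2 <- renderImage({
    list(src = prediction()$annotated_image_path, width = "100%")
  }, deleteFile = FALSE)

  # Show the predicted categories and confidence scores as a table
  output$text <- renderTable({
    prediction()$detections
  })
}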

We also define a few utility functions and environment settings in the global file. Note that R uses the reticulate library to interface with Python scripts. In this case, we first create a conda virtual environment with the dependencies needed for object detection, and then load the core image-recognition Python functions, which are exposed as an API that the R Shiny app can call.

The global file. Image by the author.
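
As the global file is also shown as a screenshot, here is only a sketch of the reticulate setup it describes. The environment name `object_detection_env`, the script name `detection_utils.py`, and the gallery folder path are illustrative placeholders rather than the project's actual values.

# global.R — shared environment for the app (sketch)
library(shiny)
library(shinydashboard)
library(reticulate)

# Point reticulate at the conda environment that holds Tensorflow and the
# object detection dependencies (environment name is a placeholder)
use_condaenv("object_detection_env", required = TRUE)

# Load the core Python detection functions so they can be called from R;
# the script name is a placeholder for the project's detection module
source_python("detection_utils.py")

# Images listed in the "Image Gallery" box of the UI (folder path assumed)
ALL_IMGS <- list.files("www/gallery", pattern = "\\.jpe?g$")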

Conclusion

In this tutorial, we covered how to fine-tune a pre-trained deep learning model in Tensorflow via transfer learning for grocery item recognition, and how to build a modern web application in R Shiny that hosts the model for end users. The final product is a combination of R and Python scripts, where the core Python functions such as object detection are wrapped and exposed as an API to R, which handles the application development. We hope this tutorial will be a good starting point for you to deploy your own model and share it with others in an effective and engaging way.

All the supporting data and code are available in the accompanying GitHub repository, along with the YouTube walkthrough below.
