
Some time ago I was working with the LISA Traffic Light Dataset, training a neural network for some (boring) task that is definitely irrelevant for the purpose of this article. My context was very specific: LISA was a mandatory requirement, and I wanted to try as many different network architectures as possible, so I was using boilerplates and implementations from GitHub instead of coding them from scratch. But the problem is that no one designs their networks with LISA in mind; COCO is the de facto standard, and if you’ve worked with image detection or classification problems I bet you’re nodding right now. So I had a very big problem: I couldn’t switch to COCO, but I didn’t want to spend that much time adapting every one of the boilerplates to work with LISA. There was only one sensible way left: converting the LISA annotations to COCO format. The main issue? Such a converter did not exist – I had to write it myself.
A little bit of context
If you’ve read this far and you’re still interested, I’d bet a fairly big amount of money that you don’t need context at all – but please allow me to explain a little before getting down to the code.
Both LISA and COCO are annotated image datasets. LISA contains only traffic light images, in seven categories: Go, GoLeft, GoForward, Stop, StopLeft, Warning and WarningLeft. Quoting the description provided by the authors:
The database is collected in San Diego, California, USA. The database provides four day-time and two night-time sequences primarily used for testing, providing 23 minutes and 25 seconds of driving in Pacific Beach and La Jolla, San Diego. The stereo image pairs are acquired using the Point Grey’s Bumblebee XB3 (BBX3–13S2C-60) which contains three lenses which capture images with a resolution of 1280 x 960, each with a Field of View (FoV) of 66°.
On the other hand, COCO (Common Objects in Context) is a "large-scale object detection, segmentation, and captioning dataset", with more than 200K labeled images across 91 categories. Presented by Microsoft in 2014, it has since gained traction and is now one of the most popular datasets out there for detection and classification tasks – a quick search throws up lots of really good material about its usage, like this tutorial from Viraf. However, we’re not as interested here in the dataset itself as we are in its annotations.
COCO has three folders: two of images (train and val) plus annotations, whereas LISA has multiple folders for subsets of the images. Both datasets consist of a bunch of pictures and a file (or files, in LISA’s case) annotating what is in each picture and where it is located within it. In theory, the only thing we’d need to do to go from one to the other is translate that file into the other dataset’s format. Let’s take a deeper look at those formats.
COCO Dataset annotation format
The COCO format is fairly easy: the annotations live in a single JSON file (one for train and one for val) with five keys:
{
    "info": {...},
    "licenses": [...],
    "images": [...],
    "annotations": [...],
    "categories": [...]
}
The info and licenses keys describe the dataset itself:
"info": {
"description": "COCO 2017 Dataset",
"url": "http://cocodataset.org",
"version": "1.0",
"year": 2017,
"contributor": "COCO Consortium",
"date_created": "2017/09/01"
},
"licenses": [
{
"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
"id": 1,
"name": "Attribution-NonCommercial-ShareAlike License"
}
]
The images key is a list of the attributes of the images: licenses, filenames, sizes and dates:
"images": [
{
"license": 4,
"file_name": "000000397133.jpg",
"coco_url": "http://images.cocodataset.org/val2017/03.jpg",
"height": 427,
"width": 640,
"date_captured": "2013-11-14 17:02:52",
"flickr_url": "http://farm7.staticflickr.com/6116.jpg",
"id": 397133
},
...
]
The categories key is – quite obviously – the list of categories, grouped by parent categories:
"categories": [
{
"supercategory": "vehicle", "id": 1,"name": "bike"},
...
"supercategory": "vehicle", "id": 15,"name": "car"},
}
]
And finally, the annotations key: the most complex and important one. It consists of a list of the attributes of each object annotated in the images:
- Segmentation: list of vertices of a polygon surrounding the object.
- Area, in pixels.
- Iscrowd: binary parameter, 0 being a single object and 1 being a group of them.
- Image ID: from the images key.
- Bbox: the bounding box limits – upper left X, upper left Y, width, height.
- Category ID: from the categories key.
- ID of the annotation, unique.
"annotations": [
{
"segmentation": [[510.66,423.01,...,510.45,423.01]],
"area": 702.1057499999998,
"iscrowd": 0,
"image_id": 289343,
"bbox": [473.07,395.93,38.65,28.67],
"category_id": 18,
"id": 1768
},
...
]
LISA Dataset annotation format
Unlike COCO, LISA has a separate annotation CSV file for each folder. The files share a common structure; each row has ten columns:
- Filename: Name and relative path of the image.
- Annotation tag: one of the seven categories listed before.
- Upper left corner X.
- Upper left corner Y.
- Lower right corner X.
- Lower right corner Y.
- Origin file: Name and path of the original file.
- Origin frame number.
- Origin track: Name of the original video.
- Origin track frame number.
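Purely for illustration, a row in one of those files could look like this (the separator is a semicolon, as we’ll see in the reading code below; the paths and values here are invented, just to show the shape):

dayTraining/dayClip1--00123.jpg;Go;632;280;666;344;dayClip1.avi;123;dayClip1;123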
So, basically, what we need is to convert the data we have in LISA’s CSVs into the JSON format COCO expects.
The conversion
We can easily list the tasks we need to do for converting from LISA to COCO:
- Create the folders train, val and annotations, as COCO expects.
- Move the images from the subfolders from LISA to the train and val folders.
- Merge all LISA annotation files into two: one for train and one for val.
- Translate LISA annotations to COCO format.
- Final cleanup of unneeded folders and files.
Bullet points 1, 2 and 5 are easy enough to do with any scripting language like Python or Bash, and I’m not going to cover them in depth here – check out the code on GitHub, where I provide an implementation.
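Just to give an idea, a minimal sketch of what steps 1 and 2 could look like – the folder names follow the COCO layout described above, and the LISA subset paths are examples you’d adjust to your own copy of the dataset:

import os
import shutil

# Step 1: create the COCO-style folder layout
for d in ('train', 'val', 'annotations'):
    os.makedirs(d, exist_ok=True)

# Step 2: move every image from the LISA subfolders into train/
# (example subset paths - adjust to your copy of the dataset)
train_folders = ['dayTrain', 'nightTrain']
for folder in train_folders:
    for f in os.listdir(folder):
        if f.endswith('.jpg'):
            shutil.move(os.path.join(folder, f), os.path.join('train', f))

Bullet point 3 is also quite easy with Python; first we need to create an empty DataFrame with the columns we’re expecting: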
import os
import pandas as pd

columns = [
    'Filename', 'Annotation tag', 'Upper left corner X',
    'Upper left corner Y', 'Lower right corner X',
    'Lower right corner Y', 'Origin file', 'Origin frame number',
    'Origin track', 'Origin track frame number'
]
train_df = pd.DataFrame(columns=columns)
And then merge all the existing files into it:
for f in train_folders:
    # Each LISA folder ships its own frameAnnotationsBOX.csv, semicolon-separated
    new_df = pd.read_csv(
        os.path.join(f, 'frameAnnotationsBOX.csv'), sep=';')
    # An outer merge on every column appends the new rows without duplicates
    train_df = train_df.merge(
        new_df, how='outer', left_on=columns, right_on=columns)
train_df.to_csv('training_data.csv', index=False)
And finally, the main task: the 4th point, translating from LISA to COCO. I have defined a method to generate the info and licenses keys:
def get_info():
    return {
        "description": "LISA Traffic Sign Dataset",
        "url": "cvrr.ucsd.edu/LISA/lisa-traffic-sign-dataset.html",
        "version": "2.0",
        "year": 2018
    }

def get_licenses():
    return [
        {
            "url": "creativecommons.org/licenses/by-nc-sa/4.0/",
            "id": 1,
            "name": "CC BY-NC-SA 4.0"
        }
    ]
The images key is also easy – we just need to iterate through all the folders of the dataset, checking the filenames:
images = []
image_id = 0
# folder is each of the train / val folders populated in step 2
for f in os.listdir(folder):
    if f.endswith('.jpg'):
        images.append({
            "license": 1,
            "file_name": f,
            "height": 960,   # every LISA image is 1280 x 960
            "width": 1280,
            "id": image_id
        })
        image_id += 1
The categories key is also trivial once we have all the annotations merged into a single file – all the categories are contained there, so we can take advantage of the unique() and tolist() methods from Pandas:
def get_categories(train_df):
    # sorted() returns the sorted list; list.sort() sorts in place and returns None
    tags = sorted(train_df['Annotation tag'].unique().tolist())
    categories = []
    for i, t in enumerate(tags, 1):
        categories.append({"supercategory": "", "id": i, "name": t})
    return categories
The annotations key is the trickiest. I’ve defined a set of auxiliary functions to calculate those parameters from the (X, Y) corner points provided by LISA.
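The originals are in the GitHub repository linked at the end; a minimal sketch of what they can look like, assuming LISA’s two corners and the COCO conventions described above (the function names are my own):

def get_bbox(ulx, uly, lrx, lry):
    # COCO bbox format: [upper left X, upper left Y, width, height]
    return [ulx, uly, lrx - ulx, lry - uly]

def get_area(ulx, uly, lrx, lry):
    # Area of the bounding box, in pixels
    return (lrx - ulx) * (lry - uly)

def get_segmentation(ulx, uly, lrx, lry):
    # LISA only provides a rectangle, so the polygon is just its four corners
    return [[ulx, uly, lrx, uly, lrx, lry, ulx, lry]]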
And then a main function wraps everything together – a sketch of it is shown after the following list. It takes the images and categories lists as parameters, plus a flag to distinguish between train and test. It reads the CSV file with the annotations we merged previously and casts it to a list of lists, from which the JSON is generated:
- Segmentation, area and bbox come from the auxiliary functions.
- Iscrowd is always 0, since all segmentations in LISA are from single objects.
- The image ID comes from a comparison between the filename and the images list passed as a parameter.
- The category ID comes from a comparison between the tag and the categories list passed as a parameter.
- The annotation ID is autoincremental.
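Here is a minimal sketch of how that wrapper could look, reusing the hypothetical helper names from above (the column order follows the LISA CSV structure listed earlier, and the train/test flag is omitted for brevity):

import csv
import os

def get_annotations(csv_path, images, categories):
    # Cast the merged CSV to a list of lists, skipping the header row
    with open(csv_path) as f:
        rows = list(csv.reader(f))[1:]
    annotations = []
    for annotation_id, row in enumerate(rows, 1):  # autoincremental ID
        filename = os.path.basename(row[0])
        tag = row[1]
        ulx, uly, lrx, lry = (int(float(v)) for v in row[2:6])
        # Compare the filename against the images list to get the image ID
        image_id = next(i['id'] for i in images if i['file_name'] == filename)
        # Compare the tag against the categories list to get the category ID
        category_id = next(c['id'] for c in categories if c['name'] == tag)
        annotations.append({
            'segmentation': get_segmentation(ulx, uly, lrx, lry),
            'area': get_area(ulx, uly, lrx, lry),
            'iscrowd': 0,  # every LISA annotation is a single object
            'image_id': image_id,
            'bbox': get_bbox(ulx, uly, lrx, lry),
            'category_id': category_id,
            'id': annotation_id
        })
    return annotations

With all the pieces in place, producing the final file is just a matter of assembling the five keys and dumping them to JSON (the output filename here is a placeholder):

import json

categories = get_categories(train_df)
coco = {
    'info': get_info(),
    'licenses': get_licenses(),
    'images': images,
    'annotations': get_annotations('training_data.csv', images, categories),
    'categories': categories
}
with open('annotations/train.json', 'w') as f:
    json.dump(coco, f)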
Okay – solved! We now have a powerful tool to convert the LISA dataset to the COCO format, allowing us to use it with the myriad of implementations that only accept the latter. You can check the code on GitHub.
References
[1] M. B. Jensen, M. P. Philipsen, A. Møgelmose, T. B. Moeslund and M. M. Trivedi, Vision for Looking at Traffic Lights: Issues, Survey, and Perspectives (2016), IEEE Transactions on Intelligent Transportation Systems
[2] M. P. Philipsen, M. B. Jensen, A. Møgelmose, T. B. Moeslund and M. M. Trivedi, Traffic Light Detection: A Learning Algorithm and Evaluations on Challenging Dataset (2015), Intelligent Transportation Systems Conference (ITSC)
[3] T.-Y. Lin et al., Microsoft COCO: Common Objects in Context (2014), European Conference on Computer Vision