Extract annotations from CVAT XML file into mask files in Python

An approach to get mask files from the data contained in the CVAT XML

Oleksii Sheremet
Towards Data Science


The Computer Vision Annotation Tool (CVAT) is a well-known image annotation tool. The results of the data labelers' work can be exported to an XML file that contains all the necessary information about the markup. However, an image segmentation task needs masks in the form of image files (JPEG, GIF, PNG, etc.). In other words, given the markup coordinates in the CVAT XML file, you need to draw the corresponding masks.

Image is created by Oleksii Sheremet with Adobe Photoshop

If the data labelers worked with images at a higher resolution than the one used for training, the task becomes more complicated: the point coordinates stored in the XML file must be rescaled by the same factor as the images.

All code for extracting annotations is implemented as a Python script. The lxml library is used for parsing XML. It is a fast and flexible solution for handling XML and HTML markup. The lxml package supports XPath and XSLT and also provides a SAX-compatible API and a C-level API for use from C modules.

The tqdm package is used as a progress bar to illustrate the processing of a large number of files.

Let’s take a closer look. Import libraries:

import os
import cv2
import argparse
import shutil
import numpy as np
from lxml import etree
from tqdm import tqdm

A useful function for creating a new directory and recursively deleting the contents of an existing one:

def dir_create(path):
    if (os.path.exists(path)) and (os.listdir(path) != []):
        shutil.rmtree(path)
        os.makedirs(path)
    if not os.path.exists(path):
        os.makedirs(path)
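The behavior of the function above can be checked in isolation. The following is a minimal sketch that repeats the function so the snippet runs on its own; the scratch directory and the file name `old.png` are made up for the demonstration.

```python
import os
import shutil
import tempfile


def dir_create(path):
    # Same logic as above: wipe a non-empty directory, then recreate it.
    if os.path.exists(path) and os.listdir(path) != []:
        shutil.rmtree(path)
        os.makedirs(path)
    if not os.path.exists(path):
        os.makedirs(path)


# Hypothetical scratch location, used only for this demonstration.
base = tempfile.mkdtemp()
masks_dir = os.path.join(base, "masks")

dir_create(masks_dir)  # the directory did not exist, so it is created
open(os.path.join(masks_dir, "old.png"), "w").close()
before = os.listdir(masks_dir)

dir_create(masks_dir)  # the directory was non-empty, so its contents are removed
after = os.listdir(masks_dir)

shutil.rmtree(base)  # clean up the scratch directory
print(before, after)
```

Because existing contents are deleted without confirmation, it is safer not to point the output directory at a folder that holds anything you want to keep.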

The script takes the following arguments: the directory with input images, the input file with the CVAT annotation in XML format, the directory for output masks, and the scale factor for images. A function for parsing arguments from the command line:

def parse_args():
    parser = argparse.ArgumentParser(
        fromfile_prefix_chars='@',
        description='Convert CVAT XML annotations to contours'
    )
    parser.add_argument(
        '--image-dir', metavar='DIRECTORY', required=True,
        help='directory with input images'
    )
    parser.add_argument(
        '--cvat-xml', metavar='FILE', required=True,
        help='input file with CVAT annotation in xml format'
    )
    parser.add_argument(
        '--output-dir', metavar='DIRECTORY', required=True,
        help='directory for output masks'
    )
    parser.add_argument(
        '--scale-factor', type=float, default=1.0,
        help='choose scale factor for images'
    )
    return parser.parse_args()
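The `fromfile_prefix_chars='@'` setting means the arguments can also be read from a text file, which is handy when the same paths are reused. Here is a small sketch of how that might look; the parser is a trimmed copy of the one above, and the argument file is a temporary file created only for this example.

```python
import argparse
import os
import tempfile

# Trimmed copy of the parser above (help texts omitted for brevity).
parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
parser.add_argument('--image-dir', required=True)
parser.add_argument('--cvat-xml', required=True)
parser.add_argument('--output-dir', required=True)
parser.add_argument('--scale-factor', type=float, default=1.0)

# By default argparse reads one argument per line from the file.
lines = [
    '--image-dir', 'original_images_dir',
    '--cvat-xml', 'cvat.xml',
    '--output-dir', 'masks_dir',
    '--scale-factor', '0.4',
]
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('\n'.join(lines) + '\n')
    args_file = f.name

# Equivalent to: python script_name.py @args.txt
args = parser.parse_args(['@' + args_file])
os.remove(args_file)

print(args.image_dir, args.cvat_xml, args.output_dir, args.scale_factor)
```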

To understand how the extraction function works, let’s take a closer look at a section of the CVAT XML file:

<image id="7" name="7.jpg" width="4800" height="2831">
<polygon label="roofs" occluded="0" points="2388.11,2069.80;2313.80,2089.10;2297.46,2080.21;2285.57,2043.80;2339.07,2031.17;2336.10,2018.54;2428.23,2060.89">
</polygon>
<polygon label="roofs" occluded="0" points="1431.35,1161.11;1353.11,1179.63;1366.25,1229.79;1398.80,1219.94;1396.11,1210.08;1437.91,1194.26">
</polygon>
<polygon label="roofs" occluded="0" points="1344.81,1673.28;1270.10,1619.40;1213.00,1697.00">
</polygon>
<polygon label="roofs" occluded="0" points="1498.35,939.31;1573.30,923.19;1586.74,985.00;1509.10,1002.32">
</polygon>
...

First, find the element in the XML file that corresponds to the image currently being processed. The easiest way to do this is by file name (‘7.jpg’ in the example). Next, find the ‘polygon’ or ‘box’ tags inside it and extract the necessary data from them (in this example, roofs are marked with polygons). You can use the following function to obtain the markup results from the CVAT XML:

def parse_anno_file(cvat_xml, image_name):
    root = etree.parse(cvat_xml).getroot()
    anno = []
    # Select the <image> element whose name attribute matches the file name
    image_name_attr = ".//image[@name='{}']".format(image_name)
    for image_tag in root.iterfind(image_name_attr):
        image = {}
        for key, value in image_tag.items():
            image[key] = value
        image['shapes'] = []
        for poly_tag in image_tag.iter('polygon'):
            polygon = {'type': 'polygon'}
            for key, value in poly_tag.items():
                polygon[key] = value
            image['shapes'].append(polygon)
        for box_tag in image_tag.iter('box'):
            box = {'type': 'box'}
            for key, value in box_tag.items():
                box[key] = value
            # Express the box as a four-point polygon (clockwise from top-left)
            box['points'] = "{0},{1};{2},{1};{2},{3};{0},{3}".format(
                box['xtl'], box['ytl'], box['xbr'], box['ybr'])
            image['shapes'].append(box)
        # Draw shapes with a lower z_order first
        image['shapes'].sort(key=lambda x: int(x.get('z_order', 0)))
        anno.append(image)
    return anno
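To see the lookup and the box-to-polygon conversion on a small input, here is a sketch that uses the standard library's `xml.etree.ElementTree` instead of lxml, so it runs without extra dependencies (the `iterfind` and `iter` calls have the same form). The `<annotations>` root element and the `<box>` entry with its coordinates are assumptions made up for the example; only the polygon comes from the sample above.

```python
import xml.etree.ElementTree as ET

# Illustrative document: the root tag and the <box> element are assumptions.
SAMPLE = """<annotations>
  <image id="7" name="7.jpg" width="4800" height="2831">
    <polygon label="roofs" occluded="0" points="1344.81,1673.28;1270.10,1619.40;1213.00,1697.00">
    </polygon>
    <box label="roofs" occluded="0" xtl="10.0" ytl="20.0" xbr="110.0" ybr="70.0">
    </box>
  </image>
</annotations>"""

root = ET.fromstring(SAMPLE)
shapes = []
for image_tag in root.iterfind(".//image[@name='7.jpg']"):
    for poly_tag in image_tag.iter('polygon'):
        shapes.append(('polygon', poly_tag.get('points')))
    for box_tag in image_tag.iter('box'):
        b = box_tag.attrib
        # Same corner order as in parse_anno_file: tl, tr, br, bl
        points = "{0},{1};{2},{1};{2},{3};{0},{3}".format(
            b['xtl'], b['ytl'], b['xbr'], b['ybr'])
        shapes.append(('box', points))

for shape_type, points in shapes:
    print(shape_type, points)
```

A file name that is not present in the XML simply yields no matches, so the resulting list stays empty.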

Next, we need to create the mask files. Draw the outlines of the mask polygons in white and fill their interiors with red (as shown in the picture above). Note that OpenCV uses BGR channel order, so (0, 0, 255) is red. The following function does this:

def create_mask_file(width, height, bitness, background, shapes, scale_factor):
    mask = np.full((height, width, bitness // 8), background, dtype=np.uint8)
    for shape in shapes:
        # "x1,y1;x2,y2;..." -> list of (x, y) float tuples
        points = [tuple(map(float, p.split(','))) for p in shape['points'].split(';')]
        points = np.array([(int(p[0]), int(p[1])) for p in points])
        points = points*scale_factor
        # OpenCV drawing functions expect 32-bit integer points
        points = points.astype(np.int32)
        mask = cv2.drawContours(mask, [points], -1, color=(255, 255, 255), thickness=5)
        mask = cv2.fillPoly(mask, [points], color=(0, 0, 255))
    return mask
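The coordinate handling inside this function can be tried without OpenCV. The sketch below is a pure-Python stand-in for the parsing and scaling steps; `parse_points` is a hypothetical helper name, not part of the script. It follows the same order of operations: truncate to integers, multiply by the scale factor, truncate again.

```python
def parse_points(points_str, scale_factor=1.0):
    """Turn 'x1,y1;x2,y2;...' into a list of scaled integer (x, y) tuples."""
    result = []
    for pair in points_str.split(';'):
        x, y = map(float, pair.split(','))
        # Truncate, scale, then truncate again, mirroring create_mask_file.
        result.append((int(int(x) * scale_factor), int(int(y) * scale_factor)))
    return result


# Triangle taken from the XML sample earlier in the article.
triangle = "1344.81,1673.28;1270.10,1619.40;1213.00,1697.00"
scaled = parse_points(triangle, scale_factor=0.4)
print(scaled)  # [(537, 669), (508, 647), (485, 678)]
```

Truncating before scaling discards the fractional part of the original coordinates; multiplying the float values first and rounding once at the end would be a slightly more precise variant.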

Finally, the main function:

def main():
    args = parse_args()
    dir_create(args.output_dir)
    img_list = [f for f in os.listdir(args.image_dir) if os.path.isfile(os.path.join(args.image_dir, f))]
    mask_bitness = 24
    for img in tqdm(img_list, desc='Writing contours:'):
        img_path = os.path.join(args.image_dir, img)
        anno = parse_anno_file(args.cvat_xml, img)
        background = []
        is_first_image = True
        for image in anno:
            if is_first_image:
                # Start from a black canvas with the size of the input image
                current_image = cv2.imread(img_path)
                height, width, _ = current_image.shape
                background = np.zeros((height, width, 3), np.uint8)
                is_first_image = False
            output_path = os.path.join(args.output_dir, os.path.splitext(img)[0] + '.png')
            background = create_mask_file(width,
                                          height,
                                          mask_bitness,
                                          background,
                                          image['shapes'],
                                          args.scale_factor)
            # Written inside the loop so that images without annotations are skipped
            cv2.imwrite(output_path, background)

To run the file as a script with the Python interpreter, add the following construct at the end:

if __name__ == "__main__":
    main()

That’s all. To run the script, use the following command (the scale factor defaults to 1, which is the right value if the images were not resized after annotation):

python script_name.py --image-dir original_images_dir --cvat-xml cvat.xml --output-dir masks_dir --scale-factor 0.4
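If you are unsure which value to pass for the scale factor, it can be derived from the size recorded in the XML and the size of the images actually used. The sketch below assumes a hypothetical training size of 1920 by 1132 pixels; only the 4800 by 2831 size comes from the XML sample. `scale_factor_for` is an illustrative helper, not part of the script.

```python
def scale_factor_for(annotated_size, training_size, tolerance=0.01):
    """Return the single factor that maps annotation coordinates to the resized image.

    Sizes are (width, height) tuples. The script applies one factor to both
    axes, so this raises if the two axes were resized by different ratios.
    """
    annotated_w, annotated_h = annotated_size
    training_w, training_h = training_size
    fx = training_w / annotated_w
    fy = training_h / annotated_h
    if abs(fx - fy) > tolerance:
        raise ValueError("aspect ratio changed; one scale factor will not fit both axes")
    return fx


# (4800, 2831) is from the XML sample; (1920, 1132) is an assumed training size.
factor = scale_factor_for((4800, 2831), (1920, 1132))
print(factor)  # 0.4
```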

An original image example:

Image is created by Oleksii Sheremet with Google Earth

The mask obtained as a result of the script:

Image is created by Oleksii Sheremet with OpenCV library

Conclusion

The approach described here can be extended to produce more complex mask files from the data contained in the CVAT XML. You can extract individual polygons or fill polygons with different colors depending on the number of vertices. In addition, with minor changes, the script can cut polygonal regions out of the original images along the annotation contours.
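As one possible illustration of coloring by vertex count, the fill color could be chosen from the `points` string before drawing. The palette and the helper name `fill_color_for` below are assumptions for the sketch; the returned tuple would be passed as the `color` argument of `cv2.fillPoly` in `create_mask_file`.

```python
# Hypothetical BGR palette (OpenCV channel order); not part of the original script.
PALETTE = {
    3: (0, 255, 0),   # triangles in green
    4: (255, 0, 0),   # quadrilaterals in blue
}
DEFAULT_COLOR = (0, 0, 255)  # everything else in red, as in the script


def fill_color_for(points_str):
    """Pick a fill color based on the number of vertices in 'x1,y1;x2,y2;...'."""
    n_vertices = len(points_str.split(';'))
    return PALETTE.get(n_vertices, DEFAULT_COLOR)


# Polygons taken from the XML sample earlier in the article.
triangle = "1344.81,1673.28;1270.10,1619.40;1213.00,1697.00"
quad = "1498.35,939.31;1573.30,923.19;1586.74,985.00;1509.10,1002.32"
print(fill_color_for(triangle), fill_color_for(quad))
```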

References

Computer Vision Annotation Tool (CVAT)

lxml — XML and HTML with Python

OpenCV

How to Run Your Python Scripts
