Slideio: a new Python library for reading medical images.

Stanislav Melnikov
Towards Data Science
Jul 3, 2020

This article introduces slideio, a new Python library for reading medical images. More detailed information can be found on the project website, www.slideio.com. The source code is available in a GitLab repository: https://gitlab.com/bioslide/slideio.

Introduction

Medical images, such as those produced by biomedical scanners and microscopes, differ from ordinary images. One important difference is their size: such images can be very large, and slides of many gigabytes are not rare today. Another difference is the number of dimensions. Many bio-image formats support three and four dimensions (volumes and time series). In addition to these conventional dimensions, some formats introduce scanner-specific dimensions such as focal distance, rotation (for data recorded from various angles), and phase index.

Conventional image formats are poorly suited to multi-gigapixel images. Codecs such as JPEG or PNG require the whole image to be decoded into computer memory to show it on screen, or even to read a small region of it. Bio-image formats solve this problem with tiling and zoom pyramids. A zoom pyramid is a set of copies of the image at different scales, and each copy is split into tiles that can be read independently. Together, these techniques allow an arbitrary region of an image to be read at an arbitrary scale with minimal memory and computational resources.
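The two ideas can be illustrated with a small sketch. It is not slideio's implementation. It assumes a pyramid described by hypothetical downsample factors (1 for full resolution, then 4, 16, ...) and square tiles. The sketch picks the coarsest level that still provides the requested scale. It then lists the tiles a region touches, which are the only ones a reader would need to decode. The function names are illustrative.

```python
def choose_level(downsamples, scale):
    """Index of the coarsest pyramid level that still offers at least `scale`.

    downsamples: downsampling factor of each level relative to full resolution
                 (hypothetical example: [1, 4, 16]; level 0 is full resolution).
    scale: requested output pixels per full-resolution pixel (0.1 = 10x smaller).
    """
    usable = [i for i, f in enumerate(downsamples) if 1.0 / f >= scale]
    # Full resolution (level 0) is the fallback, e.g. when up-scaling is requested.
    return max(usable, key=lambda i: downsamples[i]) if usable else 0


def tiles_for_region(x, y, width, height, tile_size):
    """(column, row) indices of the square tiles that intersect a pixel region."""
    first_col, first_row = x // tile_size, y // tile_size
    last_col = (x + width - 1) // tile_size
    last_row = (y + height - 1) // tile_size
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


# A 10x-reduced overview from levels [1, 4, 16] starts from level 1 (4x),
# which then only needs a further 2.5x reduction.
level = choose_level([1, 4, 16], 0.1)
# A 100 x 100 region at (200, 200) with 240-pixel tiles spans four tiles.
tiles = tiles_for_region(200, 200, 100, 100, 240)
```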

The slideio library reads medical images by exploiting their internal structure, which makes reading as fast as possible. Slideio is not the first library to provide such functionality. In my image analysis work, I have used many different libraries, but so far none of them has met all my requirements. So I decided to create my own library, one that aggregates my experience in this area.

The library has a driver architecture: each driver supports one or more image formats. The first version of slideio provides four drivers:

  • CZI: reads Zeiss CZI images.
  • SVS: reads Aperio SVS images.
  • AFI: reads Aperio fluorescent images.
  • GDAL: reads generic formats such as JPEG, PNG, and TIFF through the popular C++ image library GDAL.
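Each driver is identified by a short ID, so a program that handles several formats needs to choose the right one for each file. The following is a minimal sketch. It assumes the driver IDs are the names listed above ("CZI", "SVS", "AFI", "GDAL"). The extension table and the helper name are hypothetical and not part of slideio.

```python
from pathlib import Path

# Hypothetical mapping from file extension to slideio driver ID.
DRIVER_BY_EXTENSION = {
    ".czi": "CZI",
    ".svs": "SVS",
    ".afi": "AFI",
}


def driver_for(path):
    """Pick a driver ID from the file extension, falling back to the generic GDAL driver."""
    return DRIVER_BY_EXTENSION.get(Path(path).suffix.lower(), "GDAL")
```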

The slideio library has a simple object structure.

Image drivers create Slide objects. A Slide object represents a single image file (or a folder, depending on the image format). A Slide contains at least one Scene object, which is a continuous raster region (a 2D image, a volume, a time series, etc.). Some image formats hold a single scene, such as a single tissue scan; others allow a file to store several tissue regions. All layers of a 2D Scene have the same pixel size and resolution. If a scene is a 3D volume, all slices of the volume have the same size and resolution, and the same holds for the frames of a time series.

The following code snippet shows how to open a slide with the “SVS” image driver:
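The following is a minimal sketch of opening an SVS slide and listing its scenes. It assumes slideio's `open_slide(path, driver_id)` function, along with the Slide's `num_scenes` property and `get_scene(index)` method. The path and the helper names are hypothetical.

```python
def open_svs(path):
    """Open an Aperio SVS file with slideio's "SVS" driver and return the Slide object."""
    import slideio  # imported here so the module loads even where slideio is not installed
    return slideio.open_slide(path, "SVS")


def scene_names(slide):
    """Names of all scenes in a Slide, using its num_scenes property and get_scene(index)."""
    return [slide.get_scene(i).name for i in range(slide.num_scenes)]


# Usage (the path is a placeholder):
# slide = open_svs("/data/slide.svs")
# print(scene_names(slide))
```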

Image metadata

The slideio library provides image information at different levels. The Slide object has a property, “raw_metadata”, that exposes the unmodified text metadata extracted from the image. The content of this text is specific to the file format. For an Aperio SVS slide, it is the string stored in the ImageDescription TIFF tag. For a Zeiss CZI file, it is an XML document containing the complete file metadata. Here is a code snippet that retrieves the metadata from an Aperio SVS file:
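The following is a minimal sketch, assuming `slide` was opened with the "SVS" driver. Aperio stores the description as fields separated by `|`, so splitting on that character yields a list of fields. The helper name is hypothetical.

```python
def svs_metadata_fields(slide):
    """Split the raw Aperio description string of a Slide into its '|'-separated fields."""
    # For SVS files raw_metadata is one text string; other formats (e.g. CZI XML) differ.
    return [field.strip() for field in slide.raw_metadata.split("|")]


# Usage, with `slide` opened as slideio.open_slide(path, "SVS"):
# print(svs_metadata_fields(slide))
```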

Here is the output produced by the code sample:

['Aperio Image Library vFS90 01\r\n20320x19545 [0,100 19919x19445] (240x240) JPEG/RGB Q=70',
'AppMag = 20',
'StripeWidth = 2032',
'ScanScope ID = SS1598',
'Filename = 24496',
'Date = 11/09/11',
'Time = 18:51:40',
'Time Zone = GMT+09:00',
'User = e8ddb309-efc1-4a6b-b9b0-7c555f9fa0ef',
'MPP = 0.4962',
'Left = 23.939867',
'Top = 19.531540',
'LineCameraSkew = 0.000320',
'LineAreaXOffset = 0.060417',
'LineAreaYOffset = 0.011084',
'Focus Offset = -0.000500',
'DSR ID = ap6101-dsr',
'ImageID = 24496',
'Exposure Time = 109',
'Exposure Scale = 0.000001',
'DisplayColor = 0',
'OriginalWidth = 20320',
'OriginalHeight = 19545',
'ICC Profile = ScanScope v1']
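Apart from the first, free-text header, every field is a `key = value` pair. This makes the list easy to turn into a dictionary, for example to read the MPP (microns per pixel) value. The following is a minimal sketch using a few fields copied from the output above. The helper name is hypothetical.

```python
def parse_aperio_fields(fields):
    """Turn Aperio 'key = value' fields into a dict; the free-text header is kept under 'header'."""
    info = {"header": fields[0]}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if sep:  # ignore fragments without an '=' sign
            info[key.strip()] = value.strip()
    return info


# A few fields copied from the output above.
fields = ["Aperio Image Library vFS90 01", "AppMag = 20", "MPP = 0.4962", "Time = 18:51:40"]
info = parse_aperio_fields(fields)
microns_per_pixel = float(info["MPP"])
magnification = int(info["AppMag"])
```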

Raster access

A Scene is the main object for accessing raster data. It exposes the following information:

  • compression: type of data compression;
  • magnification: scanner magnification;
  • name: scene name;
  • num_t_frames: number of time frames in the time series;
  • num_z_slices: number of slices in the volume;
  • rect: coordinates and dimensions of the scene rectangle as (x, y, width, height);
  • resolution: in-plane resolution of the scene as an (x, y) tuple, in meters per pixel;
  • t_resolution, z_resolution: resolutions of the scene in time and in the z-direction;
  • num_channels: number of channels in the scene;
  • channel_data_type: data type of an image channel (8-bit, 16-bit, etc.), available through get_channel_data_type;
  • channel_name: name of an image channel, available through get_channel_name.

The following code snippet retrieves the scene name, rectangle, number of channels, and resolution.
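The following is a minimal sketch that reads the properties listed above from a scene and returns them as a tuple. It assumes the scene of interest is returned by `get_scene(0)`. The helper name is hypothetical.

```python
def scene_summary(scene):
    """Name, rectangle (x, y, width, height), channel count and resolution of a Scene."""
    return scene.name, scene.rect, scene.num_channels, scene.resolution


# Usage, with `slide` being a slideio Slide object:
# scene = slide.get_scene(0)
# print(scene_summary(scene))
```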

It produces the following output:

('Image', (0, 0, 19919, 19445), 3, (4.961999999999999e-07, 4.961999999999999e-07))

The image is 19919 pixels wide and 19445 pixels high. Each pixel measures 0.4962 µm in both the x and y directions (the resolution is reported in meters). The image has 3 channels. The meaning of a channel in bio-images depends on the image format. For bright-field images, the channels are simply red, green, and blue, so such images have three 8-bit channels. Channel properties are accessible through the methods get_channel_data_type and get_channel_name.
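Both facts can be checked programmatically. The following is a minimal sketch that collects the channel data types through `get_channel_data_type`. It also converts the scene rectangle into a physical size using the resolution, which is in meters per pixel. The helper names are hypothetical.

```python
def channel_data_types(scene):
    """Data type of every channel of a Scene, via get_channel_data_type(index)."""
    return [scene.get_channel_data_type(i) for i in range(scene.num_channels)]


def physical_size_mm(rect, resolution):
    """Width and height of a scene rectangle in millimetres.

    rect is (x, y, width, height) in pixels; resolution is (x, y) in metres per pixel.
    """
    _, _, width, height = rect
    res_x, res_y = resolution
    return width * res_x * 1000.0, height * res_y * 1000.0


# Usage, with `scene` obtained from slide.get_scene(0):
# for dtype in channel_data_types(scene):
#     print(dtype)
# physical_size_mm(scene.rect, scene.resolution)  # about 9.88 x 9.65 mm for this slide
```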

uint8
uint8
uint8

The read_block method retrieves the pixel values of a continuous region. Called without parameters, it retrieves the whole scene at its original size. Reading the whole image at the original scale is usually impractical because of its size. Instead, a program can retrieve a region of the image, down-scale the whole image to an acceptable size, or retrieve a down-scaled region. The code snippet below retrieves the whole image and scales it to a picture 500 pixels wide. Note that a zero in place of the picture height means that the height is calculated automatically to keep the same scale in the x and y directions.
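The following is a minimal sketch, assuming `read_block` accepts a `size=(width, height)` keyword as described above. The helper names are hypothetical. `auto_height` only illustrates the proportional calculation behind a zero height, and slideio's own rounding may differ by a pixel.

```python
def auto_height(width, height, target_width):
    """Height that preserves the aspect ratio when scaling to target_width (illustration only)."""
    return round(height * target_width / width)


def read_thumbnail(scene, target_width=500):
    """Whole scene down-scaled to target_width pixels; a height of 0 means "compute it"."""
    return scene.read_block(size=(target_width, 0))


# For the 19919 x 19445 scene above, a 500-pixel-wide thumbnail is about 488 pixels high.
# thumbnail = read_thumbnail(scene)  # numpy array
```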

The code snippet below reads a rectangular region of the image and down-scales it to a picture 500 pixels wide.
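The following is a minimal sketch. It assumes the first argument of `read_block` is the rectangle (x, y, width, height) in full-resolution scene pixels, matching the format of `scene.rect`. The helper name and the coordinates are hypothetical.

```python
def read_region(scene, x, y, width, height, target_width=500):
    """Read the rectangle (x, y, width, height), in full-resolution pixels of the scene,
    and down-scale it so that the result is target_width pixels wide."""
    return scene.read_block((x, y, width, height), size=(target_width, 0))


# Hypothetical region: 4000 x 3000 pixels starting at (6000, 5000); the result
# would be 500 x 375 pixels, since the height keeps the 4:3 aspect ratio.
# region = read_region(scene, 6000, 5000, 4000, 3000)
```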

It is possible to read a single channel or a subset of channels:
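The following is a minimal sketch, assuming the channel subset is passed as a list of indices through a `channel_indices` keyword of `read_block`. The helper name is hypothetical.

```python
def read_channels(scene, channels, target_width=500):
    """Whole scene, down-scaled to target_width pixels, restricted to the given channel indices."""
    return scene.read_block(size=(target_width, 0), channel_indices=list(channels))


# Hypothetical calls: only the red channel, or red and blue, of an RGB scene.
# red = read_channels(scene, [0])
# red_blue = read_channels(scene, [0, 2])
```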

The additional tuple parameters slices and frames allow reading volumes and time series:

(1000, 1000, 27)
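The following is a minimal sketch of reading a z-stack. It assumes `slices` is a (begin, end) pair whose end index is exclusive, and that `frames` works the same way for time series. It uses the `num_z_slices` property listed earlier to read every slice by default. The helper and its defaults are hypothetical, and the axis order of the returned array is best checked with `.shape`.

```python
def read_z_stack(scene, rect, channel=0, first=0, last=None):
    """Read one channel of the slices [first, last) inside rect = (x, y, width, height).

    By default all num_z_slices slices of the scene are read.
    """
    if last is None:
        last = scene.num_z_slices
    return scene.read_block(rect, channel_indices=[channel], slices=(first, last))


# Usage, with `scene` taken from a volumetric (e.g. CZI) slide:
# volume = read_z_stack(scene, (0, 0, 1000, 1000))
# print(volume.shape)
```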

Installation

The slideio library can be installed with pip:

pip install slideio

Currently, only Linux and Windows builds are supported.

Conclusion

Slideio is a Python module for reading medical images. It can read whole slides as well as any region of a slide, and large slides can be efficiently scaled down to a smaller size. The module uses the images' internal zoom pyramids to make scaling as fast as possible. Slideio supports 2D slides as well as 3D data sets and time series.

The library delivers raster data as numpy arrays, so it is compatible with many popular image analysis libraries such as OpenCV.
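One practical detail when handing these arrays to OpenCV is channel order: OpenCV expects BGR. The following is a minimal sketch, assuming the array is RGB-ordered, as is typical for bright-field scans. The helper name is hypothetical, and `cv2.cvtColor(image, cv2.COLOR_RGB2BGR)` is an equivalent alternative.

```python
import numpy as np


def rgb_to_bgr(image):
    """Reverse the channel order of a (height, width, 3) RGB array for OpenCV, which expects BGR."""
    # ascontiguousarray copies the reversed view into a C-contiguous buffer.
    return np.ascontiguousarray(image[..., ::-1])


# Usage, assuming OpenCV is installed and `thumbnail` came from read_block:
# import cv2
# cv2.imwrite("thumbnail.png", rgb_to_bgr(thumbnail))
```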

Currently, it supports Aperio SVS and AFI files, Zeiss CZI files, and generic formats. Drivers for the following formats are coming soon:

  • PerkinElmer images
  • Leica SCN images
  • DICOM datasets
  • Leica LIF images
  • and more …

Thank you for reading. Any comments or suggestions would be highly appreciated.
