
Fluorescent Neuronal Cells dataset - part II

Peculiar traits and challenges

In this second article of the series, we go through the Fluorescent Neuronal Cells (FNC) data in more detail, highlighting some of their peculiar traits and challenges.

If you missed the first part, check it out for more details on how the data were gathered and what they represent:

Fluorescent Neuronal Cells dataset – part I


Distinctive traits

The Fluorescent Neuronal Cells dataset comes with some peculiar traits that are worth mentioning, as they can help guide the analysis of these data.

RGB channels

The pictures are dominated by two prevalent tints due to the intentional selection of light with a specific wavelength:

  • a yellow tone emitted by the fluorescent marker
  • a darker hue for the background.

As a consequence, the only populated color channels are red and green – whose combination gives rise to the yellow color – while blue is typically empty (Figure 1).

Hence, representing FNC data by a 3D colorspace may be redundant and a lower-dimensional representation may be sufficient (e.g. grayscale).
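As a quick sanity check, one can measure how much signal each channel actually carries and, if blue is indeed negligible, fall back to grayscale. Below is a minimal sketch; the `images/` folder and file name are placeholders for wherever the FNC images are stored.

```python
from skimage import io, color

# hypothetical path: adjust to wherever the FNC images are stored
img = io.imread("images/sample.png")[..., :3]   # keep only the RGB channels

# mean intensity per channel: blue is expected to stay close to zero
print({ch: float(img[..., i].mean()) for i, ch in enumerate("RGB")})

# if blue carries almost no signal, a grayscale version may suffice
gray = color.rgb2gray(img)   # float image in [0, 1]
```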

On the other hand, the residual information dispersed in the additional dimensions may be useful to discern cell pixels from the background – so why throw it away? However, RGB is not the only colorspace available, so one may wonder whether other representations are more convenient for the analysis of FNC data.

Figure 2 shows the pixels of one image as points in a three-dimensional colorspace. The color reflects the appearance of the pixel in the image, while the position is determined using the RGB (left) or HSV (right) encodings.

The points are cluttered and almost aligned in the RGB representation. Conversely, the cloud is much more dispersed in the HSV space, which may foster an easier separation between cell pixels and the background.
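A figure like the one above can be reproduced with a few lines of matplotlib and scikit-image. The sketch below subsamples the pixels for readability and reuses the `img` array loaded in the previous snippet; all names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from skimage import color, util

pixels = util.img_as_float(img).reshape(-1, 3)          # pixels as rows in [0, 1]
rng = np.random.default_rng(0)
idx = rng.choice(len(pixels), size=min(5000, len(pixels)), replace=False)
rgb = pixels[idx]                                       # subsample for readability
hsv = color.rgb2hsv(rgb.reshape(-1, 1, 3)).reshape(-1, 3)

fig = plt.figure(figsize=(10, 5))
for i, (data, name) in enumerate([(rgb, "RGB"), (hsv, "HSV")], start=1):
    ax = fig.add_subplot(1, 2, i, projection="3d")
    ax.scatter(*data.T, c=rgb, s=2)                     # color points by their RGB value
    ax.set_title(name)
    ax.set_xlabel(name[0]); ax.set_ylabel(name[1]); ax.set_zlabel(name[2])
plt.show()
```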

This observation suggests that different colorspaces may be more or less convenient for a specific learning task, so the choice of one over the other may influence the results. For this reason, the authors of the FNC dataset inserted a learned colorspace transformation in the c-ResUnet architecture they proposed for these data.
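In practice, a learned colorspace transformation can be as simple as a 1×1 convolution applied to the input channels before the rest of the network. The PyTorch sketch below only illustrates the general idea; it is not the exact layer used in c-ResUnet.

```python
import torch
import torch.nn as nn

class LearnedColorspace(nn.Module):
    """Map the 3 input channels to a new 3-channel representation whose
    mixing weights are learned end-to-end with the rest of the network."""
    def __init__(self, in_channels=3, out_channels=3):
        super().__init__()
        self.mix = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=True)

    def forward(self, x):
        return torch.sigmoid(self.mix(x))   # keep the transformed channels bounded

# example usage: prepend it to any segmentation backbone
x = torch.rand(1, 3, 256, 256)
print(LearnedColorspace()(x).shape)   # torch.Size([1, 3, 256, 256])
```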

Cell features

Beyond technical specs, it is crucial to consider the characteristics of the objects to segment/detect/count.

For example, we can inspect the distribution of quantities like the area and _maximum Feret diameter_ to get an idea of the variability in size among different neuronal cells.

Both indicators suggest that most cells are fairly small, with 75th percentiles of roughly 150 μm² and 21 μm respectively. Nonetheless, the two distributions present long tails extending to much higher values, up to more than three times the above measures.
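Both quantities can be recomputed from the ground-truth masks with `skimage.measure.regionprops`, which exposes `area` and `feret_diameter_max` for each connected component. The sketch below assumes binary masks in a hypothetical `masks/` folder and a hypothetical pixel size (`UM_PER_PX`) for the conversion to micrometres.

```python
import numpy as np
from skimage import io, measure

UM_PER_PX = 0.8   # hypothetical micrometres per pixel: check the acquisition metadata

mask = io.imread("masks/sample.png") > 0            # binary ground-truth mask
labels = measure.label(mask)                        # one label per connected cell
props = measure.regionprops(labels)

areas = [p.area * UM_PER_PX**2 for p in props]                # in μm²
ferets = [p.feret_diameter_max * UM_PER_PX for p in props]    # in μm

print(f"75th percentile area: {np.percentile(areas, 75):.1f} μm²")
print(f"75th percentile max Feret diameter: {np.percentile(ferets, 75):.1f} μm")
```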

Another important aspect is the number of objects in each picture.

Most images present a low count of neurons, with a median value of 21 and a first peak of 56 images without cells (# cells=0). The remaining ones instead form a very long tail – in fact, half of the distribution – with higher values squeezed around a few local peaks up to a maximum of 68 cells in one image.
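Similarly, the per-image counts can be obtained by counting connected components in each mask (again, the `masks/` folder is a placeholder).

```python
from pathlib import Path
import numpy as np
from skimage import io, measure

counts = []
for path in sorted(Path("masks").glob("*.png")):    # hypothetical mask folder
    mask = io.imread(path) > 0
    counts.append(measure.label(mask).max())        # number of connected cells

counts = np.array(counts)
print("median count per image:", np.median(counts))
print("images without cells:", int((counts == 0).sum()))
print("maximum count in one image:", counts.max())
```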

All in all, the cells exhibit considerable heterogeneity in terms of size, shape and counts, which calls for a model flexible enough to deal with such variability.


Challenges

The FNC dataset presents some specific challenges that must be addressed during training.

Class imbalance

One of these is surely the extreme imbalance between background and cell pixels.

Almost 90% of the images contain less than 1% of cell pixels. Even more significantly, this percentage never exceeds 5%, peaking at 4.86% in the most densely populated image.

In terms of raw pixel counts, background pixels outnumber cell pixels by a factor of roughly 20 to 300 in half of the images, with a sizeable part of the distribution extending beyond a factor of 1000.
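These figures can be checked directly on the masks by computing, for each image, the fraction of cell pixels and the background-to-cell ratio; a minimal sketch under the same placeholder layout as above:

```python
from pathlib import Path
import numpy as np
from skimage import io

# fraction of cell pixels in each image, from the (placeholder) masks/ folder
frac = np.array([(io.imread(p) > 0).mean()
                 for p in sorted(Path("masks").glob("*.png"))])

print("share of images with < 1% cell pixels:", (frac < 0.01).mean())
print("maximum cell-pixel percentage:", 100 * frac.max())

# background-to-cell pixel ratio, ignoring images without any cells
ratio = (1 - frac[frac > 0]) / frac[frac > 0]
print("median background/cell ratio:", np.median(ratio))
```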

Of course, learning tasks are typically affected by such class disproportion, so the analysis of FNC data should include appropriate remedies to deal with it.
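One common remedy (among several, and not necessarily the one adopted in the original paper) is to reweight the loss so that the rare cell pixels count more than the abundant background ones. A minimal PyTorch sketch using the `pos_weight` argument of `BCEWithLogitsLoss`, with a purely illustrative weight:

```python
import torch
import torch.nn as nn

# pos_weight ~ background/cell pixel ratio; the value below is purely illustrative
pos_weight = torch.tensor([100.0])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(4, 1, 256, 256)                    # raw model outputs
target = (torch.rand(4, 1, 256, 256) > 0.99).float()    # sparse "cell" pixels
loss = criterion(logits, target)                        # cell errors weigh 100x more
```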

Hard examples

Along with class imbalance, there are also several challenges specific to the FNC dataset and to fluorescence microscopy imaging in general.

For example, sometimes multiple cells are close together or even overlapping each other. When that is the case, a precise segmentation is paramount to correctly handle cell agglomerates. Thus, some tricks may be necessary to help the model separate different cell instances (e.g. watershed post-processing).
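A standard recipe for this is marker-based watershed on the distance transform of the predicted mask. The sketch below uses scipy and scikit-image on a toy example of two overlapping "cells"; in practice `binary_mask` would be the thresholded model output.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.draw import disk
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# toy example: two overlapping "cells"; in practice this is the predicted mask
binary_mask = np.zeros((100, 100), dtype=bool)
for centre in [(40, 40), (40, 65)]:
    binary_mask[disk(centre, 20)] = True

# distance to the background peaks roughly at the cell centres
distance = ndi.distance_transform_edt(binary_mask)
coords = peak_local_max(distance, min_distance=10, labels=binary_mask)

# one marker per peak, then flood the inverted distance map
markers = np.zeros(distance.shape, dtype=int)
markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
instances = watershed(-distance, markers, mask=binary_mask)
print(instances.max())   # separate labels for the touching cells (expected: 2)
```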

Noisy labels

Another big source of complexity is the presence of noise in the labels, caused by the intrinsic arbitrariness of cell recognition.

In fact, sometimes even human experts do not agree on whether a marker stain should be counted as a cell. Thus, similar examples may be labelled slightly differently across images.

Unfortunately, this problem is hardly addressable. However, one may want to take it into account when evaluating results, or at least be aware of this issue when looking at pure metrics.

Artifacts

Finally, the learning task is hampered by the occasional presence of accumulations of fluorophore that generate emissions very similar to the ones of cells.

When that happens, the pictures may contain fictitious objects or uninteresting structures that resemble neuronal cells in terms of shape, size or color.

These artifacts range from small areas, as in the case of filaments and point artifacts (Figures 7 and 8), to bigger structures such as the stripe in Figure 7 or the "macaron"-shaped object in Figure 8.

Thus, the model really needs to understand the morphology of cells and consider it together with the color information in order to recognize them.
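As a purely illustrative baseline (not what c-ResUnet does, since the network learns morphology end-to-end), one could filter candidate regions by simple shape descriptors such as area and eccentricity to discard filament-like or speck-like artifacts:

```python
import numpy as np
from skimage import measure

def filter_by_shape(label_img, min_area=50, max_eccentricity=0.97):
    """Drop labelled regions that look like filaments or specks rather than
    roundish cells. Thresholds are purely illustrative."""
    out = np.zeros_like(label_img)
    for region in measure.regionprops(label_img):
        if region.area >= min_area and region.eccentricity <= max_eccentricity:
            out[label_img == region.label] = region.label
    return out

# e.g. applied to the labelled instances from the watershed step above:
# filtered = filter_by_shape(instances)
```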


Well done for making it this far!

In this article we covered some EDA performed on the Fluorescent Neuronal Cells dataset, highlighting some peculiar traits and challenges to address when analyzing these data.

In the next and last article of this series, we will go through some suggested metrics to evaluate detection and counting performance specific to these data.

If you liked the topic, you can read a more detailed discussion in [1, 2]. Also, you can go ahead and download the dataset, experiment with the code of the original paper and play with the data yourself. Let me know what you discover in the comments!

References

[1] L. Clissa, Supporting Scientific Research Through Machine and Deep Learning: Fluorescence Microscopy and Operational Intelligence Use Cases (2022), AlmaDL

[2] R. Morelli et al., Automating cell counting in fluorescent microscopy through deep learning with c-ResUnet (2021), Scientific Reports
