Performance evaluation of image segmentation methods on microscopic samples


The fundamental objective of image segmentation is to partition the input image into meaningful non-overlapping regions -- segments -- for further analysis or visualization. Despite the long-standing effort to develop high-quality segmentation algorithms, no universal segmentation method has been proposed. Under these circumstances, it is unclear which method to choose for a particular data set and whether combining segmentation results would be beneficial. We limit our study to microscopic image data in which the sample is located in the inner part of the image and mostly does not reach the top and bottom image borders. The data may come from an analysis of painting materials used in art restoration, which is the case of the data set used in our evaluation. They can also be samples of various biological materials, such as tissues, cells, or other biological structures. The task at hand can be seen as a two-class problem in which every pixel has to be labeled as either foreground or background, where the foreground is usually the inner part of the image and the background is to be separated and/or removed. The problem can also be viewed as image binarization.

At first glance, this might seem to be a simple task solvable by basic thresholding; however, the situation is often more complex. Because of the way the data are collected, the acquired images are often ill-suited to the chosen segmentation method, and the following complications are usually inevitable: the surroundings of the analyzed samples can be semitransparent, the cutting plane can be non-uniform, and various debris can be present, to name a few examples. A high number of samples can negatively influence the precision of sample scanning in terms of noise level and blurring. The objective is to evaluate non-interactive segmentation methods in terms of their accuracy, assessed by several indices used for measuring the output quality of image segmentation algorithms.

Segmentation algorithms and quality indices

A variety of segmentation methods, differing in many ways, is available for solving the image segmentation problem. The algorithms in our study are selected with respect to the following criteria. Methods built on different principles are considered in order to provide diversity. Performance and computational (time) efficiency are taken into account, with a preference for short execution time. Finally, the public availability of an implementation, and thus the related popularity of the segmentation method, is considered too. This last criterion is important because potential users of image segmentation algorithms can be expected to choose exactly such popular methods. The set of studied methods thus covers various approaches such as thresholding, region growing, clustering methods, and graph-based algorithms.

Ten quality indices are selected to objectively evaluate the performance of the image segmentation methods and the quality of their results. The pursuit of objectivity is motivated by an effort to suppress the subjective (and still often empirical) evaluation of the segmentation algorithms in the original papers. The following indices are adopted: Hamming distance, boundary Hamming distance, Rand index, adjusted Rand index, Dice coefficient, Fowlkes-Mallows index, normalized mutual information, variation of information, Hausdorff distance, and mean absolute surface distance.
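
To make three of the listed indices concrete, the following is a minimal sketch of how the Dice coefficient, the Hamming distance, and the Rand index could be computed for a binary segmentation against a binary reference mask. It is an illustration only, not the implementation used in the study: the function names are hypothetical, and the Hamming distance is assumed here to be normalized by the number of pixels.

```python
import numpy as np


def dice_coefficient(seg, ref):
    """Dice coefficient of two binary masks (1.0 means identical foregrounds)."""
    seg = np.asarray(seg, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    total = int(seg.sum()) + int(ref.sum())
    if total == 0:
        # Both masks are empty; treat them as a perfect match (a convention).
        return 1.0
    return 2.0 * int(np.logical_and(seg, ref).sum()) / total


def hamming_distance(seg, ref):
    """Fraction of pixels labeled differently (normalization is an assumption)."""
    seg = np.asarray(seg, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    return float(np.mean(seg != ref))


def _pairs(k):
    """Number of unordered pairs that can be drawn from k items."""
    return k * (k - 1) // 2


def rand_index(seg, ref):
    """Rand index: fraction of pixel pairs on which both labelings agree."""
    seg = np.asarray(seg, dtype=bool).ravel()
    ref = np.asarray(ref, dtype=bool).ravel()
    n = seg.size
    if n < 2:
        return 1.0
    # 2x2 contingency table of the two binary labelings.
    n11 = int(np.sum(seg & ref))
    n10 = int(np.sum(seg & ~ref))
    n01 = int(np.sum(~seg & ref))
    n00 = int(np.sum(~seg & ~ref))
    same_in_both = sum(_pairs(c) for c in (n11, n10, n01, n00))
    same_in_seg = _pairs(n11 + n10) + _pairs(n01 + n00)
    same_in_ref = _pairs(n11 + n01) + _pairs(n10 + n00)
    total = _pairs(n)
    # Agreements = pairs joined in both + pairs separated in both.
    agreements = total + 2 * same_in_both - same_in_seg - same_in_ref
    return agreements / total
```

Higher values are better for the Dice coefficient and the Rand index, while lower values are better for the Hamming distance, so the direction of each index has to be taken into account when methods are compared.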


The main objective is to find the best average segmentation method, that is, a method which is good enough (and not necessarily the best) for the vast majority of the images. We look for a method which is comparable to the best method on easy-to-segment images (where the majority of methods produce satisfactory results) and which does not completely fail on difficult images (where most of the methods fail).

Such a method is found for every modality, and lists of segmentation methods ranked according to their performance are produced through a rank aggregation process. The Mean Shift algorithm generally performs the best and can be considered the best segmentation method on average for related data. We verified the findings on a separate testing data set, and the applicability of the evaluation results was shown on different but related biological data.
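
The rank aggregation scheme is not specified in this summary. As a rough illustration of the idea, the sketch below assumes a simple mean-rank (Borda-style) aggregation: methods are ranked on every image by a single quality index, and the per-image ranks are averaged. The function name and the example scores are hypothetical.

```python
import numpy as np


def aggregate_ranks(scores, higher_is_better=True):
    """Mean-rank aggregation over images (an assumed, Borda-style scheme).

    scores: array of shape (n_images, n_methods) holding one quality index.
    Returns (mean_rank, order); order lists method indices from best to worst.
    """
    scores = np.asarray(scores, dtype=float)
    keys = -scores if higher_is_better else scores
    # Double argsort turns values into ranks per image; rank 1 is the best.
    # Ties are broken by column order, which is a simplification.
    ranks = keys.argsort(axis=1, kind="stable").argsort(axis=1, kind="stable") + 1
    mean_rank = ranks.mean(axis=0)
    order = np.argsort(mean_rank, kind="stable")
    return mean_rank, order


# Hypothetical Dice scores: 3 images (rows) x 3 methods (columns).
example_scores = [
    [0.90, 0.80, 0.70],
    [0.60, 0.90, 0.50],
    [0.95, 0.90, 0.20],
]
mean_rank, order = aggregate_ranks(example_scores)
```

With several quality indices, one such ranking per index could be produced and the resulting lists aggregated again in the same way; whether that matches the procedure of the paper is not stated here.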

Fusion of segmentation results

Next, we examine whether a combination (or fusion) of segmentation methods could further improve on the performance of even the best average method. A majority vote over a limited subset of segmentation methods was considered. The majority vote applied to the three best average methods delivered significantly better, or in some cases at least comparable, results compared with the best average method itself. The combination approach is thus appropriate in all cases, as it does not underperform, and its advantage is robustness.
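
A pixel-wise majority vote over binary segmentation masks can be sketched as follows. This is a minimal illustration under the assumption that every method outputs a foreground/background mask of the same shape; the function name is hypothetical.

```python
import numpy as np


def majority_vote(masks):
    """Fuse binary masks: a pixel is foreground if more than half of the masks say so."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    votes = stack.sum(axis=0)
    # Strict majority; with an odd number of masks no ties can occur.
    return votes * 2 > stack.shape[0]


# Three hypothetical method outputs for a tiny one-row image.
fused = majority_vote([
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
])
```

Using an odd number of methods, such as three, avoids ties, so no additional tie-breaking rule is needed.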


A much more detailed analysis with discussion and results can be found in the paper, which was accepted to the Journal of Microscopy (link) and is now available through Early View with DOI 10.1111/jmi.12186. The linked version is the pre-peer-reviewed version of the article.