Clustering of Cell Populations in Flow Cytometry Data Using a Combination of Gaussian Mixtures

Michael Reiter, TU Wien

Identifying biologically meaningful cell populations in flow cytometry data is essentially a clustering problem. However, standard clustering methods are impractical because the size, shape and location of corresponding clusters may vary strongly between samples, mainly due to phenotypic differences and inter-laboratory variation.

In our holistic approach, we implicitly employ structural information (such as the relative locations and shapes of sub-populations). A new input sample is reconstructed as a linear combination of artificial reference samples, each represented by a Gaussian Mixture Model (GMM) in which the class label of the cluster of observations corresponding to each Gaussian component is known. The reference samples are calculated from a larger set of training samples by non-negative matrix factorization and can be regarded as the basis of a lower-dimensional feature space in which input samples are reconstructed.
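
One way to write this reconstruction down is sketched here. The symbols (f for the density of the input sample, g_k for the K reference GMMs, c_k for the feature-space coordinates, J_k for the number of components of reference k) are notation assumed for illustration and do not come from the talk itself.

```latex
f(x) \;\approx\; \sum_{k=1}^{K} c_k \, g_k(x),
\qquad
g_k(x) \;=\; \sum_{j=1}^{J_k} \pi_{kj}\, \mathcal{N}\!\left(x;\, \mu_{kj}, \Sigma_{kj}\right)
```

Under this reading, the vector (c_1, ..., c_K) is the low-dimensional representation of the sample, and every component (k, j) carries a known class label.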

We show a method for calculating the feature space transformation based on minimizing the L2 distance defined between two GMMs. The feature space representation of the sample is then used to assign each observation to one of the specified sub-populations by a Bayes decision. We present classification results on a database of about 170 patients with Acute Lymphoblastic Leukaemia (ALL), where high accuracy in the prediction of relatively small leukaemic populations is crucial.
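
The L2 distance between two GMMs has a closed form, because the integral of a product of two Gaussians is itself a Gaussian density evaluated at one mean, with the two covariances summed. The following is a minimal sketch of that idea and not the method presented in the talk: the GMM representation as a `(weights, means, covs)` tuple and the names `gmm_inner`, `gmm_l2_sq`, `mix` and `fit_coefficients` are all hypothetical, and the fit solves the unconstrained normal equations, so any non-negativity or normalization constraint on the coefficients is omitted.

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """Density of N(mean, cov) evaluated at the point x."""
    diff = np.asarray(x, float) - np.asarray(mean, float)
    cov = np.asarray(cov, float)
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** diff.size * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def gmm_inner(a, b):
    """Closed-form integral of a(x) * b(x) for two GMMs.

    Each GMM is an assumed tuple (weights, means, covs) with shapes
    (K,), (K, d) and (K, d, d).
    """
    total = 0.0
    for w1, m1, c1 in zip(*a):
        for w2, m2, c2 in zip(*b):
            # Integral of N(x; m1, c1) * N(x; m2, c2) dx = N(m1; m2, c1 + c2)
            total += w1 * w2 * gauss_pdf(m1, m2, c1 + c2)
    return total

def gmm_l2_sq(a, b):
    """Squared L2 distance: integral of (a(x) - b(x))^2 dx."""
    return gmm_inner(a, a) - 2.0 * gmm_inner(a, b) + gmm_inner(b, b)

def mix(coeffs, gmms):
    """Linear combination of GMMs, returned as a single GMM tuple."""
    weights = np.concatenate([c * g[0] for c, g in zip(coeffs, gmms)])
    means = np.vstack([g[1] for g in gmms])
    covs = np.concatenate([g[2] for g in gmms])
    return weights, means, covs

def fit_coefficients(sample, refs):
    """Coefficients c minimizing the L2 distance between sample and sum_k c_k refs[k].

    Assumption: unconstrained least squares via the normal equations G c = h.
    """
    gram = np.array([[gmm_inner(ri, rj) for rj in refs] for ri in refs])
    h = np.array([gmm_inner(ri, sample) for ri in refs])
    return np.linalg.solve(gram, h)

# Toy 2-D example with made-up reference samples.
ref0 = (np.array([0.5, 0.5]),
        np.array([[0.0, 0.0], [3.0, 3.0]]),
        np.array([np.eye(2), np.eye(2)]))
ref1 = (np.array([0.8, 0.2]),
        np.array([[1.0, -2.0], [4.0, 0.0]]),
        np.array([0.5 * np.eye(2), 2.0 * np.eye(2)]))
sample = mix([0.3, 0.7], [ref0, ref1])
coeffs = fit_coefficients(sample, [ref0, ref1])
```

In this toy case the sample is an exact mixture of the two references, so the fit should recover the mixing coefficients. For real samples, a non-negative least squares solver such as `scipy.optimize.nnls` could be applied to a Cholesky-factored form of the same quadratic problem if non-negative coefficients are required.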


Short CV

Michael Reiter received his Ph.D. from Graz University of Technology in 2010. His research focuses on machine learning, statistical pattern recognition and computer vision. Since 2010 he has been a senior lecturer at TU Wien. In 2014 and 2015 he was a Marie Curie Fellow in the AutoFLOW project at Labdia Labordiagnostik GmbH in Vienna, developing algorithms for flow cytometry (FCM) data analysis.