PhytoCluster: a generative deep learning model for clustering plant single-cell RNA-seq data
Hao Wang, Xiangzheng Fu, Lijia Liu, Yi Wang, Jingpeng Hong, Bintao Pan, Yaning Cao, Yanqing Chen, Yongsheng Cao, Xiaoding Ma, Wei Fang, Shen Yan,
PhytoCluster: a generative deep learning model for clustering plant single-cell RNA-seq data,
aBIOTECH,
Volume 6, Issue 2,
2025,
Pages 189-201,
ISSN 2662-1738,
https://doi.org/10.1007/s42994-025-00196-6.
(https://www.sciencedirect.com/science/article/pii/S2662173825001936)
Abstract: Single-cell RNA sequencing (scRNA-seq) technology enables a deep understanding of cellular differentiation during plant development and reveals heterogeneity among the cells of a given tissue. However, the computational characterization of such cellular heterogeneity is complicated by the high dimensionality, sparsity, and biological noise inherent to the raw data. Here, we introduce PhytoCluster, an unsupervised deep learning algorithm that clusters scRNA-seq data by extracting latent features. We benchmarked PhytoCluster on four simulated datasets and five real scRNA-seq datasets with varying protocols and data quality levels. A comprehensive evaluation indicated that PhytoCluster outperforms other methods in clustering accuracy, noise removal, and signal retention. Additionally, we evaluated the latent features extracted by PhytoCluster as input to four machine learning models. The computational results highlight the ability of PhytoCluster to extract meaningful information from plant scRNA-seq data, with machine learning models trained on the latent features achieving accuracy comparable to that obtained with raw features. We believe that PhytoCluster will be a valuable tool for disentangling complex cellular heterogeneity based on scRNA-seq data.
Keywords: scRNA-seq; Deep learning; Cellular heterogeneity; Latent features; Clustering