Machine learning-based plasma-derived extracellular vesicle signatures for digestive system cancers prediction

2025-11-30

Xiaowei Qin, Zhibin Bi, Wenbin Li, Huipeng Zhang, Ming Han, Kongxi Zhang, Jian Wu, Lei Huang,
Machine learning-based plasma-derived extracellular vesicle signatures for digestive system cancers prediction,
Computer Methods and Programs in Biomedicine,
Volume 272,
2025,
109064,
ISSN 0169-2607,
https://doi.org/10.1016/j.cmpb.2025.109064.
(https://www.sciencedirect.com/science/article/pii/S016926072500481X)
Abstract: Background
Digestive system cancers (DSCs) represent a heterogeneous group of malignancies characterized by a poor prognosis and a lack of accurate early diagnostic methods. While traditional serological biomarkers and non-coding RNAs remain commonly used diagnostic markers for these cancers, their sensitivity and specificity are often limited. RNA in plasma-derived extracellular vesicles (PDEV) has emerged as a promising diagnostic tool for a variety of cancers, but its application to the detection of multiple DSCs has not yet been fully explored.
Methods
PDEV sequencing data from the exoRBase 2.0 database were integrated, yielding a total of 444 participants: 326 patients with DSCs and 118 healthy individuals. The dataset was divided into training and test sets. The PDEV-diagnostic model was constructed using various machine learning algorithms and underwent 5-fold cross-validation in the training set. The model's performance metrics were further evaluated in the test set. Additionally, the features were assessed using bulk RNA-seq and single-cell RNA-seq datasets for different DSCs.
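The evaluation design described above (a held-out test set plus 5-fold cross-validation within the training set, scored by AUC) can be sketched in plain Python. This is an illustrative sketch only, not the authors' pipeline: the 30% test fraction, the random seeds, and the helper names `stratified_split`, `stratified_kfold`, and `roc_auc` are assumptions, since the abstract does not state the split ratio or the tooling. Only the class counts (326 DSC patients, 118 healthy individuals) come from the text.

```python
import random

def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), keeping class proportions similar in both."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def stratified_kfold(indices, labels, k=5, seed=0):
    """Yield (fit_idx, val_idx) pairs; each class is dealt round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels[i] for i in indices)):
        idx = [i for i in indices if labels[i] == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, val

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as 0.5. Quadratic in sample count, which is fine at this scale.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Class counts from the abstract: 326 DSC patients (1), 118 healthy individuals (0).
labels = [1] * 326 + [0] * 118

# Assumed 70/30 split; the abstract does not give the actual ratio.
train_idx, test_idx = stratified_split(labels, test_fraction=0.3)

# 5-fold cross-validation indices drawn from the training set only.
folds = list(stratified_kfold(train_idx, labels, k=5))
```

In such a design, each candidate model would be fitted on `fit_idx` and scored with `roc_auc` on `val_idx` for every fold, and the test indices would be used only once for the final evaluation of the selected model.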
Results
After several feature selection methods were applied and 10 machine learning algorithms were compared on seven metrics, the XGBoost model was selected as the PDEV-diagnostic model. It achieved an AUC of 0.83 and 0.94 in the training and test sets, respectively, using 9 exosomal predictors for DSC prediction: BANK1, MALAT1, FGA, UBR4, ILR-7, FGB, PLPP5, PCAT19, and CIITA.
Conclusions
The machine learning-based PDEV diagnostic models exhibit high accuracy in identifying patients with DSCs. These nine exosomal mRNAs/lncRNAs therefore show promise as non-invasive biomarkers for DSC diagnosis.
Keywords: Digestive system cancers; Biomarker; Machine learning; Extracellular vesicles; Single-cell RNA-seq