Integrated Metabolomics-KPCA-Machine Learning framework: a solution for geographical traceability of Chinese Jujube
Xiaoli Wang, Xiaolei Ma, Yuxin Liu, Wenhan Tao, Yuting Zuo, Yueqin Zhu, Feng Hua, Chanming Liu, Wei Huang,
Integrated Metabolomics-KPCA-Machine Learning framework: a solution for geographical traceability of Chinese Jujube,
Food Chemistry: X,
Volume 31,
2025,
103069,
ISSN 2590-1575,
https://doi.org/10.1016/j.fochx.2025.103069.
(https://www.sciencedirect.com/science/article/pii/S2590157525009162)
Abstract: Widespread product adulteration makes geographical traceability difficult for Chinese jujube (CJ), a crop of global economic importance with nutritional and medicinal properties. This study introduces a Metabolomics-Kernel Principal Component Analysis (KPCA)-Machine Learning (ML) framework for identifying the origin of CJ from six production regions in China (Xinjiang, Gansu, Shaanxi, Henan, Shandong, and Hebei). Untargeted metabolomics by LC-MS/MS identified 312 metabolites, and multivariate analysis revealed 37 key discriminant variables (VIP > 1). KPCA compressed these features into 28 principal components, retaining 90.59% of the information. Compared with the traditional method, K-means clustering after KPCA dimensionality reduction greatly improves sample differentiation: with the original data, samples from different origins overlap and have fuzzy boundaries, whereas after dimensionality reduction the samples from the six origins form clear, compact clusters, achieving accurate classification. This study pioneers a “Metabolomics-KPCA-ML” paradigm, offering a solution for the traceability of geographical indication agricultural products.
Keywords: Chinese jujube; Metabolomics; LC-MS/MS; Kernel principal component analysis; K-means clustering; Machine learning; Traceability
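The KPCA and K-means steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The data are synthetic (six simulated groups, 37 features, chosen only to mirror the counts in the abstract). The RBF kernel, the gamma value, the 90% retention threshold, and the helper names `rbf_kernel`, `kpca` and `kmeans` are all assumptions made for illustration. "Information retained" is taken here to mean the cumulative share of the centered kernel matrix's eigenvalues, which may differ from how the study computed its 90.59% figure.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Pairwise RBF kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kpca(X, gamma, retain=0.90):
    """Kernel PCA keeping the fewest components whose eigenvalues
    sum to at least `retain` of the total (assumed criterion)."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    one = np.full((n, n), 1.0 / n)
    # Center the kernel matrix in feature space.
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)          # ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]   # switch to descending
    keep = vals > 1e-10 * vals[0]            # drop numerically zero modes
    vals, vecs = vals[keep], vecs[:, keep]
    ratio = np.cumsum(vals) / np.sum(vals)
    k = min(int(np.searchsorted(ratio, retain)) + 1, len(vals))
    # Projections of the training samples onto the first k components.
    scores = vecs[:, :k] * np.sqrt(vals[:k])
    return scores, float(ratio[k - 1])

def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centers."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic stand-in data: 6 simulated "origins", 10 samples each, 37 features.
rng = np.random.default_rng(42)
group_means = rng.normal(scale=3.0, size=(6, 37))
X = np.repeat(group_means, 10, axis=0) + rng.normal(size=(60, 37))
X = (X - X.mean(axis=0)) / X.std(axis=0)     # z-score each feature

scores, retained = kpca(X, gamma=1.0 / X.shape[1], retain=0.90)
labels = kmeans(scores, k=6)
print(scores.shape[1], "components retained, eigenvalue share:", round(retained, 4))
```

In practice, library implementations such as scikit-learn's `KernelPCA` and `KMeans` would be the more usual choice. The NumPy version is used here only to keep each step visible.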