Identification of stem rust-associated candidate genes in wheat (Triticum aestivum L.) via RNA-sequencing, cluster analysis, weighted gene co-expression networks, and machine learning models
Muhammad Farhan, Muhammad Ikram, Shahbaz Atta Tung, Ming-Jian Ren, Yong Wang,
Plant Physiology and Biochemistry,
Volume 229, Part D,
2025,
110701,
ISSN 0981-9428,
https://doi.org/10.1016/j.plaphy.2025.110701.
(https://www.sciencedirect.com/science/article/pii/S098194282501229X)
Abstract: Wheat is a staple crop that supplies nutrients and carbohydrates, but its yield is reduced by pathogen attacks, particularly stem rust (Sr) caused by Puccinia graminis f. sp. tritici. Identifying Sr resistance (R) genes is challenging because of the large and complex polyploid wheat genome and the small sample sizes of individual studies, which limits breeding for Sr resistance. This study therefore integrated multiple RNA-seq datasets and applied meta-analysis, clustering, weighted gene co-expression network analysis (WGCNA), and machine learning models to identify candidate genes for the early and late responses to Sr infection. Using the limma model, 7883 and 742 meta-differentially expressed genes (meta-DEGs) were identified for early and late Sr infection, respectively, including seven known genes (Lr67/Yr46/Sr55 and Sr57). Among these meta-DEGs, 272 early (93.01 % upregulated) and 22 late (13.64 % upregulated) genes were R-genes belonging to the receptor-like protein (RLP) and receptor-like kinase (RLK) classes, which trigger immunity by recognizing microbial molecular patterns. The meta-DEGs were grouped into five clusters, C1 to C5, of which C5 was associated with both the early and late Sr responses. These genes were significantly enriched in defense response, protein folding, oxidative stress response, detoxification, cell wall modification, and glutathione metabolic process terms. Network analysis identified 61 candidate hub genes (36 early and 25 late) that contribute to disease resistance through pathogen recognition and signaling. Finally, 15 candidate genes, including TraesCS1D03G0990700 (CRK8), TraesCS3B03G0796000 (RLP52), TraesCS1B03G1216400 (RLK3), TraesCS4A03G0805200, TraesCS3B03G0757500 (PR3), and TraesCS4B03G0104300, were prioritized with machine learning models (XGBoost AUC = 0.96), and their robustness was checked with a single-gene model. Relative expression analysis of six of the 15 genes showed significantly higher expression in the early than in the late Sr response (t-test, p ≤ 0.05). In summary, by combining a large integrated dataset, bioinformatics tools, and supervised machine learning models, this study identified candidate genes for early and late Sr infection, providing a genetic framework for understanding the molecular mechanisms of Sr resistance and a pipeline for future studies.
Keywords: Wheat; Stem rust; RNA-sequencing; Meta-differentially expressed genes; Machine learning; Disease resistance genes
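The abstract names WGCNA as the step that narrows the meta-DEGs down to 61 hub genes. As a rough illustration of how a WGCNA-style network scores hubs, the sketch below builds an unsigned soft-thresholded adjacency a_ij = |cor(x_i, x_j)|^beta and ranks genes by connectivity k_i = sum over j of a_ij. It assumes beta = 6 (the default power of the WGCNA adjacency function, not a value reported in this study), and the helper names (pearson, soft_threshold_adjacency, connectivity, top_hubs), gene names, and expression values are all hypothetical. The actual WGCNA workflow also builds a topological overlap matrix, detects modules, and usually ranks hubs by intramodular connectivity or module membership, none of which this sketch attempts.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta (diagonal omitted)."""
    genes = list(expr)
    adj = {g: {} for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            adj[gi][gj] = a
            adj[gj][gi] = a  # adjacency is symmetric
    return adj

def connectivity(adj):
    """Whole-network connectivity k_i = sum of a_ij over all j != i."""
    return {g: sum(row.values()) for g, row in adj.items()}

def top_hubs(expr, beta=6, n=1):
    """Return the n genes with the highest connectivity (hypothetical helper)."""
    k = connectivity(soft_threshold_adjacency(expr, beta))
    return sorted(k, key=k.get, reverse=True)[:n]

# Hypothetical genes and toy expression values across six samples
expr = {
    "geneA": [1, 2, 3, 4, 5, 6],
    "geneB": [1, 2, 3, 4, 5, 7],
    "geneC": [2, 1, 4, 3, 6, 5],
    "geneD": [3, 1, 2, 3, 1, 2],
}
print(top_hubs(expr, beta=6, n=2))  # ['geneA', 'geneB']
```

Raising correlations to a power shrinks weak links far faster than strong ones (here |r| of about 0.24 for geneD becomes roughly 2e-4, while |r| of about 0.99 between geneA and geneB stays near 0.94), which is why a soft threshold emphasizes densely co-expressed genes when ranking hubs.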
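The reported XGBoost AUC of 0.96 summarizes how well the model's scores separate the two sample classes. A minimal sketch of that metric, using the rank (Mann-Whitney) formulation: AUC is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function name roc_auc, the labels, and the scores are hypothetical; the abstract does not describe the study's features, class labels, or model settings. Scoring samples directly by one gene's expression, as in the second call, is only one simple way to approximate a single-gene check and is an assumption here, not the authors' procedure.

```python
def roc_auc(labels, scores):
    """AUC = P(score_pos > score_neg) + 0.5 * P(tie), over all pos/neg pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = infected sample, 0 = control sample
labels = [1, 1, 1, 0, 0, 0]
model_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # hypothetical classifier outputs
gene_expr = [5.1, 4.7, 3.9, 3.9, 2.0, 2.2]     # hypothetical single-gene values

print(roc_auc(labels, model_scores))  # 8 of 9 pairs ranked correctly, 8/9
print(roc_auc(labels, gene_expr))     # 8 wins plus one tie, 8.5/9
```

The pairwise loop is O(n_pos * n_neg), which is fine for illustration; for large sample sets a sort-based rank computation gives the same value more efficiently.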