Optimizing cultivation areas for Salvia leriifolia using advanced spatial prediction and hybrid machine learning algorithms for maximum bioactive compound yield

2026-01-23

Emran Dastres, Hassan Esmaeili, Ali Sonboli, Mohammad Hossein Mirjalili,
Optimizing cultivation areas for Salvia leriifolia using advanced spatial prediction and hybrid machine learning algorithms for maximum bioactive compound yield,
Smart Agricultural Technology,
Volume 12,
2025,
101482,
ISSN 2772-3755,
https://doi.org/10.1016/j.atech.2025.101482.
(https://www.sciencedirect.com/science/article/pii/S2772375525007130)
Abstract: This study investigated the potential of integrating machine learning algorithms into spatial modeling to identify optimum cultivation areas for the medicinal plant Salvia leriifolia, with a focus on maximizing production of the secondary metabolite abietatriene. Using a novel combination of individual and hybrid machine learning models, namely Random Forest (RF), Boosted Regression Trees (BRT), Support Vector Machines (SVM), and their hybrids (RF-BRT, SVM-BRT, SVM-RF, and RF-BRT-SVM), the research offered an advanced approach to spatially predicting suitable cultivation sites in Razavi Khorasan province, Iran. A Geographic Information System (GIS) was used to analyze 23 environmental factors after rigorous feature selection with the Variance Inflation Factor (VIF) and Recursive Feature Elimination with Cross-Validation (RFECV), which ensured model robustness. Variable contributions were evaluated using Bagged CART, and the key influencing factors were identified as clay content (FR = 6.70), land use/cover (FR = 6.25), and electrical conductivity (EC; FR = 5.41). Among the tested models, the RF-BRT hybrid achieved the highest predictive accuracy (RMSE = 0.011, MAE = 0.007), with statistical significance confirmed by the Friedman test followed by the Nemenyi critical difference (CD) test. Spatial uncertainty was further assessed using prediction standard deviations and 95% prediction intervals, which showed that the RF-BRT model produced not only the most accurate but also the most stable predictions. These insights are crucial for enhancing the sustainable cultivation and conservation of S. leriifolia, particularly in the northern and northwestern regions of Razavi Khorasan, and provide a framework for informed agricultural practices.
The broader implications extend to medicinal plant conservation, sustainable agriculture, and drug development. Future research is recommended to explore the underlying physiological mechanisms and to extend the predictive models to other regions and species.
Keywords: Abietatriene; Spatial prediction models; Secondary metabolites; Environmental factors; Machine Learning Algorithms; Artificial Intelligence (AI)
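The abstract ranks the models by RMSE and MAE and confirms significance with a Friedman test followed by the Nemenyi critical difference (CD) test. The sketch below shows how such a comparison is commonly computed. It is not the authors' code. It assumes the models' errors are arranged as a table of N blocks (for example cross-validation folds or resamples; the abstract does not say which) by k models, with lower error ranked better. It also uses the two-tailed q_0.05 Nemenyi critical values tabulated by Demšar (2006). All function names and the toy numbers are hypothetical.

```python
import math

# Two-tailed Nemenyi critical values q_alpha at alpha = 0.05 (Demšar, 2006),
# keyed by the number of compared models k.
Q_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
         7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rank_row(errors):
    """Rank models within one block (1 = lowest error); ties share the average rank."""
    order = sorted(range(len(errors)), key=lambda j: errors[j])
    ranks = [0.0] * len(errors)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and errors[order[j + 1]] == errors[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def friedman_nemenyi(error_table, names):
    """error_table: N rows (blocks) x k columns (models), one error value per cell.

    Returns average ranks, the Friedman chi-square statistic, the Nemenyi CD,
    and the model pairs whose average ranks differ by more than CD.
    """
    n, k = len(error_table), len(names)
    rank_rows = [rank_row(row) for row in error_table]
    avg_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    # Friedman statistic: 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4)
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    # Nemenyi critical difference: q_alpha * sqrt(k(k+1) / (6N))
    cd = Q_005[k] * math.sqrt(k * (k + 1) / (6 * n))
    different = [(names[a], names[b])
                 for a in range(k) for b in range(a + 1, k)
                 if abs(avg_ranks[a] - avg_ranks[b]) > cd]
    return dict(zip(names, avg_ranks)), chi2, cd, different


if __name__ == "__main__":
    names = ["A", "B", "C"]  # hypothetical models
    folds = [[0.010, 0.020, 0.030],  # hypothetical per-fold RMSE values
             [0.011, 0.019, 0.031],
             [0.012, 0.022, 0.029],
             [0.009, 0.021, 0.033],
             [0.010, 0.018, 0.028]]
    ranks, chi2, cd, diff = friedman_nemenyi(folds, names)
    print(ranks, chi2, round(cd, 3), diff)
    # {'A': 1.0, 'B': 2.0, 'C': 3.0} 10.0 1.482 [('A', 'C')]
```

With the seven models named in the abstract, k = 7 and q_0.05 = 2.949. Two models are judged significantly different when their average ranks differ by more than CD. The Friedman statistic is referred to a chi-square distribution with k − 1 degrees of freedom; that p-value step is omitted here to keep the sketch dependency-free.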