Can we use APSIM’s embedded expert knowledge to ‘train’ a machine learning model to identify, at high resolution, land suitable to grow lucerne (Medicago sativa L.) using simplified climate data?

2026-01-10

Jing Guo, Teng Fei, Xiumei Yang, Linda Lilburne, Derrick Moot, Brent Martin, Edmar Teixeira, Man Yang,
Can we use APSIM’s embedded expert knowledge to ‘train’ a machine learning model to identify, at high resolution, land suitable to grow lucerne (Medicago sativa L.) using simplified climate data?,
European Journal of Agronomy,
Volume 171,
2025,
127815,
ISSN 1161-0301,
https://doi.org/10.1016/j.eja.2025.127815.
(https://www.sciencedirect.com/science/article/pii/S1161030125003119)
Abstract: Replacing low-producing resident pasture with lucerne (Medicago sativa L.) is a proven method to increase pastoral on-farm productivity and minimize environmental impacts. Locating suitable areas for lucerne production requires land suitability evaluation. This proof-of-concept study develops machine-learning (ML) models for land suitability that learn from predictions of lucerne dry matter yields from the Agricultural Production Systems Simulator (APSIM) Next Gen lucerne model. This detailed mechanistic model was used to simulate lucerne crop growth using spatially coarse-resolution daily climate data and three generic soil types. The ML models are developed using the resulting estimates of lucerne production but with spatially high-resolution, annually aggregated climate data. The first aim is to test whether this much-simplified climate data, in combination with ML modeling, could be used for efficient land evaluation. The second aim is to investigate the interpretability of the ML models. Four ML models, Multiple Linear Regression (MLR), Random Forest (RF), Gradient Boosting Machines (GBM), and Artificial Neural Network (ANN), were trained and validated using integrated annual climate data. ML models consistently achieved higher overall performance with two types of generic loam soil (MEC > 0.8, RMSE < 2 tons/ha) than with generic stony sand soil (MEC > 0.6, RMSE < 3 tons/ha). Using Shapley Additive exPlanations (SHAP), the predictors' contributions to the models and the plant-environment interactions were uncovered from the ‘black-box’ ML models. Complex ML models (RF, GBM, and ANN) revealed the nonlinear interaction between lucerne production and underlying climate, which the linear model (MLR) was unable to capture. The trained ML model achieved high accuracy (MEC > 0.89, RMSE < 1.3 tons/ha) when predicting with independent high spatial resolution (100 m) annual climate data. The model captured altitude effects on lucerne production, showing a strong correlation (R > 0.84, p < 0.001) between the two. Our study demonstrated that lightweight ML models can effectively assess lucerne land suitability at 100 m resolution, while also acknowledging limitations from generic soils, single-cultivar calibration, and model performance, with suggestions for future improvement.
Keywords: Alfalfa; Machine learning; Land suitability; Medicago sativa; APSIM; SHAP
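
Note on the reported metrics: the abstract quotes MEC and RMSE without defining them. The sketch below shows one common way to compute both. It assumes MEC denotes the model efficiency coefficient in its Nash-Sutcliffe form (1 - SSE/SST), which the abstract does not confirm. The function names and the yield values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the inputs (e.g. t/ha)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def mec(observed, predicted):
    """Model efficiency coefficient, assumed here to be 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical dry matter yields (t/ha): APSIM-simulated targets vs. ML predictions.
apsim_yield = [10.0, 12.0, 14.0, 16.0]
ml_yield = [11.0, 12.0, 13.0, 16.0]

print(f"RMSE = {rmse(apsim_yield, ml_yield):.3f} t/ha")
print(f"MEC  = {mec(apsim_yield, ml_yield):.3f}")
```

Under this definition, an MEC of 1 indicates a perfect match to the APSIM-simulated yields, and values at or below 0 indicate predictions no better than the mean of those yields.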