A high performance assimilation of surface soil moisture based on a hybrid framework of machine learning and physical hydrological model
Shuang Zhu, Gang Zha, Qi Wang, Siyu Ma, Hui Qin,
A high performance assimilation of surface soil moisture based on a hybrid framework of machine learning and physical hydrological model,
Journal of Hydrology,
2025,
134513,
ISSN 0022-1694,
https://doi.org/10.1016/j.jhydrol.2025.134513.
(https://www.sciencedirect.com/science/article/pii/S0022169425018530)
Abstract: Surface soil moisture is a key variable in hydrology, agriculture, climate, and ecological research, playing a critical regulatory role in land surface evapotranspiration and ecological processes. Obtaining high-precision, spatiotemporally continuous, and large-scale surface soil moisture data remains a major research focus and challenge. Currently, data-driven machine learning methods for surface soil moisture retrieval face difficulties in balancing spatial coverage and prediction accuracy, primarily due to poor data quality at target sites and the absence of physical parameter constraints. These methods also suffer from limited robustness, poor regional transferability, and insufficient interpretability of the results. To address these limitations, this study proposes a coupled framework that integrates multi-source data, data-driven methods, and physics-based hydrological models to generate a high-precision, wide-coverage, and spatiotemporally continuous surface soil moisture dataset. First, spatial clustering is applied to identify observation sites with similar characteristics to the target region from a source domain of 986 stations, enabling effective data transfer and sample size augmentation. Second, the generalized refractive mixing dielectric model is introduced to enhance the machine learning models, improving retrieval accuracy and reducing the occurrence of outliers. Then, the spatiotemporal patterns of soil moisture learned from the machine learning model are further optimized by the variable infiltration capacity hydrological model using a specially designed cost function, so that simulated surface soil moisture (SSM) values reflect both the real-world spatial distribution and the dynamic characteristics of physical hydrological processes. Finally, two-dimensional data assimilation is performed to estimate soil moisture errors in regions without ground observations.
Compared to SMAP L4, the new method significantly improved all metrics: bias was reduced by 91.5 %, RMSE by 51.3 %, and MRE by 66.5 %. Furthermore, the R² increased dramatically from −0.078 to 0.736.
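The evaluation metrics named above can be illustrated with a minimal sketch. It assumes the conventional definitions (mean error for bias, root mean square error, mean relative error, and the coefficient of determination), since the abstract does not state the exact formulas used in the paper. The function names and the sample values are hypothetical and are not data from the study. Under this definition, R² becomes negative whenever the estimate fits the observations worse than their own mean would, which is how a baseline value such as −0.078 can arise.

```python
import math

# Assumed conventional metric definitions; the abstract does not give the
# paper's exact formulas. All names and sample values here are hypothetical.

def bias(obs, sim):
    """Mean error: average of (simulated - observed)."""
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)

def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))

def mre(obs, sim):
    """Mean relative error: average of |simulated - observed| / |observed|."""
    return sum(abs(s - o) / abs(o) for o, s in zip(obs, sim)) / len(obs)

def r2(obs, sim):
    """Coefficient of determination, 1 - SS_res / SS_tot (can be negative)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def percent_reduction(baseline, improved):
    """Relative reduction in the magnitude of an error metric, in percent."""
    return 100.0 * (abs(baseline) - abs(improved)) / abs(baseline)

# Illustrative volumetric soil moisture values (m^3/m^3), not from the paper.
observed = [0.20, 0.25, 0.30, 0.35]
poor_estimate = [0.35, 0.20, 0.40, 0.25]   # fits worse than the observed mean
good_estimate = [0.21, 0.24, 0.31, 0.34]   # tracks the observations closely

for label, est in (("poor", poor_estimate), ("good", good_estimate)):
    print(label, bias(observed, est), rmse(observed, est),
          mre(observed, est), r2(observed, est))

print("RMSE reduction (%):",
      percent_reduction(rmse(observed, poor_estimate),
                        rmse(observed, good_estimate)))
```

Read this way, the percentages reported for bias, RMSE, and MRE are relative reductions in error magnitude against the SMAP L4 baseline, while the R² figures are absolute values before and after.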
Keywords: Surface soil moisture; Machine learning; Data augmentation; Physics-based model; Loss function optimization