Data-driven optimization of polycyclic aromatic hydrocarbons removal by organic composites from aquatic environments: Integrating machine learning with theoretical calculations

2025-11-30

Zhengwen Wei, Wei Wang, Giuseppe Mele, Xiang-fei Lü, Wankui Ni, Zhen-Yi Jiang,
Data-driven optimization of polycyclic aromatic hydrocarbons removal by organic composites from aquatic environments: Integrating machine learning with theoretical calculations,
Journal of Cleaner Production,
Volume 529,
2025,
146803,
ISSN 0959-6526,
https://doi.org/10.1016/j.jclepro.2025.146803.
(https://www.sciencedirect.com/science/article/pii/S0959652625021535)
Abstract: The application of novel and functional organic composites for the efficient removal of polycyclic aromatic hydrocarbons (PAHs) from aquatic environments offers a promising strategy for mitigating the associated environmental and health risks. Nevertheless, the development of high-efficiency organic composites and the refinement of reaction parameters remain significantly constrained by time-consuming, costly experiments and by the inherent complexity of real wastewater systems. To address these challenges, this study establishes a machine learning framework for the targeted design of advanced adsorbent composites and the systematic adjustment of operational parameters. A comprehensive dataset (over 700 adsorption cases with 12 physicochemical descriptors) was compiled from literature reports on PAHs adsorption by organic composites. Multiple data-driven models were used to assess how adsorbent physicochemical properties, PAHs characteristics, and reaction conditions influence adsorption performance. The superior fitting performance of the XGBoost-CMAES model (R² = 0.9615) highlights the substantial benefit of customized hyperparameter tuning for predictive accuracy. Model interpretability was enhanced using Shapley additive explanations (SHAP) values and partial dependence plots, which revealed the dominant roles of operational parameters and material features, including initial concentration, specific surface area, and contact time, and highlighted conditional dependencies related to molecular structure and environmental descriptors. Furthermore, density functional theory (DFT) calculations were integrated to validate and strengthen the interpretability of the machine-learning-derived insights. This integrated, data-driven and theory-supported approach provides new perspectives on the systematic development of organic composites and contributes to the broader advancement of adsorption technologies in complex environmental systems.
Keywords: Polycyclic aromatic hydrocarbons; Water pollution control; Machine learning; DFT calculations; Environmental remediation
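The abstract names Shapley additive explanations (SHAP) values and partial dependence plots as the interpretability tools. The minimal sketch below, using only the Python standard library, illustrates what these two quantities compute. It is not the authors' pipeline. `toy_model` is a hypothetical stand-in for a fitted regressor (not the paper's XGBoost-CMAES model), and the three input features and all numbers are made up for illustration. The Shapley values are computed exactly by enumerating feature coalitions against a small background set (interventional Shapley values). SHAP libraries use faster algorithms for tree ensembles such as XGBoost (e.g. TreeSHAP) instead.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

Row = Sequence[float]
Model = Callable[[Row], float]

# Hypothetical feature order: [initial_concentration, specific_surface_area, contact_time]


def toy_model(x: Row) -> float:
    """Hypothetical surrogate with one interaction term (NOT the paper's model)."""
    c0, ssa, t = x
    return 0.5 * c0 + 0.02 * ssa + 0.1 * t + 0.001 * c0 * t


def coalition_value(model: Model, x: Row, subset: Set[int], background: List[Row]) -> float:
    """v(S): features in S come from x, the rest from each background row; predictions are averaged."""
    total = 0.0
    for z in background:
        mixed = [x[k] if k in subset else z[k] for k in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model: Model, x: Row, background: List[Row]) -> List[float]:
    """Exact Shapley values by enumerating every coalition (exponential in the feature count)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = coalition_value(model, x, s | {i}, background) - coalition_value(model, x, s, background)
                phi[i] += weight * gain
    return phi


def partial_dependence(model: Model, data: List[Row], feature: int, grid: List[float]) -> List[float]:
    """Average prediction over `data` with one feature fixed to each grid value."""
    curve = []
    for g in grid:
        preds = []
        for row in data:
            modified = list(row)
            modified[feature] = g
            preds.append(model(modified))
        curve.append(sum(preds) / len(preds))
    return curve


# Usage with made-up numbers (illustrative only).
background = [[10.0, 200.0, 60.0], [50.0, 800.0, 240.0], [30.0, 500.0, 120.0]]
x_query = [40.0, 600.0, 180.0]
print(shapley_values(toy_model, x_query, background))
print(partial_dependence(toy_model, background, 1, [400.0, 500.0, 600.0]))
```

By construction, the Shapley values sum to the query prediction minus the mean background prediction. A feature without interactions, such as `specific_surface_area` in this toy model, receives its slope times its deviation from the background mean. Exact enumeration grows exponentially with the number of features, which is manageable for three inputs but not for 12 descriptors. This is why tree-specific or sampling-based approximations are used in practice.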