A machine learning framework for screening phenyl phthalimide derivatives as corrosion inhibitors based on dataset generated by DFT and molecular dynamics simulations

2025-12-23

Jaka Fajar Fatriansyah, Irma Hartia Tihara, Mohammad Rayhan Ramadano, Andreas Federico, Agrin Febrian Pradana, Siti Norasmah Surip, Nicolas Gascoin,
A machine learning framework for screening phenyl phthalimide derivatives as corrosion inhibitors based on dataset generated by DFT and molecular dynamics simulations,
Results in Engineering,
Volume 28,
2025,
107350,
ISSN 2590-1230,
https://doi.org/10.1016/j.rineng.2025.107350.
(https://www.sciencedirect.com/science/article/pii/S259012302503405X)
Abstract: Corrosion in metals, especially carbon steel, incurs substantial economic losses, estimated at 2.5 trillion USD per year, or 3.4% of global GDP. This research examines the corrosion inhibition efficacy of 284 phenyl phthalimide derivatives through computational techniques and machine learning. The electronic structure properties, comprising EHOMO (mean: -6.36 eV, standard deviation: 0.49 eV), ELUMO (mean: -2.41 eV, standard deviation: 0.30 eV), and adsorption energy (mean: -171.23 kJ/mol, standard deviation: 27.50 kJ/mol), were determined through Density Functional Theory (DFT) and Molecular Dynamics (MD) simulations, and correlated with experimentally obtained inhibition efficiency. Various machine learning models, such as Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and XGBoost, were utilized, with ANN demonstrating the highest prediction accuracy, reaching R² values of 93.18% for EHOMO and 91.12% for ELUMO. The complex relationship between molecular descriptors and electronic properties allows ANN to outperform XGBoost and XTrees. SHAP and PFI studies of feature importance revealed that descriptors B06[C-N] (PFI score: 0.016) and qnmax (PFI score: 0.009) are essential for inhibitor efficacy. Correlation-based filtering for feature reduction revealed that models utilizing a reduced set of 1,078 descriptors (down from 5,627) maintained robust performance, attaining an R² score of 91.47% for EHOMO. These findings create a strong computational framework for the screening and optimization of corrosion inhibitors, minimizing reliance on experimental trials and facilitating cost-effective, scalable synthesis of customized inhibitors. Subsequent efforts will be focused on experimental validation and the implementation of this methodology in more extensive chemical contexts.
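
Illustrative note: the correlation-based filtering mentioned in the abstract can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation. The function name `correlation_filter`, the 0.95 threshold, the greedy keep-first rule, and the synthetic data are all assumptions made for illustration; the abstract does not state which threshold or selection rule was used. The sketch also assumes no descriptor column is constant.

```python
import numpy as np

def correlation_filter(X, threshold=0.95):
    """Greedy correlation filter (hypothetical helper).

    Walks the columns left to right and keeps a column only if its
    absolute Pearson correlation with every already-kept column is
    at or below `threshold`. Returns the kept column indices.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

# Synthetic stand-in for a descriptor matrix: 284 molecules, 4 descriptors.
rng = np.random.default_rng(0)
base = rng.normal(size=(284, 3))
# Column 3 is a near-duplicate (rescaled copy plus tiny noise) of column 0.
redundant = base[:, 0] * 2.0 + 1e-6 * rng.normal(size=284)
X = np.column_stack([base, redundant])

kept = correlation_filter(X)   # threshold 0.95 is an assumed value
X_reduced = X[:, kept]         # descriptor matrix after filtering
```

In this toy case the redundant fourth column is dropped and the three independent columns are retained. The reduced matrix would then be passed to whichever regressor is being trained.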