A machine learning framework for screening phenyl phthalimide derivatives as corrosion inhibitors based on dataset generated by DFT and molecular dynamics simulations
Jaka Fajar Fatriansyah, Irma Hartia Tihara, Mohammad Rayhan Ramadano, Andreas Federico, Agrin Febrian Pradana, Siti Norasmah Surip, Nicolas Gascoin,
A machine learning framework for screening phenyl phthalimide derivatives as corrosion inhibitors based on dataset generated by DFT and molecular dynamics simulations,
Results in Engineering,
Volume 28,
2025,
107350,
ISSN 2590-1230,
https://doi.org/10.1016/j.rineng.2025.107350.
(https://www.sciencedirect.com/science/article/pii/S259012302503405X)
Abstract: Corrosion in metals, especially carbon steel, incurs substantial economic losses, estimated at 2.5 trillion USD per year, or 3.4% of global GDP. This research examines the corrosion inhibition efficacy of 284 phenyl phthalimide derivatives through computational techniques and machine learning. The electronic structure properties, comprising EHOMO (mean: -6.36 eV, standard deviation: 0.49 eV), ELUMO (mean: -2.41 eV, standard deviation: 0.30 eV), and adsorption energy (mean: -171.23 kJ/mol, standard deviation: 27.50 kJ/mol), were determined through Density Functional Theory (DFT) and Molecular Dynamics (MD) simulations, and correlated with experimentally obtained inhibition efficiency. Various machine learning models, such as Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and XGBoost, were utilized, with ANN demonstrating the highest prediction accuracy, reaching R² values of 93.18% for EHOMO and 91.12% for ELUMO. ANN's advantage over XGBoost and XTrees is attributed to the complex relationship between the molecular descriptors and the electronic properties. SHAP and permutation feature importance (PFI) analyses revealed that descriptors B06[C-N] (PFI score: 0.016) and qnmax (PFI score: 0.009) are essential for inhibitor efficacy. Correlation-based filtering for feature reduction showed that models using a reduced set of 1,078 descriptors (down from 5,627) maintained robust performance, attaining an R² score of 91.47% for EHOMO. These findings create a strong computational framework for the screening and optimization of corrosion inhibitors, minimizing reliance on experimental trials and facilitating cost-effective, scalable synthesis of customized inhibitors. Subsequent efforts will focus on experimental validation and the implementation of this methodology in more extensive chemical contexts.
Keywords: Corrosion inhibition; Density functional theory (DFT); Machine learning; Phenyl phthalimide derivatives; Structure-property relationship
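
Illustrative note: the correlation-based filtering mentioned in the abstract can be sketched roughly as below. This is a minimal sketch, not the authors' implementation. The abstract does not state the correlation measure, the cutoff, or the tie-breaking rule, so the Pearson coefficient, the 0.95 threshold, the keep-first rule, and the function name `correlation_filter` are all assumptions made for illustration.

```python
import numpy as np


def correlation_filter(X, threshold=0.95):
    """Return indices of descriptor columns kept after correlation filtering.

    Assumptions (not taken from the paper): Pearson correlation, a 0.95
    cutoff, and a greedy rule that keeps the first column of each highly
    correlated group.
    """
    X = np.asarray(X, dtype=float)
    # Constant columns have undefined correlation, so drop them up front.
    variable = np.flatnonzero(X.std(axis=0) > 0)
    if variable.size == 0:
        return variable
    # Absolute pairwise Pearson correlation between the remaining columns.
    corr = np.abs(np.atleast_2d(np.corrcoef(X[:, variable], rowvar=False)))
    kept = []
    for j in range(variable.size):
        # Keep column j only if no already-kept column exceeds the cutoff.
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return variable[kept]


# Synthetic stand-in for a descriptor matrix (rows: molecules, columns: descriptors).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X_demo = np.column_stack([
    a,                                   # descriptor 0
    2 * a + 0.01 * rng.normal(size=200), # near-duplicate of descriptor 0
    b,                                   # independent descriptor
    np.ones(200),                        # constant descriptor
])
kept_demo = correlation_filter(X_demo, threshold=0.95)
print(kept_demo)  # indices of the retained descriptors
```

In this synthetic example the near-duplicate and the constant column are discarded, leaving descriptors 0 and 2. A stricter or looser cutoff changes how many descriptors survive, which is the trade-off behind a reduction such as the one from 5,627 to 1,078 descriptors reported above.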