Machine learning for adsorption-related parameters prediction of electronic specialty gases: DFT-based dataset construction and balanced data augmentation

2025-11-30

Zhikang Wu, Ying Wu, Guang Miao, Runze Chen, Lingjun Ma, Hongxia Xi, Jing Xiao,
Machine learning for adsorption-related parameters prediction of electronic specialty gases: DFT-based dataset construction and balanced data augmentation,
Chinese Journal of Chemical Engineering,
2025,
ISSN 1004-9541,
https://doi.org/10.1016/j.cjche.2025.09.012.
(https://www.sciencedirect.com/science/article/pii/S1004954125003726)
Abstract: Electronic specialty gases play vital roles in key chip manufacturing processes such as lithography, etching, deposition and cleaning. Their ultra-high purity requirement (≥99.999%) makes separation challenging, and insufficient physicochemical data has hindered adsorbent development. To bridge this gap, we constructed a multidimensional database covering 101 semiconductor-related molecules with 19 physical parameters, and developed a Bayesian regression-based collaborative prediction model that achieves high accuracy (R² = 0.95–0.97) on test sets. We further constructed the balanced data-augmented Transformer-based molecular property prediction (BD-TMPP) model to address overfitting in small-sample learning. This model achieves end-to-end prediction of molecular quadrupole moment (R² = 0.99) and polarizability (R² = 0.98) by capturing interatomic spatial correlations. Compared with traditional density functional theory calculations, the model achieves a five-orders-of-magnitude improvement in computational efficiency while maintaining accuracy, demonstrating a successful application of the "structure-property relationship" theory in chemical machine learning.
Keywords: Molecular property database; Small sample machine learning; Data augmentation; Molecular property prediction; Adsorption
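
Note on the reported metric: the abstract quotes accuracy as R², the coefficient of determination, defined as 1 - SS_res / SS_tot. The following is a minimal sketch of that standard definition only. The function name r_squared and the sample values are hypothetical illustrations; they are not taken from the paper and do not reproduce its data or models.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) < 2:
        raise ValueError("at least two samples are required")

    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical reference values and predictions (illustration only).
    reference = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(f"R^2 = {r_squared(reference, predicted):.3f}")
```

An R² of 1 means the predictions match the reference values exactly, while 0 means the model does no better than predicting the mean of the reference values.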