SMOTE-Optimized Machine Learning Framework for Predicting Retention in Workforce Development Training
Abdulaziz Alshahrani,
SMOTE-Optimized Machine Learning Framework for Predicting Retention in Workforce Development Training,
Computers, Materials and Continua,
Volume 85, Issue 2,
2025,
Pages 4067-4090,
ISSN 1546-2218,
https://doi.org/10.32604/cmc.2025.065211.
(https://www.sciencedirect.com/science/article/pii/S1546221825008471)
Abstract: High dropout rates in short-term job skills training programs hinder workforce development. This study applies machine learning to predict program completion while addressing class imbalance challenges. A dataset of 6548 records with 24 demographic, educational, program-specific, and employment-related features was analyzed. Data preprocessing involved cleaning, encoding categorical variables, and balancing the dataset using the Synthetic Minority Oversampling Technique (SMOTE), as only 15.9% of participants were dropouts. Six machine learning models (Logistic Regression, Random Forest, Support Vector Machine, K-Nearest Neighbors, Naïve Bayes, and XGBoost) were evaluated on both balanced and unbalanced datasets using an 80-20 train-test split. Performance was assessed using Accuracy, Precision, Recall, F1-score, and ROC-AUC. XGBoost achieved the highest performance on the balanced dataset, with an F1-score of 0.9200 and a ROC-AUC of 0.9684, followed by Random Forest. These findings highlight the potential of machine learning for early identification of trainees likely to drop out, aiding retention strategies for workforce training. The results support the integration of predictive analytics to optimize intervention efforts in short-term training programs.
Keywords: Predictive analytics; workforce training; machine learning; SMOTE
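
The balancing step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority record is placed at a random point on the line segment between a minority record and one of its nearest minority neighbours. The sketch below is not the study's implementation. The function name `smote_sketch`, the toy data, and the choice to draw base records at random are assumptions made for illustration, and it handles only numeric features.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (Euclidean distance).

    Simplified illustration: base samples are drawn at random rather than
    iterated over evenly as in the original SMOTE formulation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data (not from the study): two numeric features per record.
majority = [(float(i % 7), float(i % 5)) for i in range(20)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5), (12.0, 11.0)]

# Oversample the minority class until both classes have the same size.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
```

Because every synthetic point is a convex combination of two real minority records, it stays inside the region the minority class already occupies. The abstract does not say whether balancing was applied before or after the 80-20 split; a common precaution is to oversample only the training portion so that synthetic records do not leak into the test set.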