Performance comparison of sampling techniques with machine learning algorithms for churn prediction in telecommunication

2025-11-08

B. Shunmuga Priya, G. Chitra, R. Ramalakshmi,
Performance comparison of sampling techniques with machine learning algorithms for churn prediction in telecommunication,
Franklin Open,
Volume 13,
2025,
100402,
ISSN 2773-1863,
https://doi.org/10.1016/j.fraope.2025.100402.
(https://www.sciencedirect.com/science/article/pii/S2773186325001902)
Abstract: The telecommunications industry faces intense competition in customer retention, making churn prediction a key area of research in machine learning. A systematic churn prediction model is essential, as large datasets, high-dimensional features, and class imbalance can hinder prediction performance. This study proposes a novel churn prediction model using machine learning techniques, evaluated on a dataset of 7043 telecom customers. It explores various sampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), and Conditional Tabular Generative Adversarial Network (CTGAN) combined with machine learning models to address class imbalance in churn prediction. Experimental results demonstrate that CTGAN, when paired with a Weighted Random Forest (WRF) classifier, consistently outperforms other methods, achieving a remarkable accuracy of 99.79%, along with strong performance in terms of precision, recall, F1-score, and AUC. These findings highlight CTGAN’s effectiveness in addressing dataset imbalance and enhancing the generalizability of churn prediction models, offering a valuable solution for the telecommunications industry.
Keywords: Customer retention; Churn prediction; Telecommunications industry; Machine learning; Sampling techniques; Class Imbalance; CTGAN; GANs; Data Augmentation; Weighted random forest
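The abstract names SMOTE and a Weighted Random Forest without describing how either works. The following minimal sketch illustrates the two underlying ideas under stated assumptions. SMOTE creates a synthetic minority (churner) row by interpolating between a minority sample and one of its k nearest minority neighbours. A weighted forest typically up-weights the rare class, and here the weights use the n_samples / (n_classes * n_c) heuristic behind scikit-learn's class_weight="balanced". This is not the authors' pipeline. The function names smote and balanced_class_weights are hypothetical, the paper does not state which weighting its WRF uses, and ADASYN and CTGAN are not shown. ADASYN differs from SMOTE mainly in generating more samples around minority points that are harder to learn.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: generate n_synthetic SMOTE-style rows.

    Each synthetic row lies on the segment between a randomly chosen
    minority row and one of its k nearest minority neighbours
    (squared Euclidean distance). Features must already be numeric.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Precompute the k nearest minority neighbours of every minority row.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(x, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


def balanced_class_weights(labels):
    """Hypothetical helper: w_c = n_samples / (n_classes * n_c)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, c = len(labels), len(counts)
    return {label: n / (c * cnt) for label, cnt in counts.items()}


if __name__ == "__main__":
    # Toy, made-up churner rows (two numeric features each).
    churners = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
    print(smote(churners, n_synthetic=3, k=2))
    # 6 non-churners (label 0) and 2 churners (label 1).
    print(balanced_class_weights([0] * 6 + [1] * 2))
```

With 6 non-churners and 2 churners, the weights come out as 8 / 12 ≈ 0.667 for class 0 and 8 / 4 = 2.0 for class 1. Each churner therefore counts three times as much as a non-churner during training, which is the same rebalancing effect that oversampling achieves by adding rows.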