PhishHunter-XLD: An ensemble approach integrating machine learning and deep learning for phishing URL classification
Tirth Doshi, Vishva Patel, Nemil Shah, Debabrata Swain, Debabala Swain, Biswaranjan Acharya,
PhishHunter-XLD: An ensemble approach integrating machine learning and deep learning for phishing URL classification,
Franklin Open,
Volume 12,
2025,
100349,
ISSN 2773-1863,
https://doi.org/10.1016/j.fraope.2025.100349.
(https://www.sciencedirect.com/science/article/pii/S2773186325001379)
Abstract: Phishing continues to pose a significant cybersecurity threat by deceiving users into disclosing sensitive information through maliciously crafted URLs. Traditional detection methods, including blacklists and heuristic analyses, have proven inadequate against evolving phishing techniques due to their reliance on static patterns and manual updates. In this study, a weighted voting ensemble framework has been proposed, integrating semantic feature extraction using DistilBERT with classical machine learning classifiers (XGBoost) and deep learning models (LSTM) to enhance phishing URL detection. Model complementarity has been leveraged: XGBoost captures explicit lexical features, LSTM models sequential dependencies, and DistilBERT extracts contextual semantics, resulting in an adaptive decision boundary that improves generalization and reduces false positives. Extensive experiments conducted on large-scale benchmark datasets, such as the “Phishing Site URLs” and “Malicious URLs” datasets, have demonstrated that the proposed ensemble framework achieves a detection accuracy of 99.83% with low computational latency. Furthermore, the system has been deployed via Streamlit, providing a real-time, interactive interface for cybersecurity practitioners. Future work will explore optimization strategies, including model pruning, quantization, and adversarial training, to further enhance efficiency, scalability, and resilience against emerging zero-day phishing techniques.
Keywords: Cybersecurity; Phishing detection; Deep learning; Machine learning; Transformer models; BERT; DistilBERT
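The abstract describes a weighted voting ensemble over three base models but does not spell out the weighting scheme. The sketch below is a minimal illustration of weighted soft voting over per-URL phishing probabilities from XGBoost, an LSTM, and a DistilBERT-based classifier; the scoring functions, weights, and threshold are assumptions made for illustration and are not taken from the paper.

import numpy as np

# Stand-in scoring functions: each should return P(phishing) in [0, 1] per URL.
# In the actual system these would wrap the trained XGBoost, LSTM, and
# DistilBERT-based classifiers; random scores are used here only so the
# sketch runs end to end.
rng = np.random.default_rng(0)

def xgb_prob(urls):
    return rng.uniform(size=len(urls))

def lstm_prob(urls):
    return rng.uniform(size=len(urls))

def bert_prob(urls):
    return rng.uniform(size=len(urls))

def weighted_vote(urls, weights=(0.4, 0.3, 0.3), threshold=0.5):
    """Weighted soft voting over per-model phishing probabilities.

    The weight and threshold values are illustrative placeholders; the
    abstract does not state the values used in PhishHunter-XLD.
    """
    # Rows: one probability vector per base model; columns: one entry per URL.
    probs = np.stack([xgb_prob(urls), lstm_prob(urls), bert_prob(urls)])
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                      # normalise weights
    ensemble_prob = w @ probs                            # weighted average per URL
    labels = (ensemble_prob >= threshold).astype(int)    # 1 = phishing, 0 = legitimate
    return ensemble_prob, labels

urls = ["http://example-login-update.com/verify", "https://www.wikipedia.org/"]
scores, preds = weighted_vote(urls)
print(list(zip(urls, scores.round(3), preds)))

Soft voting (averaging probabilities) rather than hard voting (majority over labels) lets a confident model pull the ensemble score across the decision threshold, which is consistent with the adaptive decision boundary the abstract attributes to model complementarity.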