Hybrid machine learning model for the prediction of anaemia
Rabia Omar Said, Mahadia Tunga,
Hybrid machine learning model for the prediction of anaemia,
Machine Learning with Applications,
Volume 22,
2025,
100741,
ISSN 2666-8270,
https://doi.org/10.1016/j.mlwa.2025.100741.
(https://www.sciencedirect.com/science/article/pii/S2666827025001240)
Abstract: In developing countries such as Tanzania, the proportion of anaemic children aged 6–59 months remains high despite national interventions, with a prevalence of 59%. Traditional screening methods, such as examining the eyes and tongue for paleness, are commonly used but are subjective and often lead to delayed or missed diagnoses. Existing machine learning models for predicting anaemia in children offer improved accuracy, but many rely on a single model and a default classification threshold of 0.5, which tends to favour sensitivity over specificity and leads to a high number of false positives. This study used a supervised machine learning approach within the CRISP-DM framework. A stacked hybrid model was built, with Random Forest (RF) and Artificial Neural Network (ANN) as base models and XGBoost as the meta-learner. Performance was evaluated using accuracy, sensitivity, specificity, precision, and Area Under the Curve (AUC), with 95% confidence intervals (CIs), at thresholds of 0.35, 0.4, 0.45, and 0.5; the threshold was optimized using Youden’s J index. The hybrid model achieved balanced performance, particularly at a threshold of 0.4, with a sensitivity of 0.861 and a specificity of 0.880. Compared with the standalone models, the hybrid approach produced fewer false positives and false negatives, offering greater reliability and clinical safety. The study concludes that a stacking ensemble combined with threshold optimization provides an effective solution for early detection of anaemia in children. Integrating hybrid machine learning models into Tanzania’s health screening programs could improve child health outcomes.
Keywords: Anaemia; Hybrid machine learning models; Stacking ensemble; Threshold optimization
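
Illustrative note: the abstract describes choosing a classification threshold from the candidates 0.35, 0.4, 0.45, and 0.5 using Youden’s J index (J = sensitivity + specificity - 1). The following is a minimal sketch of that selection step only, not the authors' implementation. The function names and the way ties are broken are assumptions introduced here for illustration; the predicted probabilities would come from whatever model is being evaluated.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when probabilities >= threshold
    are classified as positive (1 = anaemic, 0 = not anaemic)."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    # Guard against division by zero when a class is absent.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def youden_j(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Youden's J index: sensitivity + specificity - 1."""
    sensitivity, specificity = sensitivity_specificity(y_true, y_prob, threshold)
    return sensitivity + specificity - 1.0


def best_threshold(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    candidates: Sequence[float] = (0.35, 0.4, 0.45, 0.5),
) -> float:
    """Pick the candidate threshold with the highest Youden's J.
    Assumption: ties go to the first (lowest-listed) candidate."""
    return max(candidates, key=lambda t: youden_j(y_true, y_prob, t))
```

In practice the selection would be run on held-out validation predictions rather than on the data used to fit the model, so that the chosen threshold is not tuned to the training set.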