Predicting negative self-rated oral health in adults using machine learning: A longitudinal study in Southern Brazil
Cinthia Fonseca Araujo, Felipe Mendes Delpino, Lílian Munhoz Figueiredo, Alexandre Dias Porto Chiavegatto Filho, Bruno Pereira Nunes, Helena Silveira Schuch, Flavio Fernando Demarco,
Predicting negative self-rated oral health in adults using machine learning: A longitudinal study in Southern Brazil,
Journal of Dentistry,
Volume 163,
2025,
106164,
ISSN 0300-5712,
https://doi.org/10.1016/j.jdent.2025.106164.
(https://www.sciencedirect.com/science/article/pii/S0300571225006104)
Abstract: Objective
This study aims to develop and evaluate the performance of machine learning models to predict the occurrence of negative self-rated oral health (SROH) among adults.
Methods
Data were collected through a longitudinal population-based survey conducted in Pelotas, Southern Brazil. The analysis included 3,461 participants with complete data at both baseline and follow-up. Predictors were collected at baseline and encompassed 46 sociodemographic, behavioral, general health, and oral health characteristics. The outcome of interest was negative SROH. Data analysis was conducted using Python. The dataset was divided into training (70%) and testing (30%) sets. The performance of five machine learning algorithms - Random Forest, LightGBM, CatBoost, XGBoost, and TabPFN - was evaluated according to the area under the ROC curve (AUC-ROC). Additional performance metrics included accuracy, precision, recall, and F1-score. The contribution of each predictor was assessed using Shapley values.
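The evaluation metrics named above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the labels, scores, and the 0.5 decision threshold below are made-up assumptions for illustration, and the helper names `auc_roc` and `threshold_metrics` are hypothetical. The abstract does not state which library computed the metrics.

```python
from typing import Dict, Sequence


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 after thresholding the scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up test-set labels (1 = negative SROH) and predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5]

auc = auc_roc(y_true, scores)
metrics = threshold_metrics(y_true, scores)
print(f"AUC-ROC: {auc:.3f}")
print({name: round(value, 3) for name, value in metrics.items()})
```

On this toy data, 12 of the 15 positive-negative pairs are ranked correctly, giving an AUC-ROC of 0.8. AUC-ROC does not depend on a threshold, whereas the other four metrics do, which is one reason it is a common primary criterion when the outcome is imbalanced. In a scikit-learn workflow, the equivalent functions are `roc_auc_score`, `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`.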
Results
Negative self-rated oral health was reported by 571 individuals (16.6%). The models achieved AUC-ROC values ranging from 0.671 to 0.715, with TabPFN demonstrating the best performance. The most important predictors according to Shapley values were ABEP index scores (socioeconomic indicator), type of dental service used, age, Generalized Anxiety Disorder (GAD-7) scores, and overall life satisfaction.
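To make the idea of a Shapley value concrete, the sketch below computes exact Shapley values for a toy three-feature scoring function by enumerating every feature coalition. Everything in it is an assumption for illustration: `toy_model`, the instance, and the all-zero baseline are hypothetical and unrelated to the study's predictors or fitted models. Exact enumeration is only practical for a handful of features; with 46 predictors an approximation method would be needed, and the abstract does not say which implementation was used.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical scoring function with one interaction term (x0 * x2)."""
    return 2.0 * x[0] + 1.0 * x[1] + x[0] * x[2]


def coalition_value(
    model: Callable[[Sequence[float]], float],
    instance: Sequence[float],
    baseline: Sequence[float],
    coalition: Set[int],
) -> float:
    """Model output when only the features in `coalition` take the instance's values."""
    mixed = [
        instance[i] if i in coalition else baseline[i]
        for i in range(len(instance))
    ]
    return model(mixed)


def exact_shapley(
    model: Callable[[Sequence[float]], float],
    instance: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley value of each feature for one instance."""
    n = len(instance)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                gain = coalition_value(
                    model, instance, baseline, members | {i}
                ) - coalition_value(model, instance, baseline, members)
                phi += weight * gain
        values.append(phi)
    return values


instance = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
shapley = exact_shapley(toy_model, instance, baseline)
print(shapley)
```

The three values sum to the difference between the model output for the instance and for the baseline, which is the property that lets Shapley values be read as an additive breakdown of a prediction. The interaction term's contribution is shared equally between the first and third features. A global ranking like the one reported above is commonly obtained by averaging the absolute Shapley values of each predictor across individuals.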
Conclusions
The machine learning models developed in this study demonstrated reasonable performance in identifying individuals with negative self-rated oral health. However, given their current limitations, they require further refinement before they can be applied in real-world settings.
Clinical Significance
Our findings highlight the potential of machine learning to predict subjective oral health conditions. Given their limitations, the models should be improved before real-world implementation is feasible. The correct identification of individuals with negative self-rated oral health may support the development of targeted strategies focused on these at-risk groups.
Keywords: Artificial intelligence; Machine learning; Oral health; Dentistry