Malaysian Journal of Mathematical Sciences, December 2025, Vol. 19, No. 4


Gradient-Boosting Classifier with Shapley Additive Explanations for Analysis of Obesity Risk in Saudi Arabia

Alanazi, W. M. T., Abu Bakar, M. A., and You, H. W.

Corresponding Email: wadhatgfsjh@gmail.com

Received date: 27 February 2025
Accepted date: 14 May 2025

Abstract:
Obesity remains a critical public health issue in the Kingdom of Saudi Arabia (KSA), necessitating accurate and interpretable predictive tools. This study proposes a Gradient-Boosted Decision Tree (GBDT-XGBoost) model integrated with SHapley Additive exPlanations (SHAP) for obesity risk classification. Trained on demographic and lifestyle data, the model achieved a training accuracy of $95.8\%$, a testing accuracy of $93.1\%$, a precision of $94.2\%$, a recall of $92.8\%$, an F1-score of $93.5\%$, an AUC of $0.963$, an RMSE of $0.32$ and an AIC of $1{,}205$, with a training time of $58.2$ seconds. Comparative analysis showed that XGBoost outperformed Artificial Neural Networks (ANN), Random Forest (RF) and Support Vector Regression (SVR) across all metrics. Although ANN achieved competitive results (test accuracy: $93.8\%$, RMSE: $0.35$), its training time was more than three times longer. SHAP analysis identified BMI $(0.42)$, age $(0.28)$ and dietary habits $(0.16)$ as the most influential features. SHAP plots also revealed that vegetable consumption (FCVC) significantly impacted predictions and that age was negatively correlated with risk. The XGBoost-SHAP framework offers a high-performance, interpretable and scalable solution for obesity risk prediction. Its transparent feature attribution supports its use in clinical decision-making and in public health strategies tailored to the Saudi population.

Keywords: obesity risk prediction; machine learning; XGBoost; SHAP analysis; healthcare analytics
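
To illustrate the additive feature attribution that SHAP provides, the following minimal sketch computes exact Shapley values for a small toy model by averaging each feature's marginal contribution over all feature orderings. It is not the study's pipeline: the function `toy_risk_score`, its coefficients, the instance `x` and the reference point `baseline` are hypothetical and chosen only for illustration. The feature names `bmi`, `age` and `fcvc` are borrowed from the abstract. In practice, tree-based explainers approximate or accelerate this computation rather than enumerating orderings.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)          # start from the reference point
        previous = predict(current)
        for name in order:
            current[name] = x[name]       # switch this feature to the explained value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


def toy_risk_score(f):
    """Hypothetical risk score (illustrative coefficients, not fitted to any data)."""
    return (0.04 * f["bmi"]
            + 0.01 * f["age"]
            - 0.10 * f["fcvc"]
            + 0.002 * f["bmi"] * f["fcvc"])  # interaction between BMI and FCVC


# Hypothetical instance to explain and hypothetical reference point.
x = {"bmi": 32.0, "age": 40.0, "fcvc": 1.0}
baseline = {"bmi": 25.0, "age": 30.0, "fcvc": 2.0}

phi = shapley_values(toy_risk_score, x, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.4f}")
```

The attributions in `phi` sum to the difference between the score at `x` and the score at `baseline`, which is the additivity property that gives SHAP its name. Because `age` enters the toy score without interactions, its attribution equals its own effect regardless of the other features.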