An Automated Stacking Framework for Insurance Customer Profitability Prediction using Hybrid Transformer-Gradient Boosting Architectures

Malakouti Semnani, Amirhossein; Kordrostami, Sohrab; Refahi Sheikhani, Amirhossein; Moattar, Mohammad Hossein

doi:10.66224/iors.16.2.156


Volume 16, Issue 2 (6-1404)

Volume 16, Issue 2, pages 156-188

Abstract: (376 views)

Insurance companies face the critical challenge of identifying “good customers” (policyholders who consistently pay premiums with minimal or no claims) within large, heterogeneous datasets. This study proposes and evaluates a hybrid machine learning framework to predict good customer status using an enhanced insurance dataset that integrates demographic, financial, and policy-related features. The framework combines an XGBoost classifier, a soft-voting ensemble of RandomForest and LightGBM, and a custom Transformer Encoder. All models are tuned with the Optuna hyperparameter optimization library to enhance predictive accuracy and interpretability.
The methodology includes preprocessing steps such as categorical encoding and standardization of numerical variables (e.g., age, BMI, premium with GST), followed by a novel label engineering scheme that defines good customers as policyholders whose premiums exceed the mean plus one standard deviation and who have no claim history. The dataset is split into training (80%) and testing (20%) subsets. Two hybrid architectures are developed: Model A, which fuses the predicted probabilities from XGBoost and the Transformer Encoder using a 60–40 weighting, and Model B, which employs a soft-voting ensemble of RandomForest and LightGBM. Ablation studies quantify the contribution of each component. Performance is assessed using accuracy, AUC, F1-score, and Matthews Correlation Coefficient, supported by visual tools such as correlation heatmaps, ROC curves, and confusion matrices.
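The labeling rule and the Model A probability fusion described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of the population standard deviation, and the assignment of the 0.6 weight to XGBoost (rather than to the Transformer Encoder) are assumptions made for illustration.

```python
import numpy as np


def label_good_customers(premium, claim_count):
    """Label a customer 1 if premium > mean + 1 std AND there are no claims.

    Assumption: the population standard deviation (ddof=0) is used; the
    abstract does not say which estimator the authors chose.
    """
    premium = np.asarray(premium, dtype=float)
    claim_count = np.asarray(claim_count)
    threshold = premium.mean() + premium.std()
    return ((premium > threshold) & (claim_count == 0)).astype(int)


def fuse_probabilities(p_xgb, p_transformer, w_xgb=0.6):
    """Weighted average of two models' positive-class probabilities.

    Assumption: XGBoost receives the 0.6 weight and the Transformer 0.4.
    """
    p_xgb = np.asarray(p_xgb, dtype=float)
    p_transformer = np.asarray(p_transformer, dtype=float)
    return w_xgb * p_xgb + (1.0 - w_xgb) * p_transformer


# Toy data (hypothetical): only the last customer has a premium above the threshold.
premium = [100, 100, 100, 100, 500]
labels = label_good_customers(premium, [0, 0, 0, 0, 0])
fused = fuse_probabilities([0.9], [0.4])
```

Because the threshold sits one standard deviation above the mean, the positive class is expected to be a small minority, which is one reason metrics such as F1-score and MCC are reported alongside accuracy.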
Experimental results show that Model A attains an accuracy of 0.8720 and an AUC of 0.9140, whereas Model B achieves an accuracy of 0.8850 and an AUC of 0.9260 after systematic hyperparameter tuning. Removing either the Transformer or XGBoost markedly degrades Model A, while omitting RandomForest or LightGBM leads to smaller performance drops in Model B, underscoring the value of ensemble diversity. Overall, the proposed framework provides a practical tool for insurance customer segmentation and profitability-oriented decision-making, and its open-source implementation facilitates replication, extension with additional features or larger datasets, and potential real-time deployment in operational insurance environments.
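For readers unfamiliar with the threshold-based metrics named in the evaluation, the following sketch computes accuracy, F1-score, and the Matthews Correlation Coefficient from confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the paper's results.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, F1, and MCC from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all.
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 if undefined.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts for illustration only.
example = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

AUC is omitted here because it depends on the ranking of predicted probabilities rather than on a single confusion matrix.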

Full text [PDF 1595 kb] (292 downloads)

Type of study: Research | Subject: Computing and information management
Received: 1404/12/1 | Accepted: 1405/1/3 | Published: 1405/2/23


Republication
	This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
