Iranian Journal of Operations Research



An Automated Stacking Framework for Insurance Customer Profitability Prediction using Hybrid Transformer-Gradient Boosting Architectures

Amirhossein Malakouti Semnani, Sohrab Kordrostami, Amirhossein Refahi Sheikhani, Mohammad Hossein Moattar
Volume 16, Issue 2 (8-2025)

Abstract

Insurance companies face the critical challenge of identifying “good customers”—policyholders who consistently pay premiums with minimal or no claims—within large, heterogeneous datasets. This study proposes and evaluates a hybrid machine learning framework to predict good customer status using an enhanced insurance dataset that integrates demographic, financial, and policy-related features. The framework combines an XGBoost classifier, a soft-voting ensemble of RandomForest and LightGBM, and a custom Transformer Encoder, with all models tuned using the Optuna hyperparameter optimization library to enhance predictive accuracy and interpretability.
The methodology includes preprocessing steps such as categorical encoding and standardization of numerical variables (e.g., age, BMI, premium with GST), followed by a novel label engineering scheme that defines good customers as those whose premiums exceed the mean plus one standard deviation and who have no claim history. The dataset is split into training (80%) and testing (20%) subsets. Two hybrid architectures are developed: Model A, which fuses the predicted probabilities from XGBoost and the Transformer Encoder using a 60–40 weighting, and Model B, which employs a soft-voting ensemble of RandomForest and LightGBM. Ablation studies quantify the contribution of each component, while performance is assessed using accuracy, AUC, F1-score, and Matthews Correlation Coefficient, supported by visual tools such as correlation heatmaps, ROC curves, and confusion matrices.
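The labeling rule and the Model A fusion step described above can be illustrated with a minimal sketch. This is not the authors' implementation. The function names are hypothetical, the use of the population standard deviation is an assumption, and so is assigning the 60% weight to XGBoost, since the abstract gives only the 60–40 split.

```python
from statistics import mean, pstdev


def label_good_customers(premiums, claim_counts):
    """Label a customer 1 if premium > mean + 1 std and they have no claims.

    Assumption: population standard deviation (pstdev); the abstract does
    not say whether the sample or population form is used.
    """
    threshold = mean(premiums) + pstdev(premiums)
    return [
        int(premium > threshold and claims == 0)
        for premium, claims in zip(premiums, claim_counts)
    ]


def fuse_probabilities(p_xgb, p_transformer, w_xgb=0.6):
    """Weighted average of two models' positive-class probabilities.

    Assumption: w_xgb=0.6 goes to XGBoost and 0.4 to the Transformer.
    """
    return [
        w_xgb * a + (1.0 - w_xgb) * b
        for a, b in zip(p_xgb, p_transformer)
    ]


# Illustrative values only, not taken from the study's dataset.
premiums = [100, 100, 100, 100, 300]
labels = label_good_customers(premiums, [0, 0, 0, 0, 0])
fused = fuse_probabilities([0.5], [1.0])
```

A fixed weighting such as this keeps the fusion step transparent; the weight itself could be treated as one more hyperparameter to tune.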
Experimental results show that Model A attains an accuracy of 0.8720 and an AUC of 0.9140, whereas Model B achieves an accuracy of 0.8850 and an AUC of 0.9260 after systematic hyperparameter tuning. Removing either the Transformer or XGBoost markedly degrades Model A, while omitting RandomForest or LightGBM leads to smaller performance drops in Model B, underscoring the value of ensemble diversity. Overall, the proposed framework provides a practical tool for insurance customer segmentation and profitability-oriented decision-making, and its open-source implementation facilitates replication, extension with additional features or larger datasets, and potential real-time deployment in operational insurance environments.
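For readers less familiar with the threshold-based metrics named in the abstract, the following sketch shows how accuracy, F1-score, and the Matthews Correlation Coefficient follow from confusion-matrix counts. The function names and the counts are hypothetical and purely illustrative; they are not results from the study.

```python
import math


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient; 0.0 if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, fn, tn = 40, 10, 10, 40
scores = (accuracy(tp, fp, fn, tn), f1_score(tp, fp, fn), mcc(tp, fp, fn, tn))
```

MCC uses all four cells of the confusion matrix, which makes it informative when the positive class is rare, as is likely under a mean-plus-one-standard-deviation labeling rule.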
