Boosting Insights: Insurance Cross-Selling with XGBoost

Project Overview

This project aims to predict customer responses to automobile insurance offers using machine learning techniques. Utilizing Kaggle's synthetic dataset, we explore data visualization, preprocessing, and modeling strategies to optimize performance on this binary classification problem.

Dataset Description

  • Competition: Kaggle Playground Series 2024
  • Objective: Predict the probability of a customer responding positively to an automobile insurance offer.
  • Evaluation Metric: Area Under the ROC Curve (ROC-AUC); see the scoring sketch after this list.
  • Synthetic Data: Designed to mimic real-world data while preserving privacy.
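
Since submissions are scored on ROC-AUC, it is worth computing the same metric locally on a validation split. A minimal sketch with scikit-learn, using toy arrays in place of real predictions:

import numpy as np
from sklearn.metrics import roc_auc_score

# Toy stand-ins: true 0/1 responses and predicted probabilities
# for the positive class from a validation split.
y_val = np.array([0, 0, 1, 1, 0, 1])
val_probs = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70])

print(f"Validation ROC-AUC: {roc_auc_score(y_val, val_probs):.4f}")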

Steps

  • Setup: Installing and importing the essential Python libraries.
  • Data Exploration: Inspecting data structure, identifying patterns, and analyzing distributions.
  • Feature Engineering: Encoding categorical variables, scaling numerical features, and handling missing values (first sketch after this list).
  • Model Development: Implementing and fine-tuning an XGBoost model (second sketch after this list).
  • Visualization: Creating insightful plots to understand feature relationships and model performance.
  • Submission: Preparing and validating the final submission file (also covered in the second sketch).
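
A minimal feature-engineering sketch with scikit-learn. The column names follow the public insurance cross-selling schema but are assumptions here, not guaranteed to match the notebook exactly:

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column split, for illustration only.
categorical = ["Gender", "Vehicle_Age", "Vehicle_Damage"]
numerical = ["Age", "Annual_Premium", "Vintage"]

# Impute + one-hot encode categoricals; impute + scale numericals.
preprocess = ColumnTransformer([
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numerical),
])
# X_processed = preprocess.fit_transform(train[categorical + numerical])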
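
And a sketch of model training plus submission preparation. The hyperparameters are illustrative starting points, not the tuned values from the notebook, and the synthetic data is only there so the snippet runs end to end:

import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in features; replace with the preprocessed dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBClassifier(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=6,
    eval_metric="auc",  # match the competition metric
)
model.fit(X_tr, y_tr)

val_probs = model.predict_proba(X_val)[:, 1]
print(f"Validation ROC-AUC: {roc_auc_score(y_val, val_probs):.4f}")

# Submission file: one predicted probability per test id.
# submission = pd.DataFrame({"id": test_ids, "Response": test_probs})
# submission.to_csv("submission.csv", index=False)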

Tools & Libraries

  • Programming Language: Python
  • Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, XGBoost, Plotly

How to Use

  • Clone the repository and navigate to the project directory.
  • Install the required libraries:
    pip install pandas numpy matplotlib seaborn scikit-learn xgboost plotly
  • Download the dataset from the Kaggle competition page and place it in the project directory (a loading sketch follows this list).
  • Open the notebook and execute the cells sequentially to reproduce the results.
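
A short loading sketch, assuming the standard Kaggle file layout (train.csv, test.csv, and sample_submission.csv placed in the project directory):

import pandas as pd

# Standard Kaggle competition files, assumed to sit next to the notebook.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
sample_submission = pd.read_csv("sample_submission.csv")

print(train.shape, test.shape)
print(train.head())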

Results

The tuned XGBoost model achieved an ROC-AUC score of 0.886 on the test set, indicating that it ranks likely responders well ahead of non-responders.

Author

Anna Balatska - Kaggle Grandmaster | Data Scientist | Machine Learning Enthusiast