This project uses machine learning to help medical insurance companies predict healthcare costs for individuals. The model analyzes key factors such as age, BMI, smoking habits, and region to support fair and balanced insurance premiums.
- Data Preprocessing:
  - Handles missing values with appropriate imputations.
  - Standardizes numerical features and applies one-hot encoding to categorical features.
- Machine Learning Models:
  - Implements and evaluates multiple regression models, including Linear Regression, Random Forest, Gradient Boosting, XGBoost, CatBoost, and AdaBoost.
  - Tunes hyperparameters to improve model performance.
- Prediction Pipeline:
  - Carries data from input preprocessing through to final predictions.
  - Accepts user inputs at run time for real-time premium estimation.
- Streamlit Application:
  - Interactive and user-friendly web interface for predicting medical costs.
  - Input parameters include age, BMI, number of children, smoking status, and region.
  - Displays predicted healthcare costs in an intuitive format.
- Languages: Python
- Frameworks: Streamlit
- Libraries: Pandas, NumPy, Scikit-learn, XGBoost, CatBoost, Matplotlib
- Tools: dill for model serialization
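As a rough illustration of why dill is a convenient choice for model serialization, the sketch below saves and reloads an object that holds a lambda, which the standard `pickle` module cannot serialize. The `ScaledModel` class and the `save_object` / `load_object` helpers are hypothetical names made up for this example; they are not taken from the repository.

```python
import dill


class ScaledModel:
    """Hypothetical stand-in for a trained model (not the project's real class)."""

    def __init__(self, transform, weight):
        self.transform = transform  # may be a lambda, which plain pickle rejects
        self.weight = weight

    def predict(self, value):
        return self.weight * self.transform(value)


def save_object(obj, path):
    """Write any Python object to disk with dill."""
    with open(path, "wb") as handle:
        dill.dump(obj, handle)


def load_object(path):
    """Read back an object previously written by save_object."""
    with open(path, "rb") as handle:
        return dill.load(handle)
```

A fitted preprocessor and model could be stored the same way and loaded once when the app starts, assuming the same library versions are installed at load time.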
The dataset includes:
- Age, BMI, and number of children as numerical features.
- Smoking status, gender, and region as categorical features.
- Healthcare charges as the target variable.
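The preprocessing and model comparison described in the feature list can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the column names (`age`, `bmi`, `children`, `smoker`, `gender`, `region`, `charges`), the category labels, the imputation strategies, and the synthetic data are all assumptions chosen for illustration, and only three of the listed models are included.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column names; adjust them to match the real dataset.
NUMERIC = ["age", "bmi", "children"]
CATEGORICAL = ["smoker", "gender", "region"]
TARGET = "charges"


def make_synthetic_data(n=200, seed=0):
    """Generate made-up rows shaped like the dataset (illustration only)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(18, 65, n).astype(float),
        "bmi": rng.normal(30, 6, n),
        "children": rng.integers(0, 5, n).astype(float),
        "smoker": rng.choice(["yes", "no"], n),
        "gender": rng.choice(["male", "female"], n),
        "region": rng.choice(["northeast", "northwest", "southeast", "southwest"], n),
    })
    # Arbitrary cost formula so the models have a signal to learn.
    df[TARGET] = (250 * df["age"] + 300 * df["bmi"]
                  + 20000 * (df["smoker"] == "yes") + rng.normal(0, 1000, n))
    df.loc[::25, "bmi"] = np.nan  # inject missing values to exercise the imputer
    return df


def build_pipeline(model):
    """Impute, scale, and one-hot encode, then fit the given regressor."""
    numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                        ("scale", StandardScaler())])
    categorical = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                            ("onehot", OneHotEncoder(handle_unknown="ignore"))])
    preprocess = ColumnTransformer([("num", numeric, NUMERIC),
                                    ("cat", categorical, CATEGORICAL)])
    return Pipeline([("preprocess", preprocess), ("model", model)])


def compare_models(df):
    """Return the mean 3-fold cross-validated R^2 for each candidate model."""
    X, y = df[NUMERIC + CATEGORICAL], df[TARGET]
    candidates = {
        "linear": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
        "gradient_boosting": GradientBoostingRegressor(random_state=0),
    }
    return {name: cross_val_score(build_pipeline(model), X, y, cv=3, scoring="r2").mean()
            for name, model in candidates.items()}


def tune_random_forest(df):
    """Small grid search over two random forest hyperparameters."""
    X, y = df[NUMERIC + CATEGORICAL], df[TARGET]
    grid = {"model__n_estimators": [25, 50], "model__max_depth": [3, None]}
    search = GridSearchCV(build_pipeline(RandomForestRegressor(random_state=0)),
                          param_grid=grid, cv=3, scoring="r2")
    search.fit(X, y)
    return search.best_params_, search.best_score_


if __name__ == "__main__":
    for name, r2 in compare_models(make_synthetic_data()).items():
        print(f"{name}: mean R^2 = {r2:.3f}")
```

Wrapping the preprocessing and the regressor in one `Pipeline` means the imputer, scaler, and encoder are fitted only on each training fold during cross-validation, which avoids leaking information from the held-out fold.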
- Clone the repository:
  git clone https://github.com/Rithish5513U/Medical-Cost-Prediction.git
- Navigate to the project directory:
  cd Medical-Cost-Prediction
- Create and activate a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install the required dependencies:
  pip install -r requirements.txt
- Run the Streamlit application:
  streamlit run app.py
- Open the Streamlit application in your browser.
- Enter the required input details:
  - Age
  - BMI
  - Number of children
  - Smoking status
  - Region
- Click "Predict" to view the estimated healthcare cost.
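Before the entered values reach a model, they usually need to be validated and normalized into a single record. The sketch below shows one way this could look, using only the standard library. The `PremiumInput` class, the `to_model_row` helper, the accepted ranges, and the category labels are assumptions for illustration and may differ from what the app actually does.

```python
from dataclasses import asdict, dataclass

# Assumed category labels; the real app may use different values.
ALLOWED_SMOKER = {"yes", "no"}
ALLOWED_REGION = {"northeast", "northwest", "southeast", "southwest"}


@dataclass(frozen=True)
class PremiumInput:
    """One validated set of user inputs (hypothetical structure)."""
    age: int
    bmi: float
    children: int
    smoker: str
    region: str

    def __post_init__(self):
        # The numeric ranges below are illustrative sanity limits, not project rules.
        if not 0 < self.age <= 120:
            raise ValueError(f"age out of range: {self.age}")
        if not 10.0 <= self.bmi <= 80.0:
            raise ValueError(f"bmi out of range: {self.bmi}")
        if self.children < 0:
            raise ValueError(f"children cannot be negative: {self.children}")
        if self.smoker not in ALLOWED_SMOKER:
            raise ValueError(f"unknown smoking status: {self.smoker!r}")
        if self.region not in ALLOWED_REGION:
            raise ValueError(f"unknown region: {self.region!r}")


def to_model_row(form):
    """Convert raw form values (strings allowed) into one validated record."""
    record = PremiumInput(
        age=int(form["age"]),
        bmi=float(form["bmi"]),
        children=int(form["children"]),
        smoker=str(form["smoker"]).strip().lower(),
        region=str(form["region"]).strip().lower(),
    )
    return asdict(record)
```

The resulting dictionary can be wrapped in a one-row `pandas.DataFrame` and passed to a fitted pipeline's `predict` method, provided the keys match the column names the pipeline was trained on.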
The application provides:
- Predicted healthcare costs based on the user inputs.
- Insights into the contribution of various factors to the overall cost.
Developed with ❤️ by Rithish S