This university project predicts the occurrence of brain strokes from a publicly available dataset. It has two goals: to replicate the methodology and findings of the research paper "Stroke Risk Prediction with Machine Learning Techniques", and to implement an alternative version that follows best practices in machine learning and data analysis.
- Paper Replication: Follows the methodology of the referenced paper as closely as possible.
- Best Practices Implementation: Applies contemporary best practices to improve model performance, robustness, and reproducibility.
- Data Source: Publicly available stroke prediction dataset from Kaggle.
- Machine Learning Techniques: Implementation of various ML algorithms including Random Forest, Naive Bayes, Logistic Regression, and more.
- Performance Metrics: Evaluation using AUC, precision, recall, F-measure, and accuracy.
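
The metrics listed above can be computed from a model's predictions roughly as follows. This is a minimal, dependency-free sketch and not the notebooks' actual code: the function names `binary_metrics` and `roc_auc`, the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions. In practice, `sklearn.metrics` offers equivalent functions (`precision_score`, `recall_score`, `f1_score`, `accuracy_score`, `roc_auc_score`).

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F-measure (F1), and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f1,
        "accuracy": accuracy,
    }


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = stroke, 0 = no stroke (not taken from the dataset).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_score = [0.1, 0.2, 0.15, 0.3, 0.05, 0.6, 0.4, 0.7, 0.35, 0.9]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_score)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {auc:.3f}")
```

On this toy data the accuracy is 0.8 while precision and recall are both about 0.67, which illustrates why accuracy is reported alongside the other metrics when the positive class is rare.
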
To run the project, clone the repository and follow the instructions in the respective Jupyter notebooks:
- Paper Replication Version: NAML_project_paper_replica.ipynb
- Best Practices Version: NAML_project_best_practice.ipynb
For an in-depth understanding of our work, including methodology, experiments, and results, please refer to our comprehensive report.
Report: Brain_Stroke_Prediction_Project.pdf
- Members: Cavallini Sara, Eusebio Alberto
- University: Politecnico di Milano
- Course: Numerical Analysis for Machine Learning (NAML)