The page with the results is available at: https://msoczi.github.io/football_predictions/web/index.html
The aim of the project was to create a tool for predicting the results of league matches in the leading European leagues, based on data I prepared myself.
The project was implemented from scratch and included:
- collecting the raw data from which features can be created and models built
- creating variables based on, among others, time aggregates (last n matches), position in the table, team form, etc.
- computing historical data for modeling
- building the target solution: an XGBoost model with 3 classes. Then, based on the estimated probabilities, a decision tree was created that predicts, in a simple rule-based way, which team will win the match (or whether it will end in a draw)
- creating a script that downloads data on upcoming matches, creates the model variables for the given teams, and predicts the match result.
Raw data with match results are downloaded from https://www.football-data.co.uk.
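As a rough illustration of working with such raw data, the sketch below parses a small CSV into match records. The column names (HomeTeam, AwayTeam, FTHG, FTAG, FTR) are assumed to follow the layout published by football-data.co.uk; the sample rows, team names, and the load_matches helper are made up for this example and are not taken from the project.

```python
import csv
import io

# Made-up sample in the assumed football-data.co.uk column layout.
SAMPLE = """Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
X1,01/01/2000,Team A,Team B,2,1,H
X1,02/01/2000,Team C,Team A,0,0,D
"""

def load_matches(text):
    """Parse CSV text into a list of simple match dictionaries."""
    matches = []
    for row in csv.DictReader(io.StringIO(text)):
        matches.append({
            "date": row["Date"],
            "home": row["HomeTeam"],
            "away": row["AwayTeam"],
            "home_goals": int(row["FTHG"]),  # full-time home goals
            "away_goals": int(row["FTAG"]),  # full-time away goals
            "result": row["FTR"],            # 'H', 'D' or 'A'
        })
    return matches

matches = load_matches(SAMPLE)
```

In practice the text would come from a downloaded season file rather than an inline string.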
The advantage of the approach is the ability to predict results from any league. So far, however, it is only possible to predict the results of the top division of the following countries:
Based on the raw data, I created the appropriate characteristics by myself. The full list of variables is available in the file: variables
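To give a feel for what a "last n matches" variable might look like, here is a minimal sketch that computes the points each team earned in its previous n matches. The function names, the toy fixtures, and the choice of points as the aggregated quantity are assumptions for illustration; the project's actual variables are the ones listed in the variables file.

```python
from collections import defaultdict, deque

def points(result, side):
    """Points for one team; result is 'H', 'D' or 'A', side is 'home' or 'away'."""
    if result == "D":
        return 1
    won = (result == "H" and side == "home") or (result == "A" and side == "away")
    return 3 if won else 0

def add_form_features(matches, n=5):
    """For each match (in chronological order), attach the points each team
    earned in its previous n matches. Only earlier matches are used, so the
    feature never includes the result being predicted."""
    history = defaultdict(lambda: deque(maxlen=n))
    rows = []
    for home, away, result in matches:
        rows.append({
            "home": home, "away": away, "result": result,
            "home_form": sum(history[home]),
            "away_form": sum(history[away]),
        })
        # Update the history only after the features were computed.
        history[home].append(points(result, "home"))
        history[away].append(points(result, "away"))
    return rows

# Toy, made-up fixtures: (home team, away team, full-time result).
toy = [
    ("A", "B", "H"),
    ("B", "C", "D"),
    ("C", "A", "A"),
    ("A", "B", "D"),
]
features = add_form_features(toy, n=2)
```

Updating the history after building the row is the important design choice here: it keeps the target match out of its own features.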
The XGBoost model was built on a hand-prepared historical sample containing 7210 rows and 354 columns. The objective function was multi:softprob, so the model outputs the probability of each observation belonging to each of the 3 match-result classes: H (Home), A (Away), D (Draw).
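The multi:softprob objective turns one raw score per class into probabilities with a softmax. The sketch below shows only that final step in plain Python; the raw scores and the order of the classes are hypothetical, and no model is trained here.

```python
import math

# Assumed class order; the project's actual label encoding may differ.
CLASSES = ("H", "A", "D")

def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

raw_scores = [1.2, 0.1, 0.3]  # hypothetical raw scores for one match
probs = dict(zip(CLASSES, softmax(raw_scores)))
most_likely = max(probs, key=probs.get)
```

Each match therefore gets three numbers, one per class, rather than a single hard label.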
These probabilities were then used to build a simple decision tree (max_depth = 3) that categorizes individual observations in a rule-based manner, i.e. predicts the final result with simple rules. This procedure generalized the results so that a draw was not predicted too rarely. Below is the scheme of the decision tree.
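The idea of rules on top of probabilities can be sketched as a short chain of threshold checks. All thresholds and the function name below are hypothetical and chosen only for illustration; the real splits are the ones learned by the project's decision tree.

```python
def predict_result(p_home, p_draw, p_away):
    """Rule-based prediction from class probabilities (at most 3 comparisons).

    The thresholds are hypothetical, not the ones learned in the project.
    """
    if p_home >= 0.45:   # clear home favourite
        return "H"
    if p_away >= 0.40:   # clear away favourite
        return "A"
    # No clear favourite: predict a draw if its probability is high enough.
    return "D" if p_draw >= 0.28 else "H"

# With no clear favourite the rules pick a draw, although picking the
# largest probability would have returned a home win.
example = predict_result(p_home=0.38, p_draw=0.30, p_away=0.32)
```

Simply taking the class with the highest probability would seldom return a draw, because the draw probability is rarely the largest of the three; explicit rules like these make it possible to predict one.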
Forecasts do not use bookmaker odds.
You can view the results on the site:
You can also clone the repository and use it with Python.
How to use?
- Clone the repository.
git clone https://github.com/msoczi/football_predictions
- Create and activate a virtual environment for Python.
# LINUX:
python3 -m venv football_preds
source football_preds/bin/activate
# WINDOWS:
python -m venv football_preds
football_preds\Scripts\activate
- Install the required packages (in the virtual environment!).
pip install -r requirements.txt
- Run main_script.py from the console.
python scripts/main_script.py <LEAGUE_NAME>
The results will then be saved to \output_tables for the league passed as the argument.