This project was inspired by the Shell AI Hackathon 2023 - Waste to Energy.
This project leverages machine learning and optimization techniques to minimize the cost of biofuel production in Gujarat, India. It forecasts annual biomass production at 2418 harvesting sites using machine learning models and selects optimal depot locations using a hybrid Simulated Annealing and Variable Neighborhood Search algorithm. The primary goal is to minimize the total cost of biofuel production by optimizing the supply chain, which includes the cost of biomass collection and transportation.
- `dataset/`: Contains all the data files used in the project.
  - `Biomass_History.csv`: Historical biomass data.
  - `Distance_Matrix.csv`: Distance matrix between sites.
  - `biomass_long.csv`: Long format biomass data.
  - `districts.csv`: District information.
  - `cropland/`: Cropland data.
  - `elevation/`: Elevation data download links.
  - `rainfall/`: Rainfall data download links.
- `docs/`: Contains the detailed problem statement.
  - `detailed-problem-statement.pdf`: Detailed problem statement document.
- `feature_engineering/`: Scripts and notebooks for feature engineering.
  - `cropland.ipynb`, `cropland.py`: Cropland feature engineering.
  - `district.py`: District feature engineering.
  - `elevation.py`: Elevation feature engineering.
  - `fe_utils.py`: Utility functions for feature engineering.
  - `rainfall.py`: Rainfall feature engineering.
- `modelling/`: Scripts for model training, testing, tuning, and plotting.
  - `plotting.py`: Plotting functions.
  - `test.py`: Testing scripts.
  - `training.py`: Training scripts.
  - `tuning.py`: Hyperparameter tuning scripts.
  - `logs/`: Contains results logs.
    - `results.csv`: Results of model training and testing.
- `models/`: Trained models and related data.
  - `full_data.csv`: Full dataset used for training.
  - `xgb_best.pkl`: Best trained model (XGBoost).
- `optimization/`: Scripts for optimization algorithms.
  - `cost_functions.py`: Cost functions used in optimization.
  - `optimization.py`: Implementation of the optimization algorithm.
- `scripts/`: Miscellaneous scripts.
  - `add_districts.ipynb`: Notebook that adds the district data.
  - `add_precipitation_rate.ipynb`: Notebook that adds the precipitation rate data.
  - `download_parallel.py`: Script for parallel data downloading.
- `utils.py`: Utility functions.
- Gathered historical biomass data, cropland data, elevation data, and rainfall data.
Trained various machine learning models to forecast biomass at the 2418 harvesting sites. XGBoost performed best. The model pipeline applies an OrdinalEncoder to the categorical variables and a median imputer to missing values.
Year | Test MAE |
---|---|
2011 | 28.4 |
2012 | 58.03 |
2013 | 43.15 |
2014 | 31.56 |
2015 | 30.82 |
2016 | 48.03 |
2017 | 38.41 |
2018 | 32.05 |
2019 | 36.85 |
Avg | 38.57 |
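
The table reports test MAE (mean absolute error) per year. As a minimal sketch of how such a per-year metric can be computed, assuming predictions are available as `(year, actual, predicted)` rows: the function names and the toy rows below are hypothetical and are not taken from `modelling/test.py`.

```python
from collections import defaultdict


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mae_by_year(rows):
    """Group (year, actual, predicted) rows by year and compute MAE per year."""
    grouped = defaultdict(lambda: ([], []))
    for year, actual, predicted in rows:
        grouped[year][0].append(actual)
        grouped[year][1].append(predicted)
    return {year: mean_absolute_error(a, p) for year, (a, p) in sorted(grouped.items())}


# Toy rows for illustration only (not project data).
rows = [
    (2018, 100.0, 90.0),
    (2018, 50.0, 70.0),
    (2019, 80.0, 80.0),
    (2019, 40.0, 20.0),
]
per_year = mae_by_year(rows)
# Unweighted mean of the yearly MAEs (an assumption about how an "Avg" row could be built).
average = sum(per_year.values()) / len(per_year)
print(per_year, average)
```

MAE is expressed in the same unit as the forecast target, so a yearly value can be read directly as the typical absolute error per site.
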
- Problem Definition: Selecting depot locations to minimize total cost, including biomass collection and transportation costs.
- Algorithm Design: Developed a hybrid algorithm combining Simulated Annealing and Variable Neighborhood Search to find the optimal depot locations.
- Implementation: Implemented the algorithm in `optimization/optimization.py`.
- Results: The algorithm identified low-cost depot locations, reducing total cost by about 34% compared to a greedy baseline.
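
To make the hybrid idea concrete, here is a minimal sketch on a toy problem. It is not the code in `optimization/optimization.py`. It assumes a simplified cost (each site ships its biomass to the nearest depot, cost = biomass × distance), a greedy starting solution, and hypothetical names and parameters (`total_cost`, `greedy_depots`, `shake`, `sa_vns`, `t0`, `cooling`, `max_swaps`). Simulated Annealing supplies the acceptance rule; Variable Neighborhood Search supplies the growing swap neighborhood.

```python
import math
import random


def total_cost(depots, dist, biomass):
    """Toy cost: every site ships its biomass to its nearest depot."""
    return sum(b * min(dist[i][d] for d in depots) for i, b in enumerate(biomass))


def greedy_depots(k, dist, biomass):
    """Greedy baseline: add depots one at a time, picking the best next site."""
    chosen = []
    while len(chosen) < k:
        candidates = (s for s in range(len(biomass)) if s not in chosen)
        chosen.append(min(candidates, key=lambda s: total_cost(chosen + [s], dist, biomass)))
    return chosen


def shake(depots, n_sites, n_swaps, rng):
    """VNS move: swap n_swaps depots for currently unused sites."""
    new = list(depots)
    unused = [s for s in range(n_sites) if s not in new]
    n_swaps = min(n_swaps, len(new), len(unused))
    for pos in rng.sample(range(len(new)), n_swaps):
        j = rng.randrange(len(unused))
        new[pos], unused[j] = unused[j], new[pos]
    return new


def sa_vns(k, dist, biomass, iters=2000, t0=0.05, cooling=0.995, max_swaps=3, seed=0):
    rng = random.Random(seed)
    current = greedy_depots(k, dist, biomass)
    cur_cost = total_cost(current, dist, biomass)
    best, best_cost = list(current), cur_cost
    temp = max(t0 * cur_cost, 1e-9)  # temperature scaled to the cost magnitude
    swaps = 1
    for _ in range(iters):
        cand = shake(current, len(biomass), swaps, rng)
        cand_cost = total_cost(cand, dist, biomass)
        delta = cand_cost - cur_cost
        # SA acceptance: always take improvements, sometimes accept worse moves.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = cand, cand_cost
        if cand_cost < best_cost:
            best, best_cost = list(cand), cand_cost
            swaps = 1  # VNS: return to the smallest neighborhood after a new best
        else:
            swaps = swaps % max_swaps + 1  # VNS: widen the neighborhood, cycling 1..max_swaps
        temp = max(temp * cooling, 1e-9)
    return best, best_cost


# Toy instance: 30 random sites in the unit square, 4 depots.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(30)]
biomass = [rng.uniform(1, 10) for _ in points]
dist = [[math.dist(p, q) for q in points] for p in points]

greedy_cost = total_cost(greedy_depots(4, dist, biomass), dist, biomass)
best, best_cost = sa_vns(4, dist, biomass)
print(f"greedy={greedy_cost:.3f} best={best_cost:.3f}")
```

Because the search starts from the greedy solution and only records improvements, the returned cost can never exceed the greedy cost. How much it improves depends on the instance and on the cooling schedule.
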
- The XGBoost model achieved a satisfactory forecasting error, with a test MAE of 32.05 in 2018 and 36.85 in 2019.
- The hybrid optimization algorithm reduced the total cost of biofuel production by 34.45% for 2018 and 34.86% for 2019 compared to a greedy baseline.
Year | Greedy cost | Best cost | Reduction (%) |
---|---|---|---|
2018 | 51879.16 | 34006.25 | 34.45 |
2019 | 50872.53 | 33137.04 | 34.86 |
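
The reduction column follows from the two cost columns as (greedy - best) / greedy × 100. A short check using the values from the table (the function name `cost_reduction_pct` is illustrative):

```python
def cost_reduction_pct(baseline_cost, optimized_cost):
    """Percentage by which optimized_cost is lower than baseline_cost."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (baseline_cost - optimized_cost) / baseline_cost * 100


# (greedy cost, best cost) per year, copied from the table above.
costs = {2018: (51879.16, 34006.25), 2019: (50872.53, 33137.04)}
for year, (greedy, best) in costs.items():
    print(year, round(cost_reduction_pct(greedy, best), 2))
# 2018 34.45
# 2019 34.86
```
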
- Cost Reduction: The optimized depot placement lowered the total cost of biofuel production by about 34% relative to the greedy baseline, improving the economic viability of biofuel as an alternative energy source.
- Efficiency Improvement: Optimized depot locations led to more efficient logistics, reducing transportation costs and time.
- Scalability: The methodologies developed in this project can be scaled to other regions and similar logistics problems in the biofuel industry.