Merge pull request #57 from UBC-MDS/makefile_update

Makefile related changes
UBC-MDS · Dec 3, 2022 · ca2b396
2 parents 35d2d79 + b2f90a6

Showing 3 changed files with 74 additions and 18 deletions.
diff --git a/Makefile b/Makefile
@@ -1,11 +1,15 @@
-# Makefile
+# Dropout Prediction Makefile
 # Author: Caesar Wong
 # Date: 2022-11-29
-
+# 
 # ** Please activate the conda environment before using this Makefile
+# This is the Makefile for the dropout prediction project. There are 3 ways to run it:
+# 1. `make all`: generate the HTML report and run all the required dependencies files / programs.
+# 2. `make <target files>`: only run the specified target files.
+# 3. `make clean`: clean all the generated files, images, and report files.
+
 
 all: doc/The_Report_of_Dropout_Prediction.html 
-#data/raw/data.csv data/processed/train_eda.csv results/target_count_bar_plot.png results/correlation_heatmap_plot.pn results/correlation_with_target_plot.png results/gender_density_plot.png
 
 # download data
 data/raw/data.csv: src/download_data.py

diff --git a/README.md b/README.md
@@ -41,24 +41,74 @@ The hyperparameters of the aforementioned models are optimized using cross-valid
 
 The EDA performed can be found in the [dropout_pred_EDA.pdf](https://github.com/UBC-MDS/dropout-predictions/blob/main/src/dropout_pred_EDA.pdf).
 
-## Data Analysis Pipeline
+# Data Analysis Pipeline
 
 In this project, we adopt the following data analysis pipeline. First, we download and preprocess the raw data. After splitting and storing the required data files, we use `train_eda.csv` as the input of `general_EDA.py`, `train.csv` for `model_training.py`, and `test.csv` for `model_result.py`.
 
 ![plot](doc/data_analysis_pipeline.png)
 
-## Usage
+# Usage
 
-To replicate the analysis, clone [this](https://github.com/UBC-MDS/dropout-predictions.git) GitHub repository, install the
-conda environment listed in [here](https://github.com/UBC-MDS/dropout-predictions/blob/main/env/dropout_pred_env.yml) 
-> `conda env create -f env/dropout_pred_env.yml`
+There are different ways to replicate the analysis.
 
-activate the environment 
-> `conda activate dropout_pred_env`
 
-and run the following commands `bash data_analysis_pipeline.sh` under `src` folder:
+1. Clone [this](https://github.com/UBC-MDS/dropout-predictions.git) GitHub repository
 
-
+```
+git clone https://github.com/UBC-MDS/dropout-predictions.git
+```
+
+2. Navigate to the GitHub repository
+
+```
+cd dropout-predictions
+```
+
+3. Create the conda environment defined [here](https://github.com/UBC-MDS/dropout-predictions/blob/main/env/dropout_pred_env.yml)
+
+```
+conda env create -f env/dropout_pred_env.yml
+```
+
+4. Activate the environment 
+
+```
+conda activate dropout_pred_env
+```
+
+We can either use the [Makefile](#makefile) or [Shell Script](#shell-script) to run the analysis.
+
+## Makefile
+
+### Run All
+
+To run the whole analysis, run the following command in the root directory:
+
+```
+make all
+```
+
+Make checks whether the [final report](doc/The_Report_of_Dropout_Prediction.html) exists and is up to date. If the report is missing or older than any of its dependencies, Make runs the steps required to regenerate it.
+
+### Clean Files
+
+To clean the intermediate and final results including images, CSV files and report, run the following command in the root directory:
+
+```
+make clean
+```
+
+It will clean all the files under `data/raw/`, `results/`, and all the CSV files under `data/processed/`.
+
+## Shell Script
+
+After activating the Conda environment, run the following command under the `src` folder.
+
+```
+bash data_analysis_pipeline.sh
+```
+
+[Shell Script](src/data_analysis_pipeline.sh) content:
 
     <<comment
     This shell script will include all the script running required to reproduce the dropout prediction analysis.
@@ -80,17 +130,16 @@ and run the following commands `bash data_analysis_pipeline.sh` under `src` fold
     # model testing
     python model_result.py --test="../data/processed/test.csv" --out_dir="../results/"
 
-> `conda deactivate`
-
-> `Rscript -e 'rmarkdown::render("../doc/The_Report_of_Dropout_Prediction.Rmd")'`
+    # report generation
+    Rscript -e 'rmarkdown::render("../doc/The_Report_of_Dropout_Prediction.Rmd")'
 
 
-## License
+# License
 
 The Student Dropout Predictor materials here are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This allows for the sharing and adaptation of the datasets for our purpose of academic study and understanding, with the appropriate credit given.
 
 
-## References
+# References
 
 - Realinho, Valentim, Vieira Martins, Mónica, Machado, Jorge & Baptista, Luís. (2021). Predict students' dropout and academic success. UCI Machine Learning Repository. https://archive-beta.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success
 
diff --git a/src/data_analysis_pipeline.sh b/src/data_analysis_pipeline.sh
@@ -19,4 +19,7 @@ python general_EDA.py --input_path="../data/processed/train_eda.csv" --output_pa
 python model_training.py --train="../data/processed/train.csv" --scoring_metrics="recall" --out_dir="../results/"
 
 # model testing
-python model_result.py --test="../data/processed/test.csv" --out_dir="../results/"
+python model_result.py --test="../data/processed/test.csv" --out_dir="../results/"
+
+# report generation
+Rscript -e 'rmarkdown::render("../doc/The_Report_of_Dropout_Prediction.Rmd")'
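
A note for readers comparing the two entry points touched by this PR: as written, the shell script runs every step on each invocation, whereas Make skips targets that are already newer than their prerequisites. The sketch below illustrates that timestamp check in plain Bash. `needs_rebuild` is a hypothetical helper used only for illustration; it is not part of the repository or of Make, and it assumes a single prerequisite per target.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): succeeds when a target should
# be rebuilt, mimicking Make's decision for a single prerequisite.
needs_rebuild() {
    local target="$1" prerequisite="$2"
    # Rebuild if the target is missing or older than its prerequisite.
    [ ! -e "$target" ] || [ "$prerequisite" -nt "$target" ]
}
```

For example, `needs_rebuild data/raw/data.csv src/download_data.py && echo "stale"` would report the CSV as stale when it is absent or older than the download script, which corresponds to the `data/raw/data.csv: src/download_data.py` rule in the Makefile.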