MigratoryBirds

Project for DSP2022 - Data Science Project I - Detecting micro and macro spatial-temporal patterns in behaviour and habitat selection by a migratory bird

Python version

Python 3.9 or higher

Libraries

numpy
pandas
geopandas
scikit-learn (sklearn)
matplotlib
seaborn
pointpats (https://pypi.org/project/pointpats/)
contextily
folium
plotly
PySAL
Jupyter Notebook

Anaconda can be installed so that the code and the notebooks can be executed in a conda environment. For example, numpy, pandas and matplotlib are already included in Anaconda.
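Before running anything, it may help to check the interpreter version and which of the libraries above are importable. The script below is a minimal sketch that is not part of the repository; the import names in REQUIRED_MODULES (for example sklearn for scikit-learn and notebook for Jupyter Notebook) are assumptions and may need adjusting for your environment.

```python
import importlib.util
import sys

# Import names assumed for the libraries listed above (hypothetical mapping;
# e.g. scikit-learn is imported as "sklearn"). Adjust to your environment.
REQUIRED_MODULES = [
    "numpy", "pandas", "geopandas", "sklearn", "matplotlib", "seaborn",
    "pointpats", "contextily", "folium", "plotly", "pysal", "notebook",
]


def python_ok(minimum=(3, 9)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    status = "OK" if python_ok() else "too old (3.9 or higher is required)"
    print("Python", sys.version.split()[0], status)
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Using find_spec only locates each module without importing it, so the check stays fast even for heavy packages such as geopandas.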

Instructions for running

Machine Learning Pipeline

Put the original data CSV files in resources/data/original.
Run python3 src/data_wrangling/join_datasets.py to join the two original datasets on NestID.
Run python3 src/data_wrangling/clustering.py to compute clusters for each year using DBSCAN with different distance thresholds.
Run python3 src/data_wrangling/nearby_nests.py to find the closest nest and count the number of neighbouring nests within different distances.
Run python3 src/data_wrangling/missing_data.py to impute missing (null) values.
Run python3 src/data_wrangling/nest_features.py to compute nest-related features, such as the percentage of shy birds in a cluster or in a neighbourhood, and to combine the features generated by the previous scripts into a single CSV file.
Run python3 src/data_wrangling/generate_test_train_split.py to split the dataset into training and test sets and to generate one-hot encoded variables from some of the features.
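The clustering and nearby-nest steps can be approximated with scikit-learn. This is a minimal sketch, not the code in clustering.py or nearby_nests.py: the column names Year, lat and lon, the min_samples value and the use of the haversine metric on latitude/longitude are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.neighbors import BallTree

EARTH_RADIUS_M = 6_371_000.0


def to_radians(df, lat_col="lat", lon_col="lon"):
    """Return an (n, 2) array of [lat, lon] in radians, the order haversine expects.
    The column names are hypothetical; rename them to match the joined dataset."""
    return np.radians(df[[lat_col, lon_col]].to_numpy(dtype=float))


def dbscan_per_year(df, eps_m, min_samples=2, year_col="Year"):
    """Run DBSCAN separately for each year; eps is given in metres.
    Label -1 marks noise points (nests not assigned to any cluster)."""
    labels = pd.Series(-1, index=df.index, dtype=int)
    for _, group in df.groupby(year_col):
        model = DBSCAN(eps=eps_m / EARTH_RADIUS_M, min_samples=min_samples,
                       metric="haversine", algorithm="ball_tree")
        labels.loc[group.index] = model.fit_predict(to_radians(group))
    return labels


def neighbour_features(df, radius_m):
    """Distance (m) to the closest other nest and the number of other nests
    within radius_m. Apply it to one year at a time."""
    coords = to_radians(df)
    tree = BallTree(coords, metric="haversine")
    dist, _ = tree.query(coords, k=2)  # column 0 is the nest itself
    counts = tree.query_radius(coords, r=radius_m / EARTH_RADIUS_M, count_only=True)
    return pd.DataFrame({
        "nearest_nest_m": dist[:, 1] * EARTH_RADIUS_M,
        f"neighbours_{radius_m}m": counts - 1,  # exclude the nest itself
    }, index=df.index)
```

Working in radians with the haversine metric lets eps and the neighbourhood radius be expressed directly in metres, so trying "different distances" only means changing eps_m or radius_m.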

Run python3 src/machine_learning/build_classification_models.py to build and evaluate classification models
Run python3 src/machine_learning/build_clustering_models.py to build clustering models that try to find a better propensity threshold
Run python3 src/machine_learning/build_regression_models.py to build and evaluate regression models (these perform very poorly)
Run python3 src/machine_learning/explaining_models.py to compute MPI and PIMP feature importances for the decision trees built by the classification script
Run python3 src/machine_learning/explaining_model_lime.py to explain individual predictions with LIME
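Permutation-based feature importance for a decision tree can be sketched as follows. This is an illustration under assumptions, not the implementation in explaining_models.py: it uses scikit-learn's permutation_importance (shuffling one feature at a time and measuring the drop in test accuracy), plus a hypothetical PIMP-style helper that compares the tree's Gini importances with a null distribution obtained by refitting on permuted labels.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier


def model_permutation_importance(model, X_test, y_test, n_repeats=20, seed=0):
    """Mean drop in test accuracy when each feature column is shuffled."""
    result = permutation_importance(model, X_test, y_test, n_repeats=n_repeats,
                                    random_state=seed, scoring="accuracy")
    return result.importances_mean


def pimp_pvalues(X, y, n_permutations=50, seed=0, **tree_kwargs):
    """PIMP-style p-values (hypothetical helper): refit the tree on permuted
    labels to build a null distribution of Gini importances per feature."""
    rng = np.random.default_rng(seed)
    observed = (DecisionTreeClassifier(random_state=seed, **tree_kwargs)
                .fit(X, y).feature_importances_)
    null = np.empty((n_permutations, X.shape[1]))
    for i in range(n_permutations):
        y_perm = rng.permutation(y)  # break the feature/label relationship
        null[i] = (DecisionTreeClassifier(random_state=seed, **tree_kwargs)
                   .fit(X, y_perm).feature_importances_)
    # Share of null importances at least as large as the observed one (+1 smoothing).
    return (1 + (null >= observed).sum(axis=0)) / (1 + n_permutations)
```

A small p-value suggests that a feature's importance is larger than what the tree would assign to it by chance on data with no real signal.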

Drawing plots

Run python3 src/data_wrangling/mobber_percentages_per_cluster.py to plot the percentages of shy and aggressive birds in each cluster.
Run python3 src/data_wrangling/mobber_percentages_per_cluster_per_year.py to plot the same percentages separately for each year.
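The percentages behind these plots can be computed with a normalised cross-tabulation. This is a minimal sketch, not the code in the plotting scripts: the column names cluster and behaviour, the labels "shy"/"aggressive" and the output file name are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without a display (assumption for scripts)
import matplotlib.pyplot as plt
import pandas as pd


def mobber_percentages(df, cluster_col="cluster", label_col="behaviour"):
    """Percentage of each behaviour class within every cluster (rows sum to 100).
    Column names are hypothetical; DBSCAN noise points (label -1) are dropped."""
    clustered = df[df[cluster_col] != -1]
    return pd.crosstab(clustered[cluster_col], clustered[label_col],
                       normalize="index") * 100


def plot_percentages(pct, path="mobber_percentages.png"):
    """Stacked bar chart with one bar per cluster."""
    ax = pct.plot(kind="bar", stacked=True, figsize=(8, 4))
    ax.set_xlabel("Cluster")
    ax.set_ylabel("Percentage of nests (%)")
    ax.legend(title="Behaviour")
    plt.tight_layout()
    plt.savefig(path)
    plt.close(ax.figure)
    return path
```

For the per-year variant, the same functions can be applied to the rows of a single year, for example df[df["Year"] == 2019].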

Please note that all scripts must be run from the root of the repository. For example, to run the clustering step, run python3 src/data_wrangling/clustering.py from the root, not python3 data_wrangling/clustering.py from inside src.

Calculating the Join Counts

Run python3 src/analysis/analysis.py to print the join count values and draw histograms showing the real values for each year
autocorrelation.py contains the functions that analysis.py needs to run
The distance threshold can be changed in analysis.py
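Join counts measure spatial autocorrelation of a binary attribute (for example shy = 1, aggressive = 0) over pairs of neighbouring nests. The sketch below is an illustration, not the code in autocorrelation.py: it assumes projected coordinates in metres, binary distance-band weights, and a permutation test that shuffles the labels over fixed locations to produce the simulated distribution a histogram could show.

```python
import numpy as np


def distance_band_weights(coords, threshold):
    """Binary symmetric weights: w_ij = 1 if i != j and d_ij <= threshold.
    coords is an (n, 2) array in the same units as threshold (assumed metres)."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    w = (d <= threshold).astype(float)
    np.fill_diagonal(w, 0.0)
    return w


def join_counts(x, w):
    """Return (BB, BW, WW) join counts for a 0/1 vector x; each pair counted once."""
    x = np.asarray(x, dtype=float)
    bb = 0.5 * x @ w @ x
    ww = 0.5 * (1 - x) @ w @ (1 - x)
    bw = 0.5 * (((x[:, None] - x[None, :]) ** 2) * w).sum()
    return bb, bw, ww


def permutation_test(x, w, n_permutations=999, seed=0):
    """Simulated BB counts under random relabelling and a one-sided pseudo p-value."""
    rng = np.random.default_rng(seed)
    observed = join_counts(x, w)[0]
    simulated = np.array([join_counts(rng.permutation(x), w)[0]
                          for _ in range(n_permutations)])
    p = (1 + (simulated >= observed).sum()) / (1 + n_permutations)
    return observed, simulated, p
```

BB, BW and WW always add up to the total number of neighbouring pairs, so raising the distance threshold increases all three counts; the simulated array can be drawn with matplotlib's hist, with the observed value marked as a vertical line.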

Jupyter notebooks

The data files must be created before running the notebooks. If you have not already done so in the machine learning pipeline, run the following commands:

$ python3 src/data_wrangling/join_datasets.py
$ python3 src/data_wrangling/clustering.py

Then go to the visualisation folder:

cd src/visualisation/
Open Visual Studio Code, navigate to src/visualisation/ and open the notebook you want. Before running the code, click Select Kernel in the top-right corner and choose a kernel. If you have installed conda, you may want to choose the kernel whose name starts with base.
The notebooks can also be run on the Jupyter Notebook server. Start it with $ jupyter notebook, which opens the current folder in a browser, and then choose the notebook you want to examine. More info can be found here.

The code for clustering nests on the map with the K-means algorithm is also in a Jupyter notebook. To open and run it, follow the same instructions as for the visualisation notebooks.
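K-means clustering of nest locations can be sketched with scikit-learn. This is not the notebook's code: the lat/lon column names and the equirectangular projection to metres (used so that Euclidean K-means distances are roughly in metres over a small study area) are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

EARTH_RADIUS_M = 6_371_000.0


def local_xy(lat, lon):
    """Equirectangular projection to metres; adequate over a small study area."""
    lat_r, lon_r = np.radians(lat), np.radians(lon)
    x = EARTH_RADIUS_M * lon_r * np.cos(lat_r.mean())
    y = EARTH_RADIUS_M * lat_r
    return np.column_stack([x, y])


def kmeans_nests(df, k, lat_col="lat", lon_col="lon", seed=0):
    """Assign each nest to one of k K-means clusters (column names are hypothetical)."""
    xy = local_xy(df[lat_col].to_numpy(dtype=float), df[lon_col].to_numpy(dtype=float))
    model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(xy)
    return pd.Series(model.labels_, index=df.index, name="kmeans_cluster")
```

Unlike DBSCAN, K-means assigns every nest to a cluster and needs k chosen in advance, so it is worth comparing a few values of k on the map.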

krn-hov/Migratory-Birds

Folders and files

Latest commit

History

Repository files navigation

MigratoryBirds

Python version

Libraries

Instructions for running

Machine Learning Pipeline

Drawing plots

Calculating the Join Counts

Jupyter notebooks

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages