DCASE2023: FEW-SHOT BIOACOUSTIC EVENT DETECTION USING BEATS, ADAPTIVE FRAME-SHIFTS AND SPECTRAL GATING


💥 A PIPELINE FOR FINE-TUNING BEATs ON THE ESC50 DATASET IS PROVIDED HERE. The rest of the repository is dedicated to training a prototypical network that uses BEATs as a feature extractor. 💥

Few-shot learning is a highly promising paradigm for sound event detection. It is also an extremely good fit for the needs of users in bioacoustics, in which increasingly large acoustic datasets commonly need to be labelled for events of an identified category (e.g. species or call-type), even though this category might not be known in other datasets or have any yet-known label. While satisfying user needs, this will also benchmark few-shot learning for the wider domain of sound event detection (SED).


Few-shot learning describes tasks in which an algorithm must make predictions given only a few instances of each class, contrary to the standard supervised learning paradigm. The main objective is to find reliable algorithms that are capable of dealing with data sparsity, class imbalance and noisy/busy environments. Few-shot learning is usually studied using N-way-K-shot classification, where N denotes the number of classes and K the number of examples for each class.

Text in this section is borrowed from c4dm/dcase-few-shot-bioacoustic.
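To make the N-way-K-shot setup concrete, here is a minimal sketch of sampling one training episode from a list of per-example class labels: N classes are drawn, then K support examples and a few held-out query examples per class. The function `sample_episode` and its parameters are hypothetical illustrations, not code from this repository.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def sample_episode(
    labels: Sequence[str], n_way: int, k_shot: int, n_query: int, seed: int = 0
) -> Tuple[List[int], List[int], List[str]]:
    """Sample one N-way-K-shot episode; returns (support indices, query indices, classes)."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only classes with enough examples for both the support and query sets are eligible.
    eligible = sorted(c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query examples")
    classes = rng.sample(eligible, n_way)
    support: List[int] = []
    query: List[int] = []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support.extend(picked[:k_shot])  # the K labelled "shots" for this class
        query.extend(picked[k_shot:])    # held-out examples the model must classify
    return support, query, classes


# Toy example: a 2-way-5-shot episode drawn from three classes of 10 examples each.
labels = ["owl"] * 10 + ["frog"] * 10 + ["bat"] * 10
support, query, classes = sample_episode(labels, n_way=2, k_shot=5, n_query=3)
```

Sampling many such episodes during training is what lets the model learn to generalise from only K examples of a class it has not seen before.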

Our contribution:

This repository is the result of our submission to the DCASE2023 Challenge Task 5: Few-shot Bioacoustic Event Detection. It contains the code needed to train a prototypical network, with BEATs as the feature extractor, on the data provided by the DCASE challenge.
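As a rough illustration of how a prototypical network classifies: each class prototype is the mean embedding of its support examples, and a query is assigned to the nearest prototype. The sketch below assumes a POS and a NEG class and uses random vectors as stand-ins for BEATs embeddings; the function names are ours and this is not the repository's implementation.

```python
import numpy as np


def prototypes(support_emb: np.ndarray, support_labels: np.ndarray):
    """Return (classes, prototypes); each prototype is the mean support embedding of a class."""
    classes = np.unique(support_labels)
    protos = np.stack([support_emb[support_labels == c].mean(axis=0) for c in classes])
    return classes, protos


def classify(query_emb: np.ndarray, classes: np.ndarray, protos: np.ndarray):
    """Assign each query to its nearest prototype (squared Euclidean distance)."""
    # Distance matrix of shape (n_query, n_classes).
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    # Softmax over negative distances gives per-class probabilities.
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return classes[d2.argmin(axis=1)], probs


# Stand-in embeddings: NEG examples cluster around 0, POS examples around 1.
rng = np.random.default_rng(0)
support_emb = np.vstack([rng.normal(0.0, 0.1, (5, 8)), rng.normal(1.0, 0.1, (5, 8))])
support_labels = np.array(["NEG"] * 5 + ["POS"] * 5)
query_emb = np.vstack([np.zeros((1, 8)), np.ones((1, 8))])

classes, protos = prototypes(support_emb, support_labels)
pred, probs = classify(query_emb, classes, protos)
```

In a real pipeline, the embeddings would come from the feature extractor, and the distance to the POS prototype is the quantity thresholded to decide whether a frame contains an event.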

We aim to keep this repository active to tackle future DCASE challenges. If you wish to help us improve this repository or collaborate with us, please do not hesitate to send us a message!

Requirements

This section lists the requirements. Note that we make extensive use of Docker for easier reproducibility.

Setup

We have made a small wrapper to download the DCASE data and the BEATs model. Only the base folder needs to be specified:

./dcase_setup.sh /BASE/FOLDER/

The script should create, in the specified base folder, a DCASE folder containing the DCASE Development Set (i.e. the Training and Validation sets) and a BEATs folder containing the model weights.

Once the necessary files have been downloaded, build the Docker image from the Dockerfile located in our repository:

git clone https://github.com/NINAnor/rare_species_detections.git
cd rare_species_detections
docker build -t beats -f Dockerfile .

Processing the data

Because preprocessing takes a long time, we save the preprocessed files as numpy arrays. This way we can experiment with the pipeline without repeatedly preprocessing the data. To run the preprocessing step, use:

./preprocess_data.sh /BASE/FOLDER

The script will create a new folder, DCASEfewshot, containing three subfolders (train, validate and evaluate). Each of these folders contains a subfolder named after a hash computed from the processing parameters; this subfolder holds the processed data in the form of numpy arrays.

✒️ You can change the parameters for preprocessing the data in the CONFIG.yaml file

✒️ Note that preprocessing handles one split at a time: to create the numpy arrays for train, validate and evaluate, you need to update the CONFIG.yaml file and rerun the script for each of them.
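The hash-named cache folders described above can be sketched as follows: serialise the processing parameters deterministically, hash them, and only recompute the arrays when no folder exists for that hash. The exact hashing used by preprocess_data.sh is not documented here; `params_hash`, `cached_array`, the choice of SHA-256 and the 10-character truncation are our assumptions.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable

import numpy as np


def params_hash(params: dict, length: int = 10) -> str:
    """Deterministic short hash of a JSON-serialisable parameter dict."""
    # sort_keys=True makes the hash independent of key order in the config.
    blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:length]


def cached_array(
    out_root: Path, split: str, params: dict, name: str, compute: Callable[[], np.ndarray]
) -> np.ndarray:
    """Load <out_root>/<split>/<hash>/<name>.npy if present, otherwise compute and save it."""
    folder = Path(out_root) / split / params_hash(params)
    path = folder / f"{name}.npy"
    if path.exists():
        return np.load(path)
    folder.mkdir(parents=True, exist_ok=True)
    arr = np.asarray(compute())
    np.save(path, arr)
    return arr
```

Because the folder name depends only on the parameter values, changing any preprocessing parameter produces a new folder instead of silently overwriting earlier results.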

Train the model

Now that the data have been preprocessed into numpy arrays, you can use them as model input with train_model.sh:

./train_model.sh /BASE/FOLDER

The training script should create a log folder in the base folder (lightning_logs/) in which the model weights (version_X/checkpoints/*.ckpt) and the training configuration (version_X/checkpoints/config.yaml) are stored.

✒️ You can change the parameters for training the model in the CONFIG.yaml file

Using the model on the Validation / Evaluation dataset

✒️ Update the status parameter of the CONFIG.yaml file to the dataset you want to use the model on. Change status to either validate or evaluate.

✒️ Also update model_path in the CONFIG.yaml file to point to the checkpoint (.ckpt) trained in the previous step (stored in lightning_logs/).

To run the prediction, use the test_model.sh script:

./test_model.sh /BASE/FOLDER

test_model.sh creates a result file, eval_out.csv, in the folder containing the processed validation data. The full path is printed in the console.

Note that there are other advanced options. For instance, if --wav_save is specified, the script also writes a .wav file for each processed file, with additional channels containing the ground truth labels, the predicted labels, the distance to the POS prototype and the p-values. These .wav files can be opened in Audacity for closer inspection.

Computing the resulting metrics

Once eval_out.csv has been created, you can compute the metrics for our approach. Note that the metrics can only be computed for the Validation_Set, as it contains all ground truth labels, whereas in the Evaluation_Set only the first 5 events of the POS class are labelled.

./compute_metrics.sh /BASE/FOLDER /PATH/TO/eval_out.csv

Here are the results we obtain with the pipeline described in our Technical Report:

Evaluation for: TeamBEATs VAL
BUK1_20181011_001004.wav {'TP': 13, 'FP': 22, 'FN': 18, 'total_n_pos_events': 31}
BUK1_20181013_023504.wav {'TP': 3, 'FP': 206, 'FN': 21, 'total_n_pos_events': 24}
BUK4_20161011_000804.wav {'TP': 1, 'FP': 22, 'FN': 46, 'total_n_pos_events': 47}
BUK4_20171022_004304a.wav {'TP': 6, 'FP': 15, 'FN': 11, 'total_n_pos_events': 17}
BUK5_20161101_002104a.wav {'TP': 39, 'FP': 7, 'FN': 49, 'total_n_pos_events': 88}
BUK5_20180921_015906a.wav {'TP': 4, 'FP': 9, 'FN': 19, 'total_n_pos_events': 23}
ME1.wav {'TP': 10, 'FP': 21, 'FN': 1, 'total_n_pos_events': 11}
ME2.wav {'TP': 41, 'FP': 35, 'FN': 0, 'total_n_pos_events': 41}
R4_cleaned recording_13-10-17.wav {'TP': 19, 'FP': 23, 'FN': 0, 'total_n_pos_events': 19}
R4_cleaned recording_16-10-17.wav {'TP': 30, 'FP': 9, 'FN': 0, 'total_n_pos_events': 30}
R4_cleaned recording_17-10-17.wav {'TP': 36, 'FP': 6, 'FN': 0, 'total_n_pos_events': 36}
R4_cleaned recording_TEL_19-10-17.wav {'TP': 52, 'FP': 29, 'FN': 2, 'total_n_pos_events': 54}
R4_cleaned recording_TEL_20-10-17.wav {'TP': 64, 'FP': 10, 'FN': 0, 'total_n_pos_events': 64}
R4_cleaned recording_TEL_23-10-17.wav {'TP': 84, 'FP': 5, 'FN': 0, 'total_n_pos_events': 84}
R4_cleaned recording_TEL_24-10-17.wav {'TP': 99, 'FP': 13, 'FN': 0, 'total_n_pos_events': 99}
R4_cleaned recording_TEL_25-10-17.wav {'TP': 99, 'FP': 8, 'FN': 0, 'total_n_pos_events': 99}
file_423_487.wav {'TP': 57, 'FP': 7, 'FN': 0, 'total_n_pos_events': 57}
file_97_113.wav {'TP': 11, 'FP': 30, 'FN': 109, 'total_n_pos_events': 120}

Overall_scores: {'precision': 0.348444259075038, 'recall': 0.525770811091538, 'fmeasure (percentage)': 41.912}
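To show how the Overall_scores relate to the per-file counts, the sketch below pools TP/FP/FN per sub-dataset, computes precision, recall and F-measure for each sub-dataset, and takes the harmonic mean across sub-datasets. The grouping of files into three sub-datasets by filename prefix (BUK*, ME*, and the remaining R4_*/file_* recordings) and the harmonic-mean aggregation are our reading of the evaluation, not code from compute_metrics.sh. With the counts above, this sketch gives the same Overall_scores.

```python
from statistics import harmonic_mean
from typing import Dict, Tuple

# (TP, FP, FN) per file, copied from the evaluation output above.
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "BUK1_20181011_001004.wav": (13, 22, 18),
    "BUK1_20181013_023504.wav": (3, 206, 21),
    "BUK4_20161011_000804.wav": (1, 22, 46),
    "BUK4_20171022_004304a.wav": (6, 15, 11),
    "BUK5_20161101_002104a.wav": (39, 7, 49),
    "BUK5_20180921_015906a.wav": (4, 9, 19),
    "ME1.wav": (10, 21, 1),
    "ME2.wav": (41, 35, 0),
    "R4_cleaned recording_13-10-17.wav": (19, 23, 0),
    "R4_cleaned recording_16-10-17.wav": (30, 9, 0),
    "R4_cleaned recording_17-10-17.wav": (36, 6, 0),
    "R4_cleaned recording_TEL_19-10-17.wav": (52, 29, 2),
    "R4_cleaned recording_TEL_20-10-17.wav": (64, 10, 0),
    "R4_cleaned recording_TEL_23-10-17.wav": (84, 5, 0),
    "R4_cleaned recording_TEL_24-10-17.wav": (99, 13, 0),
    "R4_cleaned recording_TEL_25-10-17.wav": (99, 8, 0),
    "file_423_487.wav": (57, 7, 0),
    "file_97_113.wav": (11, 30, 109),
}


def subset_of(filename: str) -> str:
    """Assumed sub-dataset of a file, inferred from its filename prefix."""
    if filename.startswith("BUK"):
        return "BUK"
    if filename.startswith("ME"):
        return "ME"
    return "R4/file"


def overall_scores(counts: Dict[str, Tuple[int, int, int]]) -> Dict[str, float]:
    # Pool the counts of all files belonging to the same sub-dataset.
    totals: Dict[str, list] = {}
    for fname, (tp, fp, fn) in counts.items():
        t = totals.setdefault(subset_of(fname), [0, 0, 0])
        t[0] += tp
        t[1] += fp
        t[2] += fn
    precisions, recalls, fmeasures = [], [], []
    for tp, fp, fn in totals.values():
        p = tp / (tp + fp)  # assumes at least one prediction per sub-dataset
        r = tp / (tp + fn)  # tp + fn == total_n_pos_events
        precisions.append(p)
        recalls.append(r)
        fmeasures.append(2 * p * r / (p + r))
    # The harmonic mean penalises a single poorly performing sub-dataset.
    return {
        "precision": harmonic_mean(precisions),
        "recall": harmonic_mean(recalls),
        "fmeasure (percentage)": round(harmonic_mean(fmeasures) * 100, 3),
    }


scores = overall_scores(COUNTS)
```

The harmonic mean explains why the overall F-measure stays near 42% even though the R4/file_ recordings score much higher: the BUK recordings, with their many false positives, pull the aggregate down.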

Taking the idea further:

  • Computing the Mahalanobis distance instead of the Euclidean distance
  • Implementing a p-value filtering to detect outlier distances from the prototypes
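Both ideas can be sketched in a few lines. Below, the Mahalanobis distance to a prototype uses the covariance of the support embeddings, shrunk towards the identity because a handful of shots gives a singular covariance. A query's p-value is the (add-one smoothed) fraction of reference distances, here the support examples' own distances to the prototype, that are at least as large as the query's distance. The shrinkage constant `eps`, the function names and this empirical p-value definition are our assumptions and not necessarily how this repository computes them.

```python
import numpy as np


def mahalanobis_to_prototype(x: np.ndarray, support: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Mahalanobis distance from each row of x to the mean (prototype) of `support`."""
    mu = support.mean(axis=0)
    # Few shots -> singular sample covariance, so add eps * I before inverting.
    cov = np.cov(support, rowvar=False) + eps * np.eye(support.shape[1])
    inv = np.linalg.inv(cov)
    diff = x - mu
    # Quadratic form diff_i^T * inv * diff_i for every row i.
    q = np.einsum("ij,jk,ik->i", diff, inv, diff)
    return np.sqrt(np.maximum(q, 0.0))  # clip tiny negative rounding errors


def empirical_p_values(query_d: np.ndarray, reference_d: np.ndarray) -> np.ndarray:
    """Add-one smoothed fraction of reference distances >= each query distance."""
    reference_d = np.asarray(reference_d)
    counts = (reference_d[None, :] >= np.asarray(query_d)[:, None]).sum(axis=1)
    return (counts + 1) / (reference_d.size + 1)


# Toy example: 50 stand-in embeddings in 3 dimensions, one typical and one outlying query.
rng = np.random.default_rng(0)
support = rng.normal(size=(50, 3))
reference_d = mahalanobis_to_prototype(support, support)
queries = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])
p = empirical_p_values(mahalanobis_to_prototype(queries, support), reference_d)
```

Filtering would then discard predicted events whose p-value falls below a chosen threshold, treating them as outliers relative to the POS prototype.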

Acknowledgment and contact

For bug reports please use the issues section.

For other inquiries, please contact Benjamin Cretois or Femke Gelderblom.

Cite this work

Gelderblom, F., Cretois, B., Johnsen, P., Remonato, F., & Reinen, T. A. (2023). Few-shot bioacoustic event detection using beats. Technical report, DCASE2023 Challenge.