Tempo-CNN is a simple CNN-based framework for estimating temporal properties of music tracks featuring trained models from several publications [1] [2] [3] [4].
First and foremost, Tempo-CNN is a tempo estimator. To determine the global tempo of an audio file, simply run the script
tempo -i my_audio.wav
To create a local tempo "tempogram", run
tempogram my_audio.wav
For a complete list of options, run either script with the parameter --help.
For programmatic use via the Python API, see the examples further below.
In a clean Python 3.9 environment, simply run:
pip install tempocnn
If you would rather install from source, clone this repo and run setup.py install using Python 3.9:
git clone https://github.com/hendriks73/tempo-cnn.git
cd tempo-cnn
python setup.py install
You may specify other models and output formats (MIREX, JAMS) via command line parameters.
E.g. to create JAMS as output format and the model originally used in the ISMIR 2018 paper [1], please run
tempo -m ismir2018 --jams -i my_audio.wav
For MIREX-style output, add the --mirex parameter.
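When calling the script from other Python code, it can help to assemble the command line in one place. The following is a minimal sketch, not part of Tempo-CNN: the helper names build_tempo_command and run_tempo are hypothetical, and it assumes the tempo script is on your PATH. It only uses the -m, --jams, --mirex, and -i parameters described above.

```python
import subprocess


def build_tempo_command(audio_file, model=None, output_format=None):
    """Build the argument list for the `tempo` script (hypothetical helper)."""
    if output_format not in (None, "jams", "mirex"):
        raise ValueError(f"unsupported output format: {output_format!r}")
    command = ["tempo"]
    if model is not None:
        # -m selects the model, e.g. "ismir2018"
        command += ["-m", model]
    if output_format is not None:
        # --jams or --mirex selects the output format
        command.append(f"--{output_format}")
    command += ["-i", audio_file]
    return command


def run_tempo(audio_file, model=None, output_format=None):
    """Run `tempo` as a subprocess; assumes the script is installed and on PATH."""
    command = build_tempo_command(audio_file, model, output_format)
    return subprocess.run(command, check=True)
```

Calling build_tempo_command("my_audio.wav", model="ismir2018", output_format="jams") yields the same arguments as the JAMS example above.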
To use one of the DeepTemp models from [3] (see also repo directional_cnns), run
tempo -m deeptemp --jams -i my_audio.wav
or,
tempo -m deeptemp_k24 --jams -i my_audio.wav
if you want to use a higher-capacity model (only some k-values are supported).
The deepsquare and shallowtemp models may also be used.
Note that some models may be downloaded (and cached) at execution time.
To use DT-Maz models from [4], run
tempo -m mazurka -i my_audio.wav
This defaults to the model named dt_maz_v_fold0.
You may choose another fold [0-4] or another split [v|m].
So to use fold 3 from the M-split, use
tempo -m dt_maz_m_fold3 -i my_audio.wav
Note that Mazurka models may be used to estimate a global tempo, but were actually trained to create tempograms for Chopin Mazurkas [4].
While it's cumbersome to list the split definitions for the Version folds, the Mazurka folds are easily defined:
fold0 was tested on Chopin_Op068No3 and validated on Chopin_Op017No4
fold1 was tested on Chopin_Op017No4 and validated on Chopin_Op024No2
fold2 was tested on Chopin_Op024No2 and validated on Chopin_Op030No2
fold3 was tested on Chopin_Op030No2 and validated on Chopin_Op063No3
fold4 was tested on Chopin_Op063No3 and validated on Chopin_Op068No3
The networks were trained on recordings of the three remaining Mazurkas.
In essence, this means: do not estimate the local tempo for Chopin_Op024No2 using dt_maz_m_fold0, because Chopin_Op024No2 was used in training.
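The fold definitions above can be written down as a small lookup table, which makes it easy to check which model has not seen a given Mazurka during training. This is only a sketch: the names MAZURKA_FOLDS, training_mazurkas, and model_with_unseen_test_piece are hypothetical and not part of the package; the fold data and the dt_maz_m_fold naming are taken from the text above.

```python
# Test and validation pieces per M-split fold, as listed above.
MAZURKA_FOLDS = {
    0: {"test": "Chopin_Op068No3", "validation": "Chopin_Op017No4"},
    1: {"test": "Chopin_Op017No4", "validation": "Chopin_Op024No2"},
    2: {"test": "Chopin_Op024No2", "validation": "Chopin_Op030No2"},
    3: {"test": "Chopin_Op030No2", "validation": "Chopin_Op063No3"},
    4: {"test": "Chopin_Op063No3", "validation": "Chopin_Op068No3"},
}

# Every Mazurka is the test piece of exactly one fold.
ALL_MAZURKAS = sorted(fold["test"] for fold in MAZURKA_FOLDS.values())


def training_mazurkas(fold):
    """Return the three Mazurkas a fold was trained on."""
    held_out = set(MAZURKA_FOLDS[fold].values())
    return [m for m in ALL_MAZURKAS if m not in held_out]


def model_with_unseen_test_piece(mazurka):
    """Return the M-split model name for which `mazurka` was the test piece."""
    for fold, split in MAZURKA_FOLDS.items():
        if split["test"] == mazurka:
            return f"dt_maz_m_fold{fold}"
    raise KeyError(f"unknown Mazurka: {mazurka}")
```

For example, training_mazurkas(0) includes Chopin_Op024No2, while model_with_unseen_test_piece("Chopin_Op024No2") points to dt_maz_m_fold2.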
For batch processing, you may want to run tempo like this:
find /your_audio_dir/ -name '*.wav' -print0 | xargs -0 tempo -d /output_dir/ -i
This will recursively search for all .wav files in /your_audio_dir/, analyze them, and write the results to individual files in /output_dir/. Because the model is only loaded once, this method of processing is much faster than individual program starts.
To achieve better than integer precision, you may want to enable quadratic interpolation.
You can do so by setting the --interpolate flag. Obviously, this only makes sense for tracks with a very stable tempo:
tempo -m ismir2018 --interpolate -i my_audio.wav
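To illustrate the general idea behind quadratic interpolation, here is a minimal sketch of the standard three-point parabolic fit around the maximum of a list of values. It is not taken from the Tempo-CNN source; whether the package uses exactly this formula is an assumption, and the function name interpolate_peak is hypothetical.

```python
def interpolate_peak(values):
    """Return the fractional index of the maximum using a parabolic fit."""
    peak = max(range(len(values)), key=values.__getitem__)
    # At the borders there are not enough neighbors for a fit.
    if peak == 0 or peak == len(values) - 1:
        return float(peak)
    left, center, right = values[peak - 1], values[peak], values[peak + 1]
    denominator = left - 2.0 * center + right
    # A flat neighborhood has no well-defined vertex.
    if denominator == 0.0:
        return float(peak)
    # Vertex of the parabola through the three points around the peak.
    return peak + 0.5 * (left - right) / denominator


# Toy example: integer BPM bins 118..122, activations sampled from a
# parabola whose true maximum lies between two bins, at 120.3 BPM.
bins = list(range(118, 123))
activations = [-(bpm - 120.3) ** 2 for bpm in bins]
refined_bpm = bins[0] + interpolate_peak(activations)
```

In this toy example the plain argmax would report 120 BPM, whereas refined_bpm recovers approximately 120.3.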
Instead of estimating a global tempo, Tempo-CNN can also estimate local tempi in the form of a tempogram. This can be useful for identifying tempo drift.
To create such a tempogram, run
tempogram -p my_audio.wav
As output, tempogram will create a .png file. Additional options to select different models and output formats are available.
You may use the --csv option to export local tempo estimates in a parseable format and the --hop-length option to change temporal resolution.
The parameters --sharpen and --norm-frame let you post-process the image.
Tempo-CNN provides experimental support for temporal property estimation of Greek folk music [2]. The corresponding models are named fma2018 (for tempo) and fma2018-meter (for meter). To estimate the meter's numerator, run
meter -m fma2018-meter -i my_audio.wav
After installation, you may use the package programmatically.
Example for global tempo estimation:
from tempocnn.classifier import TempoClassifier
from tempocnn.feature import read_features
model_name = 'cnn'
input_file = 'some_audio_file.mp3'
# initialize the model (may be re-used for multiple files)
classifier = TempoClassifier(model_name)
# read the file's features
features = read_features(input_file)
# estimate the global tempo
tempo = classifier.estimate_tempo(features, interpolate=False)
print(f"Estimated global tempo: {tempo}")
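Since the classifier may be re-used for multiple files, batch processing from Python follows naturally. Below is a minimal sketch: the function name estimate_directory is hypothetical, and the classifier and feature reader are passed in as arguments so that the loop itself does not depend on how they were created. It relies only on the estimate_tempo call shown above.

```python
from pathlib import Path


def estimate_directory(audio_dir, classifier, read_features, pattern="*.wav"):
    """Estimate the global tempo of every matching file below audio_dir.

    classifier must offer estimate_tempo(features, interpolate=...),
    and read_features must accept a file name, as in the example above.
    """
    tempi = {}
    # rglob searches audio_dir recursively; sorting keeps the order stable.
    for path in sorted(Path(audio_dir).rglob(pattern)):
        features = read_features(str(path))
        tempi[str(path)] = classifier.estimate_tempo(features, interpolate=False)
    return tempi
```

With the package installed, a call might look like estimate_directory('/your_audio_dir/', TempoClassifier('cnn'), read_features), so that the model is initialized only once for all files.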
Example for local tempo estimation:
import numpy as np
from tempocnn.classifier import TempoClassifier
from tempocnn.feature import read_features
model_name = 'cnn'
input_file = 'some_audio_file.mp3'
# initialize the model (may be re-used for multiple files)
classifier = TempoClassifier(model_name)
# read the file's features, specify hop_length for temporal resolution
features = read_features(input_file, frames=256, hop_length=32)
# estimate local tempi, this returns tempo classes, i.e., a distribution
local_tempo_classes = classifier.estimate(features)
# find argmax per frame and convert class index to BPM value
max_predictions = np.argmax(local_tempo_classes, axis=1)
local_tempi = classifier.to_bpm(max_predictions)
print(f"Estimated local tempi: {local_tempi}")
Source code and models are licensed under the GNU AFFERO GENERAL PUBLIC LICENSE v3. For details, please see the LICENSE file.
If you use Tempo-CNN in your work, please consider citing it.
Original publication:
@inproceedings{SchreiberM18_TempoCNN_ISMIR,
Title = {A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network},
Author = {Schreiber, Hendrik and M{\"u}ller, Meinard},
Booktitle = {Proceedings of the 19th International Society for Music Information Retrieval Conference ({ISMIR})},
Pages = {98--105},
Month = {9},
Year = {2018},
Address = {Paris, France},
doi = {10.5281/zenodo.1492353},
url = {https://doi.org/10.5281/zenodo.1492353}
}
ShallowTemp, DeepTemp, and DeepSquare models:
@inproceedings{SchreiberM19_CNNKeyTempo_SMC,
Title = {Musical Tempo and Key Estimation using Convolutional Neural Networks with Directional Filters},
Author = {Hendrik Schreiber and Meinard M{\"u}ller},
Booktitle = {Proceedings of the Sound and Music Computing Conference ({SMC})},
Pages = {47--54},
Year = {2019},
Address = {M{\'a}laga, Spain},
doi = {10.5281/zenodo.3249250},
url = {https://doi.org/10.5281/zenodo.3249250}
}
Mazurka models:
@inproceedings{SchreiberZM20_LocalTempo_ISMIR,
Title = {Modeling and Estimating Local Tempo: A Case Study on Chopin’s Mazurkas},
Author = {Hendrik Schreiber and Frank Zalkow and Meinard M{\"u}ller},
Booktitle = {Proceedings of the 21st International Society for Music Information Retrieval Conference ({ISMIR})},
Pages = {773--779},
Year = {2020},
Address = {Montreal, QC, Canada},
doi = {10.5281/zenodo.4245546},
url = {https://doi.org/10.5281/zenodo.4245546}
}
[1] Hendrik Schreiber, Meinard Müller, A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network, Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
[2] Hendrik Schreiber, Technical Report: Tempo and Meter Estimation for Greek Folk Music Using Convolutional Neural Networks and Transfer Learning, 8th International Workshop on Folk Music Analysis (FMA), Thessaloniki, Greece, June 2018.
[3] Hendrik Schreiber, Meinard Müller, Musical Tempo and Key Estimation using Convolutional Neural Networks with Directional Filters, Proceedings of the Sound and Music Computing Conference (SMC), Málaga, Spain, 2019.
[4] Hendrik Schreiber, Frank Zalkow, Meinard Müller, Modeling and Estimating Local Tempo: A Case Study on Chopin’s Mazurkas, Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), Montréal, QC, Canada, Oct. 2020.