V2.1.0 TokenizerConfig
Major change
This "mid-size" update brings a new `TokenizerConfig` object, which holds any tokenizer's configuration. This object is now used to instantiate all tokenizers, and replaces the now-removed `beat_res`, `nb_velocities`, `pitch_range` and `additional_tokens` arguments. It simplifies the code, reduces exceptions, and exposes a simpler way to customize tokenizers.
You can read the documentation and example to see how to use it.
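The idea can be sketched without the library itself. The block below is a minimal, self-contained illustration, not MidiTok code: `SketchConfig` and `SketchTokenizer` are hypothetical stand-ins, and every default value is a placeholder chosen for the example. It only shows the pattern of moving the four former constructor arguments into one configuration object that the tokenizer receives. Refer to the documentation for the real class and its parameters.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a config object; not the MidiTok class."""

    # Field names mirror the four removed arguments; defaults are placeholders.
    pitch_range: Tuple[int, int] = (21, 109)
    beat_res: Dict[Tuple[int, int], int] = field(
        default_factory=lambda: {(0, 4): 8, (4, 12): 4}
    )
    nb_velocities: int = 32
    additional_tokens: Dict[str, bool] = field(
        default_factory=lambda: {"Chord": False, "Rest": False, "Tempo": False}
    )


class SketchTokenizer:
    """Hypothetical tokenizer taking one config object instead of four arguments."""

    def __init__(self, config: Optional[SketchConfig] = None):
        # Fall back to the default configuration when none is given.
        self.config = config if config is not None else SketchConfig()

    def __repr__(self) -> str:
        low, high = self.config.pitch_range
        return (
            f"SketchTokenizer(pitches={low}..{high - 1}, "
            f"velocities={self.config.nb_velocities})"
        )


# Customizing a tokenizer means editing the config, not the constructor call.
custom = SketchConfig(nb_velocities=16, pitch_range=(36, 96))
tokenizer = SketchTokenizer(custom)
print(tokenizer)
```

Keeping all settings in one object means a new option only has to be added in one place, rather than to every tokenizer's constructor signature.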
Changes
- e586b1f New `TokenizerConfig` object to hold config and instantiate tokenizers
- 26a67a6 @tingled Fix in `__repr__`
- 9970ec4 Fix in CPWord token type graph
- 69e64a7 `max_bar_embedding` argument for `REMIPlus` is now set to False by default
- 62292d6 @Kapitan11 `load_params` is now a private method, and documentation updated for this feature
- 3aeb7ff Removing the deprecated "slow" BPE methods
- f8ca854 @ilya16 Fixing PitchBend time attribute in `merge_tracks` method
- b12d270 `TSD` now natively handles `Program` tokens, the same way `REMIPlus` does. Using the `use_programs` option will convert MIDIs into a single token sequence for all tracks, instead of one sequence per track
- Other minor code, lint and docstring improvements
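The difference between one sequence per track and a single sequence for all tracks, as in the Program token change listed above for TSD, can be illustrated with a toy model. This is a sketch under strong assumptions, not the library's implementation: the `Program_*` and `Pitch_*` token strings, the `(onset, pitch)` note format and both function names are hypothetical, and real tokenizers also emit time, velocity and duration tokens that are left out here.

```python
from typing import Dict, List, Tuple

# A toy note is (onset_time, pitch); a toy piece maps a program number to its notes.
Note = Tuple[int, int]
Piece = Dict[int, List[Note]]


def per_track_sequences(piece: Piece) -> List[List[str]]:
    """One token sequence per track; the program is not encoded in the tokens."""
    return [
        [f"Pitch_{pitch}" for _, pitch in sorted(notes)]
        for _, notes in sorted(piece.items())
    ]


def single_sequence(piece: Piece) -> List[str]:
    """One sequence for all tracks; each note is preceded by a Program token."""
    # Merge the notes of every track and order them by onset time.
    events = sorted(
        (time, program, pitch)
        for program, notes in piece.items()
        for time, pitch in notes
    )
    tokens: List[str] = []
    for _, program, pitch in events:
        tokens += [f"Program_{program}", f"Pitch_{pitch}"]
    return tokens


piece = {0: [(0, 60), (2, 64)], 33: [(0, 36), (1, 38)]}
print(per_track_sequences(piece))
print(single_sequence(piece))
```

In the merged form, the program token is what tells a model which instrument each note belongs to, since track boundaries no longer carry that information.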
Compatibility
- On your current / previous projects, you will need to update your code, specifically the way you create tokenizers, to use this update. This doesn't apply to code creating tokenizers from a config file (`params` arg).
- Slow BPE removed. If you still use these methods, we encourage you to switch to the new fast ones. Your trained models will need to be used with the old slow tokenizers.