V2.1.0 TokenizerConfig

@Natooz Natooz released this 03 Jul 14:47
· 203 commits to main since this release
b12d270

Major change

This "mid-size" update brings a new TokenizerConfig object that holds any tokenizer's configuration. This object is now used to instantiate all tokenizers, and replaces the now-removed beat_res, nb_velocities, pitch_range and additional_tokens arguments. It simplifies the code, reduces exceptions, and exposes a simpler way to customize tokenizers.
You can read the documentation and example to see how to use it.
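
As a rough illustration of the pattern described above, the sketch below shows a single config object replacing several separate constructor arguments. SketchTokenizerConfig and SketchTokenizer are hypothetical stand-ins written for this example, not the library's classes, and the default values are illustrative assumptions; refer to the documentation for the real TokenizerConfig parameters.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SketchTokenizerConfig:
    """Hypothetical stand-in for a config object (NOT the library's TokenizerConfig)."""

    # Field names mirror the arguments named in the release note;
    # the default values are illustrative assumptions.
    pitch_range: Tuple[int, int] = (21, 109)
    beat_res: Dict[Tuple[int, int], int] = field(
        default_factory=lambda: {(0, 4): 8}
    )
    nb_velocities: int = 32
    use_programs: bool = False


class SketchTokenizer:
    """Hypothetical tokenizer that receives one config object instead of many arguments."""

    def __init__(self, config: Optional[SketchTokenizerConfig] = None):
        # Fall back to the default configuration when none is given.
        self.config = config if config is not None else SketchTokenizerConfig()


# Only the options that differ from the defaults need to be spelled out.
config = SketchTokenizerConfig(nb_velocities=16, use_programs=True)
tokenizer = SketchTokenizer(config)
```

Grouping the options in one object means a new option only has to be added in one place, and the same object can be passed to any tokenizer class.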

Changes

  • e586b1f New TokenizerConfig object to hold config and instantiate tokenizers
  • 26a67a6 @tingled Fix in __repr__
  • 9970ec4 Fix in CPWord token type graph
  • 69e64a7 max_bar_embedding argument for REMIPlus is now by default set to False
  • 62292d6 @Kapitan11 load_params now private method, and documentation updated for this feature
  • 3aeb7ff Removing the deprecated "slow" BPE methods
  • f8ca854 @ilya16 Fixing PitchBend time attribute in merge_tracks method
  • b12d270 TSD now natively handles Program tokens, the same way REMIPlus does. Using the use_programs option will convert MIDIs into a single token sequence for all tracks, instead of one sequence per track;
  • Other minor code, lint and docstring improvements
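
To make the difference between "one sequence per track" and "a single token sequence for all tracks" concrete, here is a toy sketch. It is not the library's tokenization: the token names (Program_N, Pitch_N), the note data and both helper functions are assumptions made for illustration, and time, velocity and duration tokens are left out.

```python
from typing import Dict, List, Tuple

# Toy notes as (onset_tick, pitch), grouped by MIDI program number.
tracks: Dict[int, List[Tuple[int, int]]] = {
    0: [(0, 60), (480, 64)],   # program 0
    33: [(0, 36), (240, 43)],  # program 33
}


def one_sequence_per_track(tracks: Dict[int, List[Tuple[int, int]]]) -> Dict[int, List[str]]:
    """Build one token list per track; the program is implied by the track."""
    return {
        program: [f"Pitch_{pitch}" for _, pitch in sorted(notes)]
        for program, notes in tracks.items()
    }


def single_sequence(tracks: Dict[int, List[Tuple[int, int]]]) -> List[str]:
    """Merge all tracks by onset time, tagging each note with a Program token."""
    merged = sorted(
        (onset, program, pitch)
        for program, notes in tracks.items()
        for onset, pitch in notes
    )
    tokens: List[str] = []
    for _, program, pitch in merged:
        tokens += [f"Program_{program}", f"Pitch_{pitch}"]
    return tokens


per_track = one_sequence_per_track(tracks)
merged_tokens = single_sequence(tracks)
```

In the merged form, the Program token is what lets a model tell which instrument each note belongs to once the tracks share a single sequence.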

Compatibility

  • To use this update in current or previous projects, you will need to update your code, specifically the way you create tokenizers. This doesn't apply to code that creates tokenizers from a config file (params arg);
  • Slow BPE removed. If you still use these methods, we encourage you to switch to the new fast ones. Models you trained with the slow BPE methods will need to keep being used with the old slow tokenizers.