Releases: Natooz/MidiTok
v1.1.9 MASK tokens & bugfixes
Changes
- d2a3404 22c15f2 When tokenizing, files not found can now be logged
- 1f74e3a MASK tokens are now available, use your_tokenizer.vocab.add_mask() to add it to your vocabulary.
- 1f74e3a Fix for a possible bug when using custom vocabulary indexes, then non-custom ones. Indexes are now checked before registering tokens.
- 2f87765 merge_tracks() now also merges sustain pedal, control change and pitch bend messages with the effects arg, and now handles List[Instrument] as well as MidiFile objects.
- 2e77eec MIDI-Like token_types_errors() now looks for a Note-Off for each Note-On token.
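To illustrate the kind of check the last item describes, here is a minimal sketch that counts Note-On tokens without a later Note-Off of the same pitch. It is not MidiTok's implementation: the function name count_unmatched_note_ons and the "Type_value" token strings are assumptions made for the example.

```python
from typing import Dict, List


def count_unmatched_note_ons(tokens: List[str]) -> int:
    """Count Note-On tokens never closed by a Note-Off of the same pitch.

    Hypothetical helper: tokens are assumed to be "Type_value" strings.
    """
    open_notes: Dict[str, int] = {}  # pitch -> number of notes still open
    for token in tokens:
        token_type, _, value = token.partition("_")
        if token_type == "Note-On":
            open_notes[value] = open_notes.get(value, 0) + 1
        elif token_type == "Note-Off" and open_notes.get(value, 0) > 0:
            open_notes[value] -= 1
    return sum(open_notes.values())


# The note on pitch 64 is never released, so one Note-On is unmatched.
example = ["Note-On_60", "Time-Shift_1.0", "Note-Off_60", "Note-On_64"]
unmatched = count_unmatched_note_ons(example)
```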
Compatibility
All good !
v1.1.8 Time Signature tokens for Octuple
Changes
- 3c3c5da TimeSignature tokens are implemented for Octuple! Thanks @ilya16 for this great contribution! These tokens are optional and can be set with the additional_tokens parameter.
- df1edd1 Added a fail-check for Bar / Pos based tokenizations, for when a token sequence begins with a Position token before any Bar.
- 5ab55f4 Bugfix when loading tokenizer params from config file with tempos.
- 08540a2 SOS and EOS tokens are no longer assigned to -1 and -2, as this could lead to issues.
Compatibility
SOS and EOS tokens saved with v1.1.7 and before will not be compatible anymore.
You can however easily convert them. You just have to convert SOS (-1) and EOS (-2) tokens to respectively len(tokenizer.vocab) and len(tokenizer.vocab) + 1.
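The conversion could look like the sketch below. It works on plain lists of integers and does not call MidiTok; the helper name convert_sos_eos is hypothetical, and vocab_size stands for len(tokenizer.vocab) of your own tokenizer (300 is an arbitrary value for the example).

```python
from typing import List

OLD_SOS, OLD_EOS = -1, -2  # ids used by v1.1.7 and before


def convert_sos_eos(tokens: List[int], vocab_size: int) -> List[int]:
    """Remap old SOS / EOS ids to vocab_size and vocab_size + 1.

    Hypothetical helper: vocab_size is assumed to be len(tokenizer.vocab).
    """
    mapping = {OLD_SOS: vocab_size, OLD_EOS: vocab_size + 1}
    return [mapping.get(token, token) for token in tokens]


# Arbitrary vocabulary size, for illustration only.
converted = convert_sos_eos([-1, 12, 45, 7, -2], vocab_size=300)
```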
v1.1.7 Class renamed
Changes
- 195d549 Tokenizer classes are renamed: the 'Encoding' suffix is removed. Old class names still exist / work but will be removed in the future (a warning is raised when using them)
- 195d549 constants import modified: constants now have to be accessed as miditok.constants.A_CONSTANT
- 3ed5532 PAD token types are now handled in token_types_errors methods
v1.1.6 Speed up
v1.1.5 Fixes and debugging
Changes
- c7169fe rests no longer append a bar token when crossing a new bar (bugfix)
- 9457178 fix in token types graph for REMI / CP Word
- 8a6da14 events_to_tokens and tokens_to_events are no longer protected methods, so they can be used for debugging
Compatibility:
- MIDI files tokenized with REMI or CP Word using Rests with v1.1.4 and below might not be compatible, as the decoding process changed (c7169fe)
v1.1.4 Bugfix rest detections
- 7d1c5bc af90e72 Rest detection was inaccurate, now fixed for REMI, CP Word and MIDI-Like
- a2daaa2 Bugfix when using MuMIDI with chords
- Colab Notebooks !
v1.1.3 Bugfix chord detection
- 9a5975c bugfix in the chord detection method, was comparing lists with tuples for chord qualities
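For context, this class of bug comes from Python never considering a list equal to a tuple, even when they hold the same items. The snippet below only illustrates that behavior; the interval values and variable names are made up and are not taken from MidiTok's chord detection code.

```python
# Hypothetical values: intervals of a detected chord and a known chord quality.
chord_from_notes = [0, 4, 7]   # built as a list
known_quality = (0, 4, 7)      # stored as a tuple

# A list never compares equal to a tuple, so the naive comparison fails.
naive_match = chord_from_notes == known_quality

# Converting to the same type before comparing gives the expected result.
fixed_match = tuple(chord_from_notes) == known_quality
```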
v1.1.2 Token sequence types validation & Bugfixes
Changes:
- da36b4a token_types_errors method introduced, it allows checking whether a generated sequence of tokens is made of valid token type successions and values. This rule-based metric is useful to measure if a network understands the "semantics" of a tokenization strategy. Note that the validation is computed differently depending on the tokenization strategy, we refer you to the docstring.
- Fixes in _create_token_types_graph for CP Word, MIDI-Like, MuMIDI and Remi
- 2dc9fae When using Rests with Remi and CP Word, a Bar token is put after a/several Rest token(s) if the rest crossed one or several bars.
- token_types_errors is included in the tests scripts
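The idea of validating token type successions can be sketched as below. This is a simplified illustration, not MidiTok's token_types_errors: the graph, the "Type_value" token strings and the function name token_type_error_ratio are all assumptions, and only successions are checked (not values).

```python
from typing import Dict, List

# Hypothetical graph: each token type maps to the types allowed right after it.
TOKEN_TYPES_GRAPH: Dict[str, List[str]] = {
    "Bar": ["Position"],
    "Position": ["Pitch"],
    "Pitch": ["Velocity"],
    "Velocity": ["Duration"],
    "Duration": ["Pitch", "Position", "Bar"],
}


def token_type_error_ratio(tokens: List[str], graph: Dict[str, List[str]]) -> float:
    """Return the share of transitions whose token type succession is invalid."""
    if len(tokens) < 2:
        return 0.0
    types = [token.split("_")[0] for token in tokens]
    errors = sum(
        1 for prev, cur in zip(types, types[1:]) if cur not in graph.get(prev, [])
    )
    return errors / (len(types) - 1)


valid = ["Bar_None", "Position_0", "Pitch_60", "Velocity_100", "Duration_1.0"]
invalid = ["Bar_None", "Pitch_60", "Velocity_100", "Position_4", "Duration_1.0"]
valid_ratio = token_type_error_ratio(valid, TOKEN_TYPES_GRAPH)
invalid_ratio = token_type_error_ratio(invalid, TOKEN_TYPES_GRAPH)
```

A ratio of 0 means every transition is allowed by the graph; higher values mean the generated sequence breaks the expected structure more often.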
Compatibility:
- Tokens previously created with Remi or CP Word using Rests may not be compatible with v1.1.2
v1.1.1 Program tokens and tokens_types_graph
Changes:
- c3d6c89 new attribute tokens_types_graph for every class, to be used to check if a generated sequence is made of valid token successions
- bd9aade Program token type is part of the additional_tokens attribute. MidiTok never uses them, it's here for you if you need it
Compatibility:
- _create_token_types_graph is called by MIDITokenizer's constructor, your custom classes should then implement it (can return None)
- Your datasets tokenized with <= v1.1.0 will stay compatible but you won't be able to load them without adding the 'Program' key in the config.txt files.