As suggested in @angeloskath's code review ml-explore/mlx-examples#315 (comment), an implementation of BytePairTokenizer seems useful for many use cases, but it is currently missing from mlx-data. I did some research on byte pair tokenization in transformers, and I think the implementation there is somewhat slow. More precisely, every time a merge could be done, it iterates over all adjacent symbol pairs to find the best pair to merge. This implies quadratic time complexity in the number of symbols. However, the referenced paper describes an elegant linearithmic, i.e. O(n log n), implementation. Since that implementation requires some pointer trickery, it seems that we could (relatively) easily implement it in C++ and expose it to Python.
I would appreciate your thoughts on:
Do we want an implementation of BytePairTokenizer in C++?
Do we want the faster implementation of BytePairTokenizer in C++, referenced in the paper?
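To make the complexity concern concrete, here is a minimal Python sketch of the rescan-on-every-merge strategy described above. It is a simplified illustration, not the actual transformers code: the function name `bpe_naive` and the `ranks` table (mapping a symbol pair to its merge priority, lower first) are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def bpe_naive(symbols: List[str], ranks: Dict[Tuple[str, str], int]) -> List[str]:
    """Merge adjacent pairs by rescanning the whole sequence before each merge.

    `ranks` is a hypothetical merge table: lower rank means merge earlier.
    Each merge costs a full O(n) scan and there can be O(n) merges,
    so the total cost is O(n^2) in the number of symbols.
    """
    symbols = list(symbols)  # work on a copy
    while len(symbols) > 1:
        best_rank, best_i = None, -1
        # Full scan over all adjacent pairs to find the lowest-ranked one.
        for i in range(len(symbols) - 1):
            r = ranks.get((symbols[i], symbols[i + 1]))
            if r is not None and (best_rank is None or r < best_rank):
                best_rank, best_i = r, i
        if best_rank is None:
            break  # no mergeable pair is left
        # Replace the two symbols with their concatenation.
        symbols[best_i:best_i + 2] = [symbols[best_i] + symbols[best_i + 1]]
    return symbols
```

With a merge table such as `{("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}`, calling `bpe_naive(list("lower"), ranks)` merges `l`+`o`, then `lo`+`w`, then `e`+`r`.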
Yeah, we would want a tokenizer in C++. I think for starters, implementing it similarly to the Python one but in C++ would be sufficient. BPE is quite a simple algorithm, and if a Python implementation is usable, I think a C++ one would be at least as usable (and probably much faster), with the added benefit of allowing us to use threads.
Paper: https://arxiv.org/pdf/2306.16837.pdf
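For comparison, one common way to reach linearithmic time is to keep the symbols in a doubly linked list and the candidate pairs in a priority queue, discarding stale queue entries lazily. The sketch below illustrates that general idea in Python; it is an assumption about how such an approach could look, not a transcription of the paper's algorithm, and `bpe_heap` and `ranks` are hypothetical names.

```python
import heapq
from typing import Dict, List, Tuple


def bpe_heap(symbols: List[str], ranks: Dict[Tuple[str, str], int]) -> List[str]:
    """Heap plus linked-list BPE merge loop (illustrative sketch).

    Each merge pushes at most two new candidates, so the heap holds O(n)
    entries and the loop performs O(n log n) heap operations.
    """
    n = len(symbols)
    sym = list(symbols)
    prev = list(range(-1, n - 1))  # index of left neighbour, -1 at the start
    nxt = list(range(1, n + 1))    # index of right neighbour, n at the end
    alive = [True] * n
    heap: List[Tuple[int, int, str, str]] = []

    def push(i: int) -> None:
        # Queue the pair starting at node i if the merge table knows it.
        j = nxt[i]
        if j < n:
            r = ranks.get((sym[i], sym[j]))
            if r is not None:
                heapq.heappush(heap, (r, i, sym[i], sym[j]))

    for i in range(n - 1):
        push(i)

    while heap:
        _, i, left, right = heapq.heappop(heap)
        j = nxt[i] if alive[i] else n
        # Skip entries invalidated by an earlier merge.
        if not alive[i] or j >= n or sym[i] != left or sym[j] != right:
            continue
        # Merge node j into node i and unlink j from the list.
        sym[i] = left + right
        alive[j] = False
        nxt[i] = nxt[j]
        if nxt[j] < n:
            prev[nxt[j]] = i
        # Only the two pairs touching the merged node can have changed.
        if prev[i] >= 0:
            push(prev[i])
        push(i)

    return [sym[i] for i in range(n) if alive[i]]
```

Ties between equal ranks are broken by position, so the leftmost candidate is merged first. A C++ version could follow the same structure with `std::priority_queue` and index arrays in place of pointers.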