This repository has been archived by the owner on Jul 4, 2023. It is now read-only.

Add BPE encoder #100

Open
wants to merge 16 commits into
base: master

Conversation

Columbine21

Add the byte-pair encoding (#7).

codecov-commenter commented Jul 1, 2020

Codecov Report

Merging #100 into master will decrease coverage by 0.10%.
The diff coverage is 92.55%.

@@            Coverage Diff             @@
##           master     #100      +/-   ##
==========================================
- Coverage   94.41%   94.31%   -0.11%     
==========================================
  Files          64       66       +2     
  Lines        1611     1705      +94     
==========================================
+ Hits         1521     1608      +87     
- Misses         90       97       +7     

Impacted Files                                  Coverage Δ
torchnlp/encoders/text/bpe_text_tokenizer.py    90.16% <90.16%> (ø)
torchnlp/encoders/text/bytepair_encoder.py      96.87% <96.87%> (ø)
torchnlp/encoders/text/__init__.py              100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cde86ba...63460d0.
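
For reference, the percentages in the coverage diff follow from the hit and line counts in the report. The sketch below only reproduces that arithmetic; the helper name `coverage_pct` is hypothetical and is not part of Codecov or of this pull request.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Return line coverage as a percentage of tracked lines (hypothetical helper)."""
    return 100.0 * hits / lines


# Counts copied from the coverage diff in this report.
master_hits, master_lines = 1521, 1611
pr_hits, pr_lines = 1608, 1705

master_cov = coverage_pct(master_hits, master_lines)
pr_cov = coverage_pct(pr_hits, pr_lines)

print(f"master: {master_cov:.2f}%")
print(f"#100:   {pr_cov:.2f}%")
print(f"change: {pr_cov - master_cov:+.2f} percentage points")

# Misses are the tracked lines that were not hit.
print(f"misses: {master_lines - master_hits} -> {pr_lines - pr_hits}")
```

With these counts the change rounds to -0.10 points, which matches the summary line; the -0.11 shown in the diff table presumably comes from a different rounding step on Codecov's side.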

@PetrochukM
Owner

Hey! Thank you for your contribution.

Do you have an opinion on subword_nmt vs. tokenizers by HuggingFace?
