Adding a proper function to add the prefixes #29

Closed
wants to merge 1 commit into from

Conversation

NohTow
Collaborator

@NohTow NohTow commented Aug 9, 2024

This PR introduces a dedicated function for adding the query/document prefixes. It is more robust and works with all tokenizers, because it no longer relies on ". " being tokenized as a single token (which is not the case for mGTE, for example).

This fixes #11.
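
For illustration only, here is a minimal sketch of the general idea: insert the marker id directly into the already-tokenized sequences instead of relying on a placeholder string such as ". " mapping to exactly one token. This is not the code from this PR. The function name `insert_prefix_token`, the toy token ids, and the assumption that each sequence starts with a single special token (e.g. `[CLS]`) are all hypothetical.

```python
from typing import List, Tuple


def insert_prefix_token(
    input_ids: List[List[int]],
    attention_mask: List[List[int]],
    prefix_id: int,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Insert `prefix_id` right after the first token of every sequence.

    Assumption: position 0 holds a single leading special token (e.g. [CLS]).
    The attention mask gets a matching 1 so the marker is attended to.
    """
    new_ids = [seq[:1] + [prefix_id] + seq[1:] for seq in input_ids]
    new_mask = [mask[:1] + [1] + mask[1:] for mask in attention_mask]
    return new_ids, new_mask


# Toy ids (hypothetical): 101 = [CLS], 102 = [SEP], 0 = padding, 9 = query marker.
ids, mask = insert_prefix_token(
    [[101, 7, 8, 102], [101, 7, 102, 0]],
    [[1, 1, 1, 1], [1, 1, 1, 0]],
    prefix_id=9,
)
print(ids)   # [[101, 9, 7, 8, 102], [101, 9, 7, 102, 0]]
print(mask)  # [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]
```

Because the marker is inserted as an id after tokenization, the result does not depend on how a given tokenizer splits any placeholder text.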

@NohTow NohTow closed this Aug 9, 2024
@NohTow NohTow deleted the fix_prefix branch August 9, 2024 09:24
Development

Successfully merging this pull request may close these issues.

Fix tokenization for query/doc marker