Transitioning to a faster MTEB #482

Closed
KennethEnevoldsen opened this issue Apr 22, 2024 · 6 comments

Comments

@KennethEnevoldsen
Contributor

@Muennighoff it seems like we might run into an issue when updating MTEB to MMTEB, where we might e.g. want to speed up existing tasks. One solution might be to keep the leaderboard on an older version of MTEB until we move the entire leaderboard to the newest version (thereby also outdating older results).

As far as my experiments with speeding up the clustering tasks go, there are plenty of ways to improve speed, but most of them sacrifice comparability of scores with the old versions.
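
For illustration, a minimal sketch of one such speed-up, deterministic subsampling of a clustering split; the function name and parameters are hypothetical, not part of mteb's actual API:

```python
import random

def subsample_clustering_split(sentences, labels, max_documents=2048, seed=42):
    """Deterministically downsample a clustering split to speed up evaluation.

    Note: scores computed on the subsample are generally NOT comparable to
    scores on the full split, which is the compatibility problem above.
    """
    if len(sentences) <= max_documents:
        return sentences, labels
    rng = random.Random(seed)  # fixed seed so every run sees the same subset
    indices = rng.sample(range(len(sentences)), max_documents)
    return [sentences[i] for i in indices], [labels[i] for i in indices]
```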

@Muennighoff
Contributor

Yeah, we can think about this in more detail when MMTEB is in its final stages. It may also make sense to change the dataset composition of the default English average.

The clustering_fast_and_fixed idea here sounds good, or we just rename the classes to "..oldclassname..V2" or something, so people can select the newer ones by name if they want.

@orionw
Contributor

orionw commented Apr 22, 2024

If we end up subsampling datasets to speed things up, I might recommend we have something like an MMTEB lite and an MMTEB full. I know when BEIR came out it confused a lot of folks, since the MSMarco they use is not the same as the standard eval setup that had previously been used. Or, if that is not desirable, it would be helpful to have some way of indicating that the datasets have been altered and that results are incomparable with previous results on the full benchmark.

@Muennighoff
Contributor

This work may be relevant: https://arxiv.org/abs/2402.14992

@KennethEnevoldsen
Contributor Author

> The clustering_fast_and_fixed idea #407 (comment) sounds good or we just rename the classes to "..oldclassname..V2" or something, so people can just select the newer ones based on their name if they want.

Perfect, I will go with this approach when updating datasets.

> If we end up subsampling datasets to speed things up, I might recommend we have like an MMTEB lite and an MMTEB full.

I was thinking of doing something like MMTEB({subset}), e.g. MMTEB(eng), MMTEB(medical), MMTEB(mini), where the subset is just a list of tasks.

But it would be frustrating if performance on task A could differ between two sets (so I would rather decide on one subsampling strategy per dataset than allow it to differ depending on the benchmark). A mini would then just be a set of representative tasks.
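
A hypothetical sketch of what this could look like; the registry and the benchmark compositions below are illustrative (the task names are real mteb tasks, but their grouping here is made up):

```python
# Benchmark subsets defined as named lists of tasks; each task keeps a single
# subsampling strategy, so its score is identical across every benchmark.
BENCHMARKS: dict[str, list[str]] = {
    "MMTEB(eng)": ["Banking77Classification", "ArxivClusteringP2P"],
    "MMTEB(medical)": ["MedrxivClusteringS2S"],
    "MMTEB(mini)": ["Banking77Classification", "MedrxivClusteringS2S"],
}

def get_benchmark_tasks(name: str) -> list[str]:
    """Resolve a benchmark name such as 'MMTEB(eng)' to its list of tasks."""
    return BENCHMARKS[name]
```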

> This work may be relevant: https://arxiv.org/abs/2402.14992

Yeah, that is a pretty cool paper. We can do a similar thing if we treat each dataset as a sample (otherwise we would need to refactor how we handle samples, which I believe is too much).
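
One way to read "each dataset as a sample" is to pick a subset of datasets whose average tracks the full benchmark average across models, in the spirit of the paper above. A minimal sketch under that assumption (the function is hypothetical, not part of mteb; it expects a models × datasets score matrix):

```python
import numpy as np

def greedy_representative_subset(scores: np.ndarray, k: int) -> list[int]:
    """Greedily pick k dataset columns whose subset mean best correlates
    (over models) with the mean over all datasets.

    scores: array of shape (n_models, n_datasets).
    """
    full_mean = scores.mean(axis=1)
    chosen: list[int] = []
    for _ in range(k):
        best_j, best_corr = -1, -np.inf
        for j in range(scores.shape[1]):
            if j in chosen:
                continue
            subset_mean = scores[:, chosen + [j]].mean(axis=1)
            corr = np.corrcoef(subset_mean, full_mean)[0, 1]
            if corr > best_corr:
                best_j, best_corr = j, corr
        chosen.append(best_j)
    return chosen
```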

@KennethEnevoldsen
Contributor Author

Ah, a new suggestion I made in #481 is to add `task.superseeded_by = "new_dataset_name"`, which raises a warning when the original dataset is run.

I believe this keeps backward compatibility: it allows us to see which datasets are outdated and lets us update datasets without influencing previous benchmarks.
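
A minimal sketch of how such an attribute could work; the class and method names are illustrative, not mteb's actual internals:

```python
import warnings

class AbsTask:
    """Sketch of the proposed mechanism, not mteb's actual base class."""

    # Name of the task that replaces this one, or None if it is current.
    superseeded_by: str | None = None

    def evaluate(self, model, split: str = "test") -> dict:
        if self.superseeded_by is not None:
            warnings.warn(
                f"{type(self).__name__} is superseded by "
                f"'{self.superseeded_by}'; it is kept for backward "
                "compatibility with previously reported results."
            )
        return self._evaluate(model, split)

    def _evaluate(self, model, split: str) -> dict:
        raise NotImplementedError  # implemented by concrete tasks
```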

@KennethEnevoldsen
Contributor Author

This issue seems outdated, with multiple newer issues - to get an overview, #784 is probably a good place to start.
