Transitioning to a faster MTEB #482

Closed
KennethEnevoldsen opened this issue Apr 22, 2024 · 6 comments

Comments

@KennethEnevoldsen
Contributor

@Muennighoff it seems like we might run into an issue when updating MTEB to MMTEB, where we might e.g. want to speed up existing tasks. One solution might be to keep the leaderboard on an older version of MTEB until we move the entire leaderboard to the newest version (thereby also outdating older results).

As far as my experiments with speeding up the clustering tasks go, there are plenty of ways to improve speed, but most of them sacrifice comparability of scores with the old versions.
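
For illustration, a minimal sketch of one such speed-up, deterministic subsampling of a clustering split; the function name and parameters are hypothetical, not part of mteb's actual API:

```python
import random

def subsample_clustering_split(sentences, labels, max_documents=2048, seed=42):
    """Deterministically downsample a clustering split to speed up evaluation.

    Note: scores computed on the subsample are generally NOT comparable to
    scores on the full split, which is the compatibility problem above.
    """
    if len(sentences) <= max_documents:
        return sentences, labels
    rng = random.Random(seed)  # fixed seed so every run sees the same subset
    indices = rng.sample(range(len(sentences)), max_documents)
    return [sentences[i] for i in indices], [labels[i] for i in indices]
```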

@Muennighoff
Contributor

Yeah, we can think about this in more detail when MMTEB is in its final stages. It may also make sense to change the dataset composition of the default English average.

The clustering_fast_and_fixed idea here sounds good, or we just rename the classes to "..oldclassname..V2" or something, so people can select the newer ones by name if they want.

@orionw
Contributor

orionw commented Apr 22, 2024

If we end up subsampling datasets to speed things up, I might recommend we have something like an MMTEB lite and an MMTEB full. I know when BEIR came out it confused a lot of folks, since the MSMarco they use is not the same as the standard eval setup that had previously been used. Or, if that is not desirable, it would be helpful to have some way of indicating that the datasets have been altered and that results are incomparable with previous results on the full benchmark.

@Muennighoff
Contributor

This work may be relevant: https://arxiv.org/abs/2402.14992

@KennethEnevoldsen
Contributor Author

> The clustering_fast_and_fixed idea #407 (comment) sounds good or we just rename the classes to "..oldclassname..V2" or something, so people can just select the newer ones based on their name if they want.

Perfect, I will go with this approach when updating datasets.

> If we end up subsampling datasets to speed things up, I might recommend we have like an MMTEB lite and an MMTEB full.

I was thinking of doing something like MMTEB({subset}), e.g. MMTEB(eng), MMTEB(medical), MMTEB(mini), where the subset is just a list of tasks.

But it would be frustrating if performance on task A could differ between two sets (so I would rather decide on one subsampling strategy per dataset than allow it to differ depending on the benchmark). A mini would then just be a set of representative tasks.
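
A hypothetical sketch of what this could look like; the registry and the benchmark compositions below are illustrative (the task names are real mteb tasks, but their grouping here is made up):

```python
# Benchmark subsets defined as named lists of tasks; each task keeps a single
# subsampling strategy, so its score is identical across every benchmark.
BENCHMARKS: dict[str, list[str]] = {
    "MMTEB(eng)": ["Banking77Classification", "ArxivClusteringP2P"],
    "MMTEB(medical)": ["MedrxivClusteringS2S"],
    "MMTEB(mini)": ["Banking77Classification", "MedrxivClusteringS2S"],
}

def get_benchmark_tasks(name: str) -> list[str]:
    """Resolve a benchmark name such as 'MMTEB(eng)' to its list of tasks."""
    return BENCHMARKS[name]
```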

> This work may be relevant: https://arxiv.org/abs/2402.14992

Yeah, that is a pretty cool paper. We can do a similar thing if we treat each dataset as a sample (otherwise we would need to refactor how we handle samples, which I believe is too much).
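
One way to read "each dataset as a sample" is to pick a subset of datasets whose average tracks the full benchmark average across models, in the spirit of the paper above. A minimal sketch under that assumption (the function is hypothetical, not part of mteb; it expects a models × datasets score matrix):

```python
import numpy as np

def greedy_representative_subset(scores: np.ndarray, k: int) -> list[int]:
    """Greedily pick k dataset columns whose subset mean best correlates
    (over models) with the mean over all datasets.

    scores: array of shape (n_models, n_datasets).
    """
    full_mean = scores.mean(axis=1)
    chosen: list[int] = []
    for _ in range(k):
        best_j, best_corr = -1, -np.inf
        for j in range(scores.shape[1]):
            if j in chosen:
                continue
            subset_mean = scores[:, chosen + [j]].mean(axis=1)
            corr = np.corrcoef(subset_mean, full_mean)[0, 1]
            if corr > best_corr:
                best_j, best_corr = j, corr
        chosen.append(best_j)
    return chosen
```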

@KennethEnevoldsen
Contributor Author

Ah, a new suggestion I made in #481 is to add `task.superseeded_by = "new_dataset_name"`, which raises a warning when the original dataset is run.

I believe this keeps backward compatibility: it allows us to see which datasets are outdated and lets us update datasets without influencing previous benchmarks.
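
A minimal sketch of how such an attribute could work; the class and method names are illustrative, not mteb's actual internals:

```python
import warnings

class AbsTask:
    """Sketch of the proposed mechanism, not mteb's actual base class."""

    # Name of the task that replaces this one, or None if it is current.
    superseeded_by: str | None = None

    def evaluate(self, model, split: str = "test") -> dict:
        if self.superseeded_by is not None:
            warnings.warn(
                f"{type(self).__name__} is superseded by "
                f"'{self.superseeded_by}'; it is kept for backward "
                "compatibility with previously reported results."
            )
        return self._evaluate(model, split)

    def _evaluate(self, model, split: str) -> dict:
        raise NotImplementedError  # implemented by concrete tasks
```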

@KennethEnevoldsen
Contributor Author

This issue seems outdated, with multiple newer issues - to get an overview, #784 is probably a good place to start.
