
Figure out an approach for adding experiments to the leaderboard #1211

Open · KennethEnevoldsen opened this issue on Sep 9, 2024 · 4 comments

KennethEnevoldsen (Contributor) commented on Sep 9, 2024

We currently do not have a good approach for adding experiment runs to the MTEB leaderboard, e.g. experiments on the influence of hyperparameters, such as running prompt-based models without a prompt or changing the embedding size.

One solution would be to run the model with a unique model_name, but there is currently no documentation on how one would do that.


Edit: a potentially better solution is to add experiments, where we add a layer to the results structure behind the visualization:

model_name
| - revision1
|    | - task1.json
|    | - ...
|    | - experiments
|    |    | - exp_name1
|    |    |    | - task1.json
|    |    |    | - ...
|    |    | - exp_name2
|    |    |    | - task1.json
|    |    |    | - ...

or potentially a slightly more consistent structure:

model_name
| - revision1
|    | - default
|    |    | - task1.json
|    |    | - ...
|    | - exp_name1
|    |    | - task1.json
|    |    | - ...
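
For illustration, here is a minimal sketch of how a results loader could collect files under this second layout; the function name, the results_folder argument, and the literal "default" folder are assumptions of this sketch, not existing mteb behaviour:

from pathlib import Path

def experiment_result_files(results_folder: str, model_name: str,
                            revision: str, experiment: str = "default") -> list[Path]:
    # The baseline run lives in the "default" folder; every experiment
    # (e.g. "no-instruct" or "emb_size=256") gets its own sibling folder.
    experiment_dir = Path(results_folder) / model_name / revision / experiment
    return sorted(experiment_dir.glob("*.json"))

# e.g. experiment_result_files("results", "model_name", "revision1", "no-instruct")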

Experiment names could be, e.g., no-instruct/instruct (whether instructions are used) or emb_size=256.

To run a model as an experiment, we could do:

model = MyModel(**custom_kwargs)
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, experiment_name="no-instruct")  # experiment_name is the proposed new kwarg
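
Under the proposed layout, such a run would then write its task results to results/model_name/revision1/no-instruct/ rather than to the default folder, which is what would let the leaderboard group and filter runs per experiment.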

KennethEnevoldsen changed the title from "Figure out an approach for running experiments" to "Figure out an approach for adding experiments to the leaderboard" on Sep 9, 2024
KennethEnevoldsen (Contributor, Author) commented on Sep 9, 2024

This approach also allows us to filter experiments on the leaderboard, as well as to know that two experiments belong to the same model.

orionw (Contributor) commented on Sep 9, 2024

I like this; it gives more flexibility and cleans up the main folder.

The only potentially hard part is what we do for, say, GritLM with each type of prompt it runs. The user would have to pick some title for the prompt (e.g. the E5 prompt, the Instructor prompt, or "prompt-I-just-made-up"). These names will be non-standard, since every user will name them differently (or perhaps run different prompts).

Samoed (Collaborator) commented on Sep 9, 2024

I think there could be a file for each experiment that includes the parameters used for the experiment, similar to model_meta. This could be done by logging the additional kwargs passed to the model.
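
For illustration only, such a per-experiment metadata file could be as simple as dumping the experiment kwargs next to the task results (the file name, fields, and path below are hypothetical):

import json
from pathlib import Path

# Hypothetical metadata describing one experiment, stored alongside its task results.
experiment_meta = {
    "experiment_name": "no-instruct",
    "model_kwargs": {"use_instructions": False, "embedding_size": 256},
}
experiment_dir = Path("results/model_name/revision1/no-instruct")
experiment_dir.mkdir(parents=True, exist_ok=True)
with open(experiment_dir / "experiment_meta.json", "w") as f:
    json.dump(experiment_meta, f, indent=2)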

KennethEnevoldsen (Contributor, Author) commented

> The only potentially hard part is what we do for, say, GritLM with each type of prompt it runs. The user would have to pick some title for the prompt (e.g. the E5 prompt, the Instructor prompt, or "prompt-I-just-made-up"). These names will be non-standard, since every user will name them differently (or perhaps run different prompts).

Def. agree with this. I would imagine that some standards will form, but there will be some custom naming to allow for flexibility.

> I think there could be a file for each experiment that includes the parameters used for the experiment, similar to model_meta. This could be done by logging the additional kwargs passed to the model.

I think we can just log this in the model_meta?
