
Figure out an approach for adding experiments to the leaderboard #1211

Open · KennethEnevoldsen opened this issue on Sep 9, 2024 · 4 comments

KennethEnevoldsen (Contributor) commented on Sep 9, 2024

We currently do not have a good approach for adding experiment runs to the MTEB leaderboard, e.g. experiments on the influence of hyperparameters, such as running prompt-based models without a prompt or changing the embedding size.

One solution would be to run the model with a unique model_name, but there is currently no documentation on how one would do that.


Edit: a potentially better solution is to add experiments, where we add a layer to the results structure behind the visualization:

model_name
| - revision1
|    | - task1.json
|    | - ...
|    | - experiments
|    |    | - exp_name1
|    |    |    | - task1.json
|    |    |    | - ...
|    |    | - exp_name2
|    |    |    | - task1.json
|    |    |    | - ...

or potentially a slightly more consistent structure:

model_name
| - revision1
|    | - default
|    |    | - task1.json
|    |    | - ...
|    | - exp_name1
|    |    | - task1.json
|    |    | - ...
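
For illustration, here is a minimal sketch of how a results loader could collect files under this second layout; the function name, the results_folder argument, and the literal "default" folder are assumptions of this sketch, not existing mteb behaviour:

from pathlib import Path

def experiment_result_files(results_folder: str, model_name: str,
                            revision: str, experiment: str = "default") -> list[Path]:
    # The baseline run lives in the "default" folder; every experiment
    # (e.g. "no-instruct" or "emb_size=256") gets its own sibling folder.
    experiment_dir = Path(results_folder) / model_name / revision / experiment
    return sorted(experiment_dir.glob("*.json"))

# e.g. experiment_result_files("results", "model_name", "revision1", "no-instruct")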

Experiment names could be, e.g., no-instruct/instruct (whether instructions are used) or emb_size=256.

To run a model as an experiment, we could do:

model = MyModel(**custom_kwargs)
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, experiment_name="no-instruct")  # experiment_name is the proposed new kwarg
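
Under the proposed layout, such a run would then write its task results to results/model_name/revision1/no-instruct/ rather than to the default folder, which is what would let the leaderboard group and filter runs per experiment.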

KennethEnevoldsen changed the title from "Figure out an approach for running experiments" to "Figure out an approach for adding experiments to the leaderboard" on Sep 9, 2024
KennethEnevoldsen (Contributor, Author) commented on Sep 9, 2024

This approach also allows us to filter experiments on the leaderboard, as well as to know that two experiments belong to the same model.

orionw (Contributor) commented on Sep 9, 2024

I like this; it gives more flexibility and cleans up the main folder.

The only potentially hard part is what we do for, say, GritLM with each type of prompt it runs. The user would have to pick some title for the prompt (e.g. the E5 prompt, the Instructor prompt, or "prompt-I-just-made-up"). These names will be non-standard, since every user will name them differently (or perhaps run different prompts).

Samoed (Collaborator) commented on Sep 9, 2024

I think there could be a file for each experiment that includes the parameters used for the experiment, similar to model_meta. This could be done by logging the additional kwargs passed to the model.
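
For illustration only, such a per-experiment metadata file could be as simple as dumping the experiment kwargs next to the task results (the file name, fields, and path below are hypothetical):

import json
from pathlib import Path

# Hypothetical metadata describing one experiment, stored alongside its task results.
experiment_meta = {
    "experiment_name": "no-instruct",
    "model_kwargs": {"use_instructions": False, "embedding_size": 256},
}
experiment_dir = Path("results/model_name/revision1/no-instruct")
experiment_dir.mkdir(parents=True, exist_ok=True)
with open(experiment_dir / "experiment_meta.json", "w") as f:
    json.dump(experiment_meta, f, indent=2)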

KennethEnevoldsen (Contributor, Author) commented

> The only potentially hard part is what we do for, say, GritLM with each type of prompt it runs. The user would have to pick some title for the prompt (e.g. the E5 prompt, the Instructor prompt, or "prompt-I-just-made-up"). These names will be non-standard, since every user will name them differently (or perhaps run different prompts).

Def. agree with this. I would imagine that some standards will form, but there will be some custom naming to allow for flexibility.

> I think there could be a file for each experiment that includes the parameters used for the experiment, similar to model_meta. This could be done by logging the additional kwargs passed to the model.

I think we can just log this in the model_meta?
