Figure out an approach for adding experiments to the leaderboard #1211
Comments
This way also allows us to filter experiments on the leaderboard, as well as know that two experiments belong to the same model.
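As a minimal sketch of that point: if the experiment were stored as its own field alongside the model name (the row format and scores below are made up purely for illustration), grouping runs of the same model and filtering by experiment becomes trivial:

```python
from collections import defaultdict

# Hypothetical leaderboard rows: (model_name, experiment, score).
# The experiment names and scores are placeholders, not real results.
rows = [
    ("GritLM/GritLM-7B", "instruct", 66.8),
    ("GritLM/GritLM-7B", "no-instruct", 61.2),
    ("intfloat/e5-large", "default", 63.1),
]

# Because the experiment is a separate field, runs of the same model
# group together...
by_model = defaultdict(dict)
for model, experiment, score in rows:
    by_model[model][experiment] = score

# ...and the leaderboard can filter on a given experiment name.
no_instruct_only = {
    model: scores["no-instruct"]
    for model, scores in by_model.items()
    if "no-instruct" in scores
}
print(no_instruct_only)  # {'GritLM/GritLM-7B': 61.2}
```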
I like this; it gives more flexibility and cleans up the main folder. The only potentially hard part is what we do for, say, GritLM run with each type of prompt. The user would have to pick some title for the prompt (e.g. "E5 prompt", "Instructor prompt", or "prompt-I-just-made-up"). These will then be non-standard, since every user will name them differently (or run different prompts).
I think there could be a file for each experiment that includes the parameters used for the experiment, similar to
Definitely agree with this. I would imagine that some standards will form, but there will be some custom stuff to allow for flexibility.
I think we can just log this in the model_meta?
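A minimal sketch of what that could look like. Note that the `experiment` field and both dataclasses below are hypothetical and not part of mteb's actual ModelMeta; this only illustrates the idea of logging experiment parameters alongside the model metadata:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExperimentMeta:
    # Hypothetical: a name like "no-instruct" or "emb_size=256",
    # plus the concrete parameters that define the experiment.
    name: str
    parameters: dict = field(default_factory=dict)

@dataclass
class ModelMetaSketch:
    # Stand-in for mteb's ModelMeta; only these three fields are shown.
    model_name: str
    revision: str
    experiment: Optional[ExperimentMeta] = None

meta = ModelMetaSketch(
    model_name="GritLM/GritLM-7B",
    revision="main",
    experiment=ExperimentMeta(name="no-instruct", parameters={"prompt": None}),
)
print(meta.experiment.name)  # no-instruct
```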
We currently do not have a good approach for adding experiment runs to the MTEB leaderboard, e.g. experiments on the influence of hyperparameters, such as running prompt-based models without a prompt or changing the embedding size.
One possible solution is to run the model with a unique model_name, but there is currently no documentation on how one would do that.
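A sketch of that workaround, assuming results are keyed by model name and stored under `results/{model_name}/{revision}/` with slashes replaced by double underscores (the `__no-instruct` suffix convention here is made up for illustration):

```python
from pathlib import PurePosixPath

def results_path(model_name: str, revision: str, task: str) -> PurePosixPath:
    # Assumed layout: results/{model_name}/{revision}/{task}.json,
    # with "/" in the model name replaced by "__".
    safe_name = model_name.replace("/", "__")
    return PurePosixPath("results") / safe_name / revision / f"{task}.json"

# Workaround: encode the experiment in the model name itself, which
# yields a separate leaderboard entry per experiment.
base = results_path("GritLM/GritLM-7B", "main", "STS12")
exp = results_path("GritLM/GritLM-7B__no-instruct", "main", "STS12")
print(base)  # results/GritLM__GritLM-7B/main/STS12.json
print(exp)   # results/GritLM__GritLM-7B__no-instruct/main/STS12.json
```

The downside, as noted above, is that nothing ties the two entries back to the same underlying model.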
Edit: a potentially better solution is to add experiments, where we add a layer to the visualization:
or potentially a slightly more consistent structure:
Experiment names could be e.g. no-instruct / instruct (whether you use instructions) or emb_size=256.
To run a model as an experiment we could do:
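One hypothetical shape for this interface. The `Evaluator` class below is a stand-in for mteb's evaluation entry point, and the `experiment` argument is the proposed addition, not an existing mteb parameter; the sketch only shows how an experiment would add a layer to the results path rather than overwriting the model's default scores:

```python
from pathlib import PurePosixPath

class Evaluator:
    """Stand-in for the evaluation entry point; only path logic is sketched."""

    def __init__(self, tasks):
        self.tasks = tasks

    def run(self, model_name, revision="main", experiment=None, output_folder="results"):
        # Without an experiment, keep the current layout; with one,
        # add an extra {experiment} layer below the revision.
        safe_name = model_name.replace("/", "__")
        base = PurePosixPath(output_folder) / safe_name / revision
        if experiment is not None:
            base = base / experiment
        return [base / f"{task}.json" for task in self.tasks]

paths = Evaluator(["STS12"]).run("GritLM/GritLM-7B", experiment="no-instruct")
print(paths[0])  # results/GritLM__GritLM-7B/main/no-instruct/STS12.json
```

This keeps the default run and all of its experiment variants under one model folder, so the leaderboard can both group them and filter on the experiment name.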