From b692c58891f8e7ba213c2bbfcf5801f4566893fd Mon Sep 17 00:00:00 2001
From: John Yang
Date: Sat, 15 Jun 2024 17:35:36 -0400
Subject: [PATCH] Add notes to leaderboard

---
 index.html             | 10 +++++++++-
 submit.html            |  4 +---
 template/template.html | 10 +++++++++-
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/index.html b/index.html
index 8d8df9a..0e7415a 100644
--- a/index.html
+++ b/index.html
@@ -405,8 +405,16 @@

Leaderboard

- The % Resolved metric refers to the percentage of SWE-bench instances (2294 total)
+ - The % Resolved metric refers to the percentage of SWE-bench instances (2294 total) that were resolved by the model.
+
+ - The leaderboard will be updated once a week on Monday.
+
+ - If you would like to submit your model to the leaderboard, please check the submission page.
+
+ - All submissions are Pass@1, do not use
+ hints_text,
+ and are in the unassisted setting.

diff --git a/submit.html b/submit.html
index 44718a8..740a769 100644
--- a/submit.html
+++ b/submit.html
@@ -133,10 +133,8 @@

  1. The use of the hints_text field is not allowed. See our explanation here.
  2. The result should be pass@1. There should be one execution log per task instance for all 2294 task instances.
  3. +
  4. The result should not be in the "Oracle" retrieval setting. The agent cannot be told the correct files to edit, where "correct" refers to the files modified by the reference solution patch.
-

- We do not enforce the guidelines
-

diff --git a/template/template.html b/template/template.html
index d02789a..af2b332 100644
--- a/template/template.html
+++ b/template/template.html
@@ -159,8 +159,16 @@

Leaderboard

- The % Resolved metric refers to the percentage of SWE-bench instances (2294 total)
+ - The % Resolved metric refers to the percentage of SWE-bench instances (2294 total) that were resolved by the model.
+
+ - The leaderboard will be updated once a week on Monday.
+
+ - If you would like to submit your model to the leaderboard, please check the submission page.
+
+ - All submissions are Pass@1, do not use
+ hints_text,
+ and are in the unassisted setting.
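
Note on the metric added by this patch: the leaderboard text defines % Resolved as the share of the 2294 SWE-bench instances that the model resolved. A minimal sketch of that arithmetic follows. The function name `percent_resolved` and the constant `TOTAL_INSTANCES` are hypothetical names chosen for illustration; this is not the project's evaluation code.

```python
# Illustrative only: names below are assumptions, not part of the SWE-bench codebase.
TOTAL_INSTANCES = 2294  # instance count stated in the leaderboard note


def percent_resolved(resolved_count: int, total: int = TOTAL_INSTANCES) -> float:
    """Return the percentage of task instances that were resolved."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved_count <= total:
        raise ValueError("resolved_count must be between 0 and total")
    return 100.0 * resolved_count / total


if __name__ == "__main__":
    # Half of the 2294 instances resolved.
    print(f"{percent_resolved(1147):.2f}%")
```

Because the submission rules require Pass@1 with one execution log per task instance, the denominator in this sketch is always the full instance count rather than the number of attempted instances.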