Commit

Add notes to leaderboard
john-b-yang committed Jun 15, 2024
1 parent f9cfae3 commit b692c58
Showing 3 changed files with 19 additions and 5 deletions.
10 changes: 9 additions & 1 deletion index.html
@@ -405,8 +405,16 @@ <h2 class="text-title">Leaderboard</h2>
 </table>
 </div>
 <p class="text-content">
-The <span style="color:#0ea7ff;"><b>% Resolved</b></span> metric refers to the percentage of SWE-bench instances (2294 total)
+- The <span style="color:#0ea7ff;"><b>% Resolved</b></span> metric refers to the percentage of SWE-bench instances (2294 total)
 that were <i>resolved</i> by the model.
+<br>
+- The leaderboard will be updated once a week on Monday.
+<br>
+- If you would like to submit your model to the leaderboard, please check the <a href="submit.html">submission</a> page.
+<br>
+- All submissions are Pass@1, do not use
+<code style="color:black;background-color:#ddd;border-radius: 0.25em">hints_text</code>,
+and are in the unassisted setting.
 </p>
 </div>
 </div>
4 changes: 1 addition & 3 deletions submit.html
@@ -133,10 +133,8 @@ <h3>
 <ol>
 <li>The use of the <code>hints_text</code> field is <i>not</i> allowed. See our explanation <a href="https://github.com/princeton-nlp/SWE-bench/issues/133">here</a>.</li>
 <li>The result should be pass@1. There should be one execution log per task instance for all 2294 task instances.</li>
+<li>The result should <i>not</i> be in the "Oracle" retrieval setting. The agent cannot be told the correct files to edit, where "correct" refers to the files modified by the reference solution patch.</li>
 </ol>
-<p>
-We do not enforce the guidelines
-</p>
 </div>
 </div>
 <div class="content-wrapper">
10 changes: 9 additions & 1 deletion template/template.html
@@ -159,8 +159,16 @@ <h2 class="text-title">Leaderboard</h2>
 </table>
 </div>
 <p class="text-content">
-The <span style="color:#0ea7ff;"><b>% Resolved</b></span> metric refers to the percentage of SWE-bench instances (2294 total)
+- The <span style="color:#0ea7ff;"><b>% Resolved</b></span> metric refers to the percentage of SWE-bench instances (2294 total)
 that were <i>resolved</i> by the model.
+<br>
+- The leaderboard will be updated once a week on Monday.
+<br>
+- If you would like to submit your model to the leaderboard, please check the <a href="submit.html">submission</a> page.
+<br>
+- All submissions are Pass@1, do not use
+<code style="color:black;background-color:#ddd;border-radius: 0.25em">hints_text</code>,
+and are in the unassisted setting.
 </p>
 </div>
 </div>
