v0.3.2 version bump
seanmacavaney committed Mar 12, 2021
1 parent bb8d444 commit 749ab38
Showing 40 changed files with 16,816 additions and 23 deletions.
99 changes: 99 additions & 0 deletions docs/cli.html
@@ -0,0 +1,99 @@
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="main.css" />
<script src="https://code.jquery.com/jquery-1.12.4.min.js" integrity="sha256-ZosEbRLbNQzLpnKIkEdrPv7lOy9C27hHQ+Xp8a4MxAQ=" crossorigin="anonymous"></script>
<script src="https://code.jquery.com/ui/1.12.1/jquery-ui.min.js" integrity="sha256-VazP97ZCwtekAsvgPBSUwPFKdrwD3unUfSGVYrahUqU=" crossorigin="anonymous"></script>
<script src="main.js"></script>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />

<title>Command Line Interface - ir_datasets</title>
</head>
<body>
<div class="page">

<div style="position: absolute; top: 4px; left: 4px;"><a href="index.html">&larr; home</a></div>

<div style="position: absolute; top: 4px; right: 4px;">Github: <a href="https://github.com/allenai/ir_datasets/">allenai/ir_datasets</a></div>
<h1><code>ir_datasets</code>: Command Line Interface</h1>
<h2 id="export">export command</h2>

<p>
Data can be exported to stdout in various formats using the <code>ir_datasets export</code> command.
</p>

<h4><code>ir_datasets export [dataset-id] docs [--fields] [--format]</code></h4>

<div class="methodinfo">
<p>Exports documents</p>
<p><code>--fields</code>: select which fields from the document to export (defaults to all)</p>
<p><code>--format</code>: select output format to use: <code>tsv</code> (default) or <code>jsonl</code></p>
</div>
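<p>
As a rough illustration of what field selection and the two output formats mean, the following Python sketch renders a few records as TSV and JSONL. It is not the tool's implementation: the <code>Doc</code> type, the <code>export_records</code> helper, and the replacement of tabs and newlines in TSV values are assumptions made for this example.
</p>

```python
import json
from collections import namedtuple

# Hypothetical document type; real datasets define their own fields.
Doc = namedtuple("Doc", ["doc_id", "title", "text"])


def export_records(records, fields=None, fmt="tsv"):
    """Render records as TSV or JSONL text, keeping only the selected fields."""
    lines = []
    for record in records:
        names = fields or record._fields  # default: all fields
        if fmt == "tsv":
            # Assumption: tabs and newlines are replaced so each record stays on one line.
            values = [
                str(getattr(record, name)).replace("\t", " ").replace("\n", " ")
                for name in names
            ]
            lines.append("\t".join(values))
        elif fmt == "jsonl":
            lines.append(json.dumps({name: getattr(record, name) for name in names}))
        else:
            raise ValueError("unsupported format: " + fmt)
    return "\n".join(lines)


docs = [Doc("d1", "First", "alpha\tbeta"), Doc("d2", "Second", "gamma")]
tsv_out = export_records(docs, fields=["doc_id", "text"])  # two selected fields
jsonl_out = export_records(docs, fmt="jsonl")  # all fields
print(tsv_out)
print(jsonl_out)
```

<p>
JSONL keeps field names and escapes special characters, so it is the safer choice when field values may contain tabs or newlines.
</p>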

<h4><code>ir_datasets export [dataset-id] queries [--fields] [--format]</code></h4>

<div class="methodinfo">
<p>Exports queries</p>
<p><code>--fields</code>: select which fields from the query to export (defaults to all)</p>
<p><code>--format</code>: select output format to use: <code>tsv</code> (default) or <code>jsonl</code></p>
</div>

<h4><code>ir_datasets export [dataset-id] qrels [--fields] [--format]</code></h4>

<div class="methodinfo">
<p>Exports qrels</p>
<p><code>--fields</code>: select which fields from the qrels to export (defaults to all)</p>
<p><code>--format</code>: select output format to use: <code>trec</code> (default), <code>tsv</code> or <code>jsonl</code></p>
</div>
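<p>
For reference, the conventional TREC qrels layout puts one judgment per line as <code>query_id iteration doc_id relevance</code>. The sketch below formats a few judgments that way. The <code>Qrel</code> type and the <code>to_trec_qrels</code> helper are hypothetical names used only for illustration, and the exact whitespace the tool emits is not specified here.
</p>

```python
from collections import namedtuple

# Hypothetical qrel record; the field names are assumptions for this sketch.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])


def to_trec_qrels(qrels):
    """Format judgments as TREC qrels lines: query_id iteration doc_id relevance."""
    return "\n".join(
        "{} {} {} {}".format(q.query_id, q.iteration, q.doc_id, q.relevance)
        for q in qrels
    )


qrels = [Qrel("1", "d7", 2, "0"), Qrel("1", "d9", 0, "0")]
trec_qrels = to_trec_qrels(qrels)
print(trec_qrels)
```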

<h4><code>ir_datasets export [dataset-id] scoreddocs [--fields] [--format]</code></h4>

<div class="methodinfo">
<p>Exports scoreddocs</p>
<p><code>--fields</code>: select which fields from the scoreddocs to export (defaults to all)</p>
<p><code>--format</code>: select output format to use: <code>trec</code> (default), <code>tsv</code> or <code>jsonl</code></p>
</div>
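<p>
For scored documents, the widely used TREC run layout puts one result per line as <code>query_id Q0 doc_id rank score run_tag</code>. The sketch below produces that layout, ranking documents by descending score within each query. The <code>ScoredDoc</code> type, the <code>to_trec_run</code> helper, the 1-based rank, and the default run tag are assumptions for illustration; whether the tool emits exactly these columns is not stated here.
</p>

```python
from collections import defaultdict, namedtuple

# Hypothetical scored-document record used only in this sketch.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def to_trec_run(scoreddocs, run_tag="run"):
    """Format scored documents as TREC run lines, ranked per query by score."""
    by_query = defaultdict(list)
    for sd in scoreddocs:
        by_query[sd.query_id].append(sd)
    lines = []
    for query_id, group in by_query.items():
        ranked = sorted(group, key=lambda sd: sd.score, reverse=True)
        for rank, sd in enumerate(ranked, start=1):
            lines.append(
                "{} Q0 {} {} {} {}".format(query_id, sd.doc_id, rank, sd.score, run_tag)
            )
    return "\n".join(lines)


scored = [ScoredDoc("q1", "d2", 1.5), ScoredDoc("q1", "d1", 3.0), ScoredDoc("q2", "d3", 0.25)]
trec_run = to_trec_run(scored)
print(trec_run)
```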

<h2 id="lookup">lookup command</h2>

<p>
You can look up documents by their <code>doc_id</code> using the <code>ir_datasets lookup</code> command.
</p>

<h4><code>ir_datasets lookup [dataset-id] [doc_ids ...] [--fields] [--format]</code></h4>

<div class="methodinfo">
<p>Efficiently finds documents that have the provided doc_ids</p>
<p><code>--fields</code>: select which fields from the documents to export (defaults to all)</p>
<p><code>--format</code>: select output format to use: <code>trec</code> (default), <code>tsv</code> or <code>jsonl</code></p>
</div>
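<p>
One common way to make lookups by ID efficient is to build an index from each <code>doc_id</code> to the byte offset of its record, then seek directly to that offset instead of scanning the collection. The sketch below shows the idea on an in-memory JSONL stream. It is an illustration of the general technique, not a description of how <code>ir_datasets</code> stores documents; <code>build_offset_index</code> and <code>lookup</code> are hypothetical helpers.
</p>

```python
import io
import json


def build_offset_index(stream):
    """Map each doc_id to the byte offset of its JSONL line."""
    index = {}
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break
        index[json.loads(line)["doc_id"]] = offset
    return index


def lookup(stream, index, doc_ids):
    """Fetch documents by seeking to each stored offset; unknown IDs are skipped."""
    found = []
    for doc_id in doc_ids:
        if doc_id in index:
            stream.seek(index[doc_id])
            found.append(json.loads(stream.readline()))
    return found


raw = (
    b'{"doc_id": "d1", "text": "alpha"}\n'
    b'{"doc_id": "d2", "text": "beta"}\n'
    b'{"doc_id": "d3", "text": "gamma"}\n'
)
stream = io.BytesIO(raw)
index = build_offset_index(stream)
results = lookup(stream, index, ["d3", "d1", "missing"])
print(results)
```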


<h2 id="doc_fifos">doc_fifos command</h2>

<p>
You can create output FIFOs suitable for Anserini indexing using the <code>ir_datasets doc_fifos</code> command.
</p>

<p>
Note that, unlike <code>export</code> and <code>lookup</code>, this command always outputs JSONL in a format that Anserini can index
(id and content fields). All selected fields are concatenated.
</p>

<p>
The command prints an Anserini indexing command that you can run. The process keeps running
until all documents have been sent to the fifos.
</p>

<h4><code>ir_datasets doc_fifos [dataset-id] [--fields] [--count] [--dir]</code></h4>

<div class="methodinfo">
<p>Creates a temporary directory with fifos</p>
<p><code>--fields</code>: select which fields from the documents to export (defaults to all). These fields are concatenated.</p>
<p><code>--count</code>: number of fifos to create. Defaults to 1 less than the number of processors (or 1).</p>
<p><code>--dir</code>: directory in which to create the fifos. Defaults to a new temp directory.</p>
</div>
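<p>
The mechanics described above can be sketched in Python: create a named pipe in a temporary directory, write one JSON object per document into it, and let a reader (standing in for the indexer) consume the stream. This is a minimal sketch under stated assumptions, not the command's implementation: the helper names, the <code>docs_0.json</code> file name, the <code>id</code>/<code>contents</code> key names, and the newline used to join fields are all hypothetical. <code>os.mkfifo</code> is available on POSIX systems only.
</p>

```python
import json
import os
import tempfile
import threading


def default_fifo_count():
    """One fewer than the processor count, but never below 1."""
    return max((os.cpu_count() or 1) - 1, 1)


def to_index_record(doc, fields):
    """Concatenate the selected fields into one content string."""
    # Assumption: key names and the newline separator are chosen for this sketch.
    contents = "\n".join(str(doc[field]) for field in fields)
    return json.dumps({"id": doc["doc_id"], "contents": contents})


def write_docs_to_fifo(path, docs, fields):
    # Opening a FIFO for writing blocks until a reader opens the other end.
    with open(path, "w") as fifo:
        for doc in docs:
            fifo.write(to_index_record(doc, fields) + "\n")


docs = [
    {"doc_id": "d1", "title": "First", "text": "alpha"},
    {"doc_id": "d2", "title": "Second", "text": "beta"},
]

fifo_dir = tempfile.mkdtemp()
fifo_path = os.path.join(fifo_dir, "docs_0.json")
os.mkfifo(fifo_path)  # POSIX only

writer = threading.Thread(
    target=write_docs_to_fifo, args=(fifo_path, docs, ["title", "text"])
)
writer.start()
with open(fifo_path) as reader:  # stands in for the indexer
    received = [json.loads(line) for line in reader]
writer.join()

os.remove(fifo_path)
os.rmdir(fifo_dir)
print(received)
```

<p>
Because a fifo holds no data on disk, the writer only makes progress while a reader is attached, which is why the process has to stay alive until every document has been consumed.
</p>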

</div>
</body>
</html>
447 changes: 446 additions & 1 deletion docs/cord19.html

Large diffs are not rendered by default.

9 changes: 8 additions & 1 deletion docs/index.html
@@ -29,8 +29,9 @@ <h2>Getting Started</h2>
<p>Guides:</p>

<ul>
<li><a href="https://colab.research.google.com/github/allenai/ir_datasets/blob/master/examples/ir_datasets.ipynb">Colab Tutorial</a></li>
<li>Colab Tutorials: <a href="https://colab.research.google.com/github/allenai/ir_datasets/blob/master/examples/ir_datasets.ipynb">python</a>, <a href="https://colab.research.google.com/github/allenai/ir_datasets/blob/master/examples/ir_datasets_cli.ipynb">CLI</a></li>
<li><a href="python.html">Python API Documentation</a></li>
<li><a href="cli.html">CLI Documentation</a></li>
</ul>

<h2>Dataset Index</h2>
@@ -116,6 +117,11 @@ <h2>Dataset Index</h2>
<tr><td><a href="cord19.html#cord19/fulltext"><kbd><span class="prefix">cord19</span>/fulltext</kbd></a></td><td class="center"><span style="cursor: help;" title="docs available as automatic download"></span></td><td class="center"></td><td class="center"></td><td class="center screen-small-hide"></td><td class="center screen-small-hide"></td></tr>
<tr><td><a href="cord19.html#cord19/fulltext/trec-covid"><kbd><span class="prefix">cord19</span>/fulltext/trec-covid</kbd></a></td><td class="center"><span style="cursor: help;" title="docs available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="queries available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="qrels available as automatic download"></span></td><td class="center screen-small-hide"></td><td class="center screen-small-hide"></td></tr>
<tr><td><a href="cord19.html#cord19/trec-covid"><kbd><span class="prefix">cord19</span>/trec-covid</kbd></a></td><td class="center"><span style="cursor: help;" title="docs available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="queries available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="qrels available as automatic download"></span></td><td class="center screen-small-hide"></td><td class="center screen-small-hide"></td></tr>
<tr><td><a href="cord19.html#cord19/trec-covid/round1"><kbd><span class="prefix">cord19</span>/trec-covid/round1</kbd></a></td><td class="center"><span style="cursor: help;" title="docs available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="queries available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="qrels available as automatic download"></span></td><td class="center screen-small-hide"></td><td class="center screen-small-hide"></td></tr>
<tr><td><a href="cord19.html#cord19/trec-covid/round2"><kbd><span class="prefix">cord19</span>/trec-covid/round2</kbd></a></td><td class="center"><span style="cursor: help;" title="docs available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="queries available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="qrels available as automatic download"></span></td><td class="center screen-small-hide"></td><td class="center screen-small-hide"></td></tr>
<tr><td><a href="cord19.html#cord19/trec-covid/round3"><kbd><span class="prefix">cord19</span>/trec-covid/round3</kbd></a></td><td class="center"><span style="cursor: help;" title="docs available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="queries available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="qrels available as automatic download"></span></td><td class="center screen-small-hide"></td><td class="center screen-small-hide"></td></tr>
<tr><td><a href="cord19.html#cord19/trec-covid/round4"><kbd><span class="prefix">cord19</span>/trec-covid/round4</kbd></a></td><td class="center"><span style="cursor: help;" title="docs available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="queries available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="qrels available as automatic download"></span></td><td class="center screen-small-hide"></td><td class="center screen-small-hide"></td></tr>
<tr><td><a href="cord19.html#cord19/trec-covid/round5"><kbd><span class="prefix">cord19</span>/trec-covid/round5</kbd></a></td><td class="center"><span style="cursor: help;" title="docs available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="queries available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="qrels available as automatic download"></span></td><td class="center screen-small-hide"></td><td class="center screen-small-hide"></td></tr>
</tbody><tbody><tr><td><a style="font-weight: bold;" href="gov.html"><kbd>gov</kbd></a></td><td class="center"><span style="cursor: help;" title="docs available from UoG">⚠️</span></td><td class="center"></td><td class="center"></td><td class="center screen-small-hide"></td><td class="center screen-small-hide"></td></tr>
<tr><td><a href="gov.html#gov/trec-web-2002"><kbd><span class="prefix">gov</span>/trec-web-2002</kbd></a></td><td class="center"><span style="cursor: help;" title="docs available from UoG">⚠️</span></td><td class="center"><span style="cursor: help;" title="queries available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="qrels available as automatic download"></span></td><td class="center screen-small-hide"></td><td class="center screen-small-hide"></td></tr>
<tr><td><a href="gov.html#gov/trec-web-2002/named-page"><kbd><span class="prefix">gov</span>/trec-web-2002/named-page</kbd></a></td><td class="center"><span style="cursor: help;" title="docs available from UoG">⚠️</span></td><td class="center"><span style="cursor: help;" title="queries available as automatic download"></span></td><td class="center"><span style="cursor: help;" title="qrels available as automatic download"></span></td><td class="center screen-small-hide"></td><td class="center screen-small-hide"></td></tr>
@@ -242,6 +248,7 @@ <h2>Other Versions</h2>
<li><a href="master/index.html">master</a></li>
<li><a href="v0.3.0/index.html">v0.3.0</a></li>
<li><a href="v0.3.1/index.html">v0.3.1</a></li>
<li><a href="v0.3.2/index.html">v0.3.2</a></li>
</ul>

</div>
10 changes: 5 additions & 5 deletions docs/msmarco-passage.html
@@ -39,7 +39,7 @@ <h1><code>ir_datasets</code>: MSMARCO (passage)</h1>
<h3><kbd class="select"><span class="str">"msmarco-passage"</span></kbd></h3>

<div class="desc">
<p> A passage ranking benchmark with a collection of 8.8 million passages and question queries. Most relevance judgments are shallow (typically at most 1-2 per query), but the TREC Deep Learning track adds deep judgments. Evaluation typically conducted using MRR@10. </p> <ul> <li>See also: <a class="ds-ref">msmarco-document</a></li> <li>Documents: Short passages (from web)</li> <li>Queries: Natural language questions (from query log)</li> <li><a href="https://microsoft.github.io/msmarco/#ranking">Leaderboard</a></li> <li><a href="https://arxiv.org/abs/1611.09268">Dataset Paper</a></li> </ul>
<p> A passage ranking benchmark with a collection of 8.8 million passages and question queries. Most relevance judgments are shallow (typically at most 1-2 per query), but the TREC Deep Learning track adds deep judgments. Evaluation typically conducted using MRR@10. </p> <p> Note that the original document source files for this collection contain a double-encoding error that causes strange sequences like "å¬" and "ðºð". These are automatically corrected (properly converting previous examples to "公" and "🇺🇸"). </p> <ul> <li>See also: <a class="ds-ref">msmarco-document</a></li> <li>Documents: Short passages (from web)</li> <li>Queries: Natural language questions (from query log)</li> <li><a href="https://microsoft.github.io/msmarco/#ranking">Leaderboard</a></li> <li><a href="https://arxiv.org/abs/1611.09268">Dataset Paper</a></li> </ul>
</div>
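<p>
A double-encoding error of the kind mentioned above typically arises when UTF-8 bytes are mis-read as a single-byte encoding and then saved again. The sketch below reproduces and reverses the error, assuming the intermediate encoding was Latin-1. It illustrates the general technique only and is not necessarily how <code>ir_datasets</code> performs the correction; <code>fix_double_encoding</code> is a hypothetical helper.
</p>

```python
def fix_double_encoding(text):
    """Undo UTF-8 text that was mis-decoded as Latin-1; leave other text alone."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Not representable in Latin-1, or not valid UTF-8 afterwards:
        # treat the text as already correct and return it unchanged.
        return text


# Simulate the error: encode correctly, then decode with the wrong codec.
garbled_cjk = "公".encode("utf-8").decode("latin-1")
garbled_flag = "🇺🇸".encode("utf-8").decode("latin-1")

print(repr(garbled_cjk))
print(fix_double_encoding(garbled_cjk))
print(fix_double_encoding(garbled_flag))
```

<p>
The round trip is a heuristic: a correctly encoded string whose Latin-1 bytes happen to form valid UTF-8 would also be rewritten, so a production fix usually restricts itself to known garbled patterns.
</p>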
<div class="tabs">
<a class="tab" target="msmarco-passage__docs">docs</a>
@@ -1062,7 +1062,7 @@ <h3><kbd class="ds-name select"><span class="str">"msmarco-passage/trec-dl-2019"
<tr><th>Rel.</th><th>Definition</th></tr>
<tr><td class="relScore">0</td><td>Irrelevant: The passage has nothing to do with the query.</td></tr>
<tr><td class="relScore">1</td><td>Related: The passage seems related to the query but does not answer it.</td></tr>
<tr><td class="relScore">2</td><td>Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hiddenamongst extraneous information.</td></tr>
<tr><td class="relScore">2</td><td>Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hidden amongst extraneous information.</td></tr>
<tr><td class="relScore">3</td><td>Perfectly relevant: The passage is dedicated to the query and contains the exact answer.</td></tr>
</table>

@@ -1168,7 +1168,7 @@ <h3><kbd class="ds-name select"><span class="str">"msmarco-passage/trec-dl-2019/
<tr><th>Rel.</th><th>Definition</th></tr>
<tr><td class="relScore">0</td><td>Irrelevant: The passage has nothing to do with the query.</td></tr>
<tr><td class="relScore">1</td><td>Related: The passage seems related to the query but does not answer it.</td></tr>
<tr><td class="relScore">2</td><td>Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hiddenamongst extraneous information.</td></tr>
<tr><td class="relScore">2</td><td>Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hidden amongst extraneous information.</td></tr>
<tr><td class="relScore">3</td><td>Perfectly relevant: The passage is dedicated to the query and contains the exact answer.</td></tr>
</table>

@@ -1262,7 +1262,7 @@ <h3><kbd class="ds-name select"><span class="str">"msmarco-passage/trec-dl-2020"
<tr><th>Rel.</th><th>Definition</th></tr>
<tr><td class="relScore">0</td><td>Irrelevant: The passage has nothing to do with the query.</td></tr>
<tr><td class="relScore">1</td><td>Related: The passage seems related to the query but does not answer it.</td></tr>
<tr><td class="relScore">2</td><td>Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hiddenamongst extraneous information.</td></tr>
<tr><td class="relScore">2</td><td>Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hidden amongst extraneous information.</td></tr>
<tr><td class="relScore">3</td><td>Perfectly relevant: The passage is dedicated to the query and contains the exact answer.</td></tr>
</table>

@@ -1368,7 +1368,7 @@ <h3><kbd class="ds-name select"><span class="str">"msmarco-passage/trec-dl-2020/
<tr><th>Rel.</th><th>Definition</th></tr>
<tr><td class="relScore">0</td><td>Irrelevant: The passage has nothing to do with the query.</td></tr>
<tr><td class="relScore">1</td><td>Related: The passage seems related to the query but does not answer it.</td></tr>
<tr><td class="relScore">2</td><td>Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hiddenamongst extraneous information.</td></tr>
<tr><td class="relScore">2</td><td>Highly relevant: The passage has some answer for the query, but the answer may be a bit unclear, or hidden amongst extraneous information.</td></tr>
<tr><td class="relScore">3</td><td>Perfectly relevant: The passage is dedicated to the query and contains the exact answer.</td></tr>
</table>
