tweets2013-ia dataset with TREC microblog 2013-14
1 parent 7ef592b, commit 158a0e5
Showing 12 changed files with 848 additions and 8 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,244 @@ | ||
<!DOCTYPE html> | ||
<html> | ||
<head> | ||
<link rel="stylesheet" href="main.css" /> | ||
<script src="https://code.jquery.com/jquery-1.12.4.min.js" integrity="sha256-ZosEbRLbNQzLpnKIkEdrPv7lOy9C27hHQ+Xp8a4MxAQ=" crossorigin="anonymous"></script> | ||
<script src="https://code.jquery.com/ui/1.12.1/jquery-ui.min.js" integrity="sha256-VazP97ZCwtekAsvgPBSUwPFKdrwD3unUfSGVYrahUqU=" crossorigin="anonymous"></script> | ||
<script src="main.js"></script> | ||
<meta charset="utf-8" /> | ||
<meta name="viewport" content="width=device-width, initial-scale=1" /> | ||
<meta name="robots" content="noindex,nofollow" /> | ||
<title>Tweets 2013 (Internet Archive) - ir_datasets</title> | ||
</head> | ||
<body> | ||
<div class="page"> | ||
|
||
<div class="banner">This documentation is for <strong>master</strong>. See <a href="../tweets2013-ia.html">here</a> for documentation of the current latest version on pypi.</div> | ||
|
||
<div style="position: absolute; top: 4px; left: 4px;"><a href="index.html">← home</a></div> | ||
|
||
<div style="position: absolute; top: 4px; right: 4px;">Github: <a href="https://github.com/allenai/ir_datasets/blob/master/ir_datasets/datasets/tweets2013_ia.py">datasets/tweets2013_ia.py</a></div> | ||
<h1><code>ir_datasets</code>: Tweets 2013 (Internet Archive)</h1> | ||
<div style="font-weight: bold; font-size: 1.1em;">Index</div> | ||
<ol class="index"> | ||
<li><a href="#tweets2013-ia"><kbd>tweets2013-ia</kbd></a></li> | ||
<li><a href="#tweets2013-ia/trec-mb-2013"><kbd><span class="prefix">tweets2013-ia</span>/trec-mb-2013</kbd></a></li> | ||
<li><a href="#tweets2013-ia/trec-mb-2014"><kbd><span class="prefix">tweets2013-ia</span>/trec-mb-2014</kbd></a></li> | ||
</ol> | ||
<hr /> | ||
<div class="dataset" id="tweets2013-ia"> | ||
<h3><kbd class="select"><span class="str">"tweets2013-ia"</span></kbd></h3> | ||
|
||
<div class="desc"> | ||
<p> A collection of tweets from a 2-month window archived by the Internet Archive. This collection can serve as a stand-in document collection for the TREC Microblog 2013-14 tasks. (Even though it is not exactly the same collection, <a href="https://cs.uwaterloo.ca/~jimmylin/publications/Sequiera_Lin_SIGIR2017.pdf">Sequiera and Lin</a> show that it is close enough.) </p> <p> This collection is automatically downloaded from the Internet Archive, though download speeds are often slow, so this can take some time. ir_datasets constructs a new directory hierarchy during the download process to facilitate fast lookups and slices. </p> <ul> <li>Documents: Tweets</li> <li><a href="https://cs.uwaterloo.ca/~jimmylin/publications/Sequiera_Lin_SIGIR2017.pdf">Information about dataset (paper)</a></li> <li><a href="https://github.com/castorini/Tweets2013-IA">Information about dataset (repository)</a></li> </ul> | ||
</div> | ||
<div class="tabs"> | ||
<a class="tab" target="tweets2013-ia__docs">docs</a> | ||
<div id="tweets2013-ia__docs" class="tab-content"> | ||
<p>Language: <em>multiple/other/unknown</em></p> | ||
<div>Document type:</div> | ||
<div class="type"> | ||
<div class="type-name">TweetDoc: (<span class="kwd">namedtuple</span>)</div> | ||
<ol class="type-fields"> | ||
<li data-tuple-idx="0"><span class="">doc_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="1"><span class="">text</span>: <span class="kwd">str</span></li><li data-tuple-idx="2"><span class="">user_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="3"><span class="">created_at</span>: <span class="kwd">str</span></li><li data-tuple-idx="4"><span class="">lang</span>: <span class="kwd">str</span></li><li data-tuple-idx="5"><span class="">reply_doc_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="6"><span class="">retweet_doc_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="7"><span class="">source</span>: <span class="kwd">bytes</span></li><li data-tuple-idx="8"><span class="">source_content_type</span>: <span class="kwd">str</span></li> | ||
</ol> | ||
</div> | ||
<p>Example</p> | ||
<code class="example"> | ||
<div><span class="kwd">import</span> ir_datasets</div> | ||
<div>dataset = ir_datasets.load(<span class="str">'tweets2013-ia'</span>)</div> | ||
<div><span class="kwd">for</span> doc <span class="kwd">in</span> dataset.docs_iter():</div> | ||
<div> doc <span class="comment"># namedtuple&lt;doc_id, text, user_id, created_at, lang, reply_doc_id, retweet_doc_id, source, source_content_type&gt;</span></div> | ||
</code> | ||
</div> | ||
|
||
<a class="tab" target="tweets2013-ia__citation">Citation</a> | ||
<div id="tweets2013-ia__citation" class="tab-content"> | ||
bibtex: | ||
<cite class="select">@inproceedings{Sequiera2017Finally, | ||
title={Finally, a Downloadable Test Collection of Tweets}, | ||
author={Royal Sequiera and Jimmy Lin}, | ||
booktitle={SIGIR}, | ||
year={2017} | ||
} | ||
</cite> | ||
</div> | ||
</div> | ||
</div> | ||
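The <kbd>TweetDoc</kbd> fields listed above can be used to filter the collection while iterating. The following is a minimal sketch, not part of ir_datasets: it uses a local stand-in namedtuple with the documented field names so that it runs without downloading the collection, and the helper <kbd>original_tweets</kbd> and all sample records are hypothetical. It assumes that <kbd>retweet_doc_id</kbd> is <kbd>None</kbd> or empty for tweets that are not retweets, which should be checked against the real data.

```python
from collections import namedtuple

# Local stand-in mirroring the documented TweetDoc fields, so the sketch runs
# without downloading the collection. With ir_datasets installed, the records
# would instead come from dataset.docs_iter().
TweetDoc = namedtuple("TweetDoc", [
    "doc_id", "text", "user_id", "created_at", "lang",
    "reply_doc_id", "retweet_doc_id", "source", "source_content_type",
])


def original_tweets(docs, lang="en"):
    """Yield docs in the given language that are not retweets.

    Assumption: retweet_doc_id is None or empty for non-retweets.
    """
    for doc in docs:
        if doc.lang == lang and not doc.retweet_doc_id:
            yield doc


# Made-up records for illustration only; the values are not from the dataset.
sample_docs = [
    TweetDoc("1", "hello world", "u1", "2013-02-01", "en", None, None, b"{}", "application/json"),
    TweetDoc("2", "RT hello world", "u2", "2013-02-01", "en", None, "1", b"{}", "application/json"),
    TweetDoc("3", "hola mundo", "u3", "2013-02-01", "es", None, None, b"{}", "application/json"),
]

kept_ids = [doc.doc_id for doc in original_tweets(sample_docs)]
print(kept_ids)
```

Because <kbd>original_tweets</kbd> accepts any iterable and yields lazily, it could be applied directly to <kbd>dataset.docs_iter()</kbd> without holding the collection in memory.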
|
||
<hr /> | ||
<div class="dataset" id="tweets2013-ia/trec-mb-2013" data-parent="tweets2013-ia"> | ||
<h3><kbd class="ds-name select"><span class="str">"tweets2013-ia/trec-mb-2013"</span></kbd></h3> | ||
|
||
<div class="desc"> | ||
<p> TREC Microblog 2013 test collection. </p> <ul> <li><a href="https://trec.nist.gov/pubs/trec22/papers/MB.OVERVIEW.pdf">Shared Task Paper</a></li> <li><a href="https://github.com/lintool/twitter-tools/wiki/TREC-2013-Track-Guidelines">Shared Task Site</a></li> </ul> | ||
</div> | ||
<div class="tabs"> | ||
<a class="tab" target="tweets2013-ia/trec-mb-2013__queries">queries</a> | ||
<div id="tweets2013-ia/trec-mb-2013__queries" class="tab-content"> | ||
<p>Language: <span class="lang-code">en</span></p> | ||
<div>Query type:</div> | ||
<div class="type"> | ||
<div class="type-name">TrecMb13Query: (<span class="kwd">namedtuple</span>)</div> | ||
<ol class="type-fields"> | ||
<li data-tuple-idx="0"><span class="">query_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="1"><span class="">query</span>: <span class="kwd">str</span></li><li data-tuple-idx="2"><span class="">time</span>: <span class="kwd">str</span></li><li data-tuple-idx="3"><span class="">tweet_time</span>: <span class="kwd">str</span></li> | ||
</ol> | ||
</div> | ||
<p>Example</p> | ||
<code class="example"> | ||
<div><span class="kwd">import</span> ir_datasets</div> | ||
<div>dataset = ir_datasets.load(<span class="str">'tweets2013-ia/trec-mb-2013'</span>)</div> | ||
<div><span class="kwd">for</span> query <span class="kwd">in</span> dataset.queries_iter():</div> | ||
<div> query <span class="comment"># namedtuple&lt;query_id, query, time, tweet_time&gt;</span></div> | ||
</code> | ||
</div> | ||
|
||
<a class="tab" target="tweets2013-ia/trec-mb-2013__docs">docs</a> | ||
<div id="tweets2013-ia/trec-mb-2013__docs" class="tab-content"> | ||
<p>Language: <em>multiple/other/unknown</em></p> | ||
<div>Document type:</div> | ||
<div class="type"> | ||
<div class="type-name">TweetDoc: (<span class="kwd">namedtuple</span>)</div> | ||
<ol class="type-fields"> | ||
<li data-tuple-idx="0"><span class="">doc_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="1"><span class="">text</span>: <span class="kwd">str</span></li><li data-tuple-idx="2"><span class="">user_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="3"><span class="">created_at</span>: <span class="kwd">str</span></li><li data-tuple-idx="4"><span class="">lang</span>: <span class="kwd">str</span></li><li data-tuple-idx="5"><span class="">reply_doc_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="6"><span class="">retweet_doc_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="7"><span class="">source</span>: <span class="kwd">bytes</span></li><li data-tuple-idx="8"><span class="">source_content_type</span>: <span class="kwd">str</span></li> | ||
</ol> | ||
</div> | ||
<p>Example</p> | ||
<code class="example"> | ||
<div><span class="kwd">import</span> ir_datasets</div> | ||
<div>dataset = ir_datasets.load(<span class="str">'tweets2013-ia/trec-mb-2013'</span>)</div> | ||
<div><span class="kwd">for</span> doc <span class="kwd">in</span> dataset.docs_iter():</div> | ||
<div> doc <span class="comment"># namedtuple&lt;doc_id, text, user_id, created_at, lang, reply_doc_id, retweet_doc_id, source, source_content_type&gt;</span></div> | ||
</code> | ||
</div> | ||
|
||
<a class="tab" target="tweets2013-ia/trec-mb-2013__qrels">qrels</a> | ||
<div id="tweets2013-ia/trec-mb-2013__qrels" class="tab-content"> | ||
<div>Query relevance judgment type:</div> | ||
<div class="type"> | ||
<div class="type-name">TrecQrel: (<span class="kwd">namedtuple</span>)</div> | ||
<ol class="type-fields"> | ||
<li data-tuple-idx="0"><span class="">query_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="1"><span class="">doc_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="2"><span class="">relevance</span>: <span class="kwd">int</span></li><li data-tuple-idx="3"><span class="">iteration</span>: <span class="kwd">str</span></li> | ||
</ol> | ||
</div> | ||
<p>Relevance levels</p> | ||
|
||
<table> | ||
<tr><th>Rel.</th><th>Definition</th></tr> | ||
<tr><td class="relScore">0</td><td>not relevant</td></tr> | ||
<tr><td class="relScore">1</td><td>relevant</td></tr> | ||
<tr><td class="relScore">2</td><td>highly relevant</td></tr> | ||
</table> | ||
|
||
<p>Example</p> | ||
<code class="example"> | ||
<div><span class="kwd">import</span> ir_datasets</div> | ||
<div>dataset = ir_datasets.load(<span class="str">'tweets2013-ia/trec-mb-2013'</span>)</div> | ||
<div><span class="kwd">for</span> qrel <span class="kwd">in</span> dataset.qrels_iter():</div> | ||
<div> qrel <span class="comment"># namedtuple&lt;query_id, doc_id, relevance, iteration&gt;</span></div> | ||
</code> | ||
</div> | ||
|
||
<a class="tab" target="tweets2013-ia/trec-mb-2013__citation">Citation</a> | ||
<div id="tweets2013-ia/trec-mb-2013__citation" class="tab-content"> | ||
bibtex: | ||
<cite class="select">@inproceedings{Lin2013Microblog, | ||
title={Overview of the TREC-2013 Microblog Track}, | ||
author={Jimmy Lin and Miles Efron}, | ||
booktitle={TREC}, | ||
year={2013} | ||
} | ||
</cite> | ||
</div> | ||
</div> | ||
</div> | ||
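The graded relevance levels above (0, 1, 2) are usually consumed per query. The following is a minimal sketch of grouping <kbd>TrecQrel</kbd> records by <kbd>query_id</kbd> and counting judged-relevant tweets at a chosen threshold. It uses a local stand-in namedtuple with the documented field names; the helpers <kbd>group_qrels</kbd> and <kbd>count_relevant</kbd>, the query IDs, and the document IDs are all hypothetical and serve only as illustration.

```python
from collections import defaultdict, namedtuple

# Local stand-in mirroring the documented TrecQrel fields. With ir_datasets
# installed, the records would instead come from dataset.qrels_iter().
TrecQrel = namedtuple("TrecQrel", ["query_id", "doc_id", "relevance", "iteration"])


def group_qrels(qrels):
    """Return {query_id: {doc_id: relevance}} built from an iterable of qrels."""
    grouped = defaultdict(dict)
    for qrel in qrels:
        grouped[qrel.query_id][qrel.doc_id] = qrel.relevance
    return dict(grouped)


def count_relevant(judged, min_relevance=1):
    """Count judged docs whose relevance is at least min_relevance."""
    return sum(1 for rel in judged.values() if rel >= min_relevance)


# Made-up judgments for illustration only.
sample_qrels = [
    TrecQrel("q1", "100", 0, "0"),
    TrecQrel("q1", "101", 1, "0"),
    TrecQrel("q1", "102", 2, "0"),
    TrecQrel("q2", "103", 2, "0"),
]

grouped = group_qrels(sample_qrels)
print(count_relevant(grouped["q1"]))                   # relevant or highly relevant
print(count_relevant(grouped["q1"], min_relevance=2))  # highly relevant only
```

Passing <kbd>min_relevance=2</kbd> restricts the count to "highly relevant" judgments, which is one way to evaluate against the stricter level in the table above.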
|
||
<hr /> | ||
<div class="dataset" id="tweets2013-ia/trec-mb-2014" data-parent="tweets2013-ia"> | ||
<h3><kbd class="ds-name select"><span class="str">"tweets2013-ia/trec-mb-2014"</span></kbd></h3> | ||
|
||
<div class="desc"> | ||
<p> TREC Microblog 2014 test collection. </p> <ul> <li><a href="https://trec.nist.gov/pubs/trec23/papers/overview-microblog.pdf">Shared Task Paper</a></li> <li><a href="https://github.com/lintool/twitter-tools/wiki/TREC-2014-Track-Guidelines">Shared Task Site</a></li> </ul> | ||
</div> | ||
<div class="tabs"> | ||
<a class="tab" target="tweets2013-ia/trec-mb-2014__queries">queries</a> | ||
<div id="tweets2013-ia/trec-mb-2014__queries" class="tab-content"> | ||
<p>Language: <span class="lang-code">en</span></p> | ||
<div>Query type:</div> | ||
<div class="type"> | ||
<div class="type-name">TrecMb14Query: (<span class="kwd">namedtuple</span>)</div> | ||
<ol class="type-fields"> | ||
<li data-tuple-idx="0"><span class="">query_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="1"><span class="">query</span>: <span class="kwd">str</span></li><li data-tuple-idx="2"><span class="">time</span>: <span class="kwd">str</span></li><li data-tuple-idx="3"><span class="">tweet_time</span>: <span class="kwd">str</span></li><li data-tuple-idx="4"><span class="">description</span>: <span class="kwd">str</span></li> | ||
</ol> | ||
</div> | ||
<p>Example</p> | ||
<code class="example"> | ||
<div><span class="kwd">import</span> ir_datasets</div> | ||
<div>dataset = ir_datasets.load(<span class="str">'tweets2013-ia/trec-mb-2014'</span>)</div> | ||
<div><span class="kwd">for</span> query <span class="kwd">in</span> dataset.queries_iter():</div> | ||
<div> query <span class="comment"># namedtuple&lt;query_id, query, time, tweet_time, description&gt;</span></div> | ||
</code> | ||
</div> | ||
|
||
<a class="tab" target="tweets2013-ia/trec-mb-2014__docs">docs</a> | ||
<div id="tweets2013-ia/trec-mb-2014__docs" class="tab-content"> | ||
<p>Language: <em>multiple/other/unknown</em></p> | ||
<div>Document type:</div> | ||
<div class="type"> | ||
<div class="type-name">TweetDoc: (<span class="kwd">namedtuple</span>)</div> | ||
<ol class="type-fields"> | ||
<li data-tuple-idx="0"><span class="">doc_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="1"><span class="">text</span>: <span class="kwd">str</span></li><li data-tuple-idx="2"><span class="">user_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="3"><span class="">created_at</span>: <span class="kwd">str</span></li><li data-tuple-idx="4"><span class="">lang</span>: <span class="kwd">str</span></li><li data-tuple-idx="5"><span class="">reply_doc_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="6"><span class="">retweet_doc_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="7"><span class="">source</span>: <span class="kwd">bytes</span></li><li data-tuple-idx="8"><span class="">source_content_type</span>: <span class="kwd">str</span></li> | ||
</ol> | ||
</div> | ||
<p>Example</p> | ||
<code class="example"> | ||
<div><span class="kwd">import</span> ir_datasets</div> | ||
<div>dataset = ir_datasets.load(<span class="str">'tweets2013-ia/trec-mb-2014'</span>)</div> | ||
<div><span class="kwd">for</span> doc <span class="kwd">in</span> dataset.docs_iter():</div> | ||
<div> doc <span class="comment"># namedtuple&lt;doc_id, text, user_id, created_at, lang, reply_doc_id, retweet_doc_id, source, source_content_type&gt;</span></div> | ||
</code> | ||
</div> | ||
|
||
<a class="tab" target="tweets2013-ia/trec-mb-2014__qrels">qrels</a> | ||
<div id="tweets2013-ia/trec-mb-2014__qrels" class="tab-content"> | ||
<div>Query relevance judgment type:</div> | ||
<div class="type"> | ||
<div class="type-name">TrecQrel: (<span class="kwd">namedtuple</span>)</div> | ||
<ol class="type-fields"> | ||
<li data-tuple-idx="0"><span class="">query_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="1"><span class="">doc_id</span>: <span class="kwd">str</span></li><li data-tuple-idx="2"><span class="">relevance</span>: <span class="kwd">int</span></li><li data-tuple-idx="3"><span class="">iteration</span>: <span class="kwd">str</span></li> | ||
</ol> | ||
</div> | ||
<p>Relevance levels</p> | ||
|
||
<table> | ||
<tr><th>Rel.</th><th>Definition</th></tr> | ||
<tr><td class="relScore">0</td><td>not relevant</td></tr> | ||
<tr><td class="relScore">1</td><td>relevant</td></tr> | ||
<tr><td class="relScore">2</td><td>highly relevant</td></tr> | ||
</table> | ||
|
||
<p>Example</p> | ||
<code class="example"> | ||
<div><span class="kwd">import</span> ir_datasets</div> | ||
<div>dataset = ir_datasets.load(<span class="str">'tweets2013-ia/trec-mb-2014'</span>)</div> | ||
<div><span class="kwd">for</span> qrel <span class="kwd">in</span> dataset.qrels_iter():</div> | ||
<div> qrel <span class="comment"># namedtuple&lt;query_id, doc_id, relevance, iteration&gt;</span></div> | ||
</code> | ||
</div> | ||
|
||
<a class="tab" target="tweets2013-ia/trec-mb-2014__citation">Citation</a> | ||
<div id="tweets2013-ia/trec-mb-2014__citation" class="tab-content"> | ||
bibtex: | ||
<cite class="select">@inproceedings{Lin2014Microblog, | ||
title={Overview of the TREC-2014 Microblog Track}, | ||
author={Jimmy Lin and Miles Efron and Yulu Wang and Garrick Sherman}, | ||
booktitle={TREC}, | ||
year={2014} | ||
} | ||
</cite> | ||
</div> | ||
</div> | ||
</div> | ||
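The 2013 and 2014 query types differ only in the extra <kbd>description</kbd> field of <kbd>TrecMb14Query</kbd>. Code that handles both subsets may want to treat that field as optional. The following is a minimal sketch under that assumption: it uses local stand-in namedtuples with the documented field names, and the helper <kbd>query_text</kbd> and the sample queries are hypothetical.

```python
from collections import namedtuple

# Local stand-ins mirroring the documented query types. With ir_datasets
# installed, the records would instead come from dataset.queries_iter().
TrecMb13Query = namedtuple("TrecMb13Query", ["query_id", "query", "time", "tweet_time"])
TrecMb14Query = namedtuple(
    "TrecMb14Query", ["query_id", "query", "time", "tweet_time", "description"]
)


def query_text(query, use_description=False):
    """Return the text to search with for either query type.

    Falls back to the short query when no description is available,
    as is always the case for the 2013 topics.
    """
    description = getattr(query, "description", None)
    if use_description and description:
        return description
    return query.query


# Made-up topics for illustration only.
q13 = TrecMb13Query("q1", "example topic", "2013-03-01", "1")
q14 = TrecMb14Query("q2", "another topic", "2013-03-01", "2", "A longer statement of the need.")

print(query_text(q13, use_description=True))
print(query_text(q14, use_description=True))
```

Using <kbd>getattr</kbd> with a default keeps one code path for both years instead of branching on the namedtuple type.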
|
||
</div> | ||
</body> | ||
</html> |