Releases: allenai/ir_datasets
v0.5.9: added missing default_text for BEIR (#274)
- added missing default_text for BEIR, fixes #273
- bump version
0.5.8
v0.5.8 update version for release
0.5.7
v0.5.7 bump version for release
0.5.6
What's Changed
- TREC DL 2023 Topics by @seanmacavaney in #242
- MS MARCO Passage v2 deduplicated version by @seanmacavaney in #243
- MIRACL by @seanmacavaney in #248
- Use provided id_field by @bpiwowar in #252
- fix: fix the bug of considering the Path as a str when loading from TREC dataset by @yzong12138 in #247
- fix location of msmarco source files and bump version by @seanmacavaney in #257
Full Changelog: v0.5.5...v0.5.6
v0.5.5
What's Changed
- Fix typo by @heinrichreimer in #212
- Fix and add html extractor by @grodino in #201
- py310 by @seanmacavaney in #143
- Remove duplicate bib by @heinrichreimer in #214
- msmarco-passage/dev/2 by @seanmacavaney in #220
- Adding the SARA dataset by @JackMcKechnie in #225
- defaulttext by @seanmacavaney in #226
- [MINOR:TYPO] Update msmarco-passage.yaml by @cakiki in #231
- TREC tip-of-the-tongue by @seanmacavaney in #238
- trec-dl-2022 qrels by @seanmacavaney in #239
New Contributors
- @JackMcKechnie made their first contribution in #225
Full Changelog: v0.5.4...v0.5.5
v0.5.4
What's Changed
- Fix Touché file URLs by @heinrichreimer in #202
- Remove redundant query arg by @heinrichreimer in #203
- istella22/source moved by @seanmacavaney in #204
- Istella22 update links by @seanmacavaney in #205
- Touché 2022 by @heinrichreimer in #211
Full Changelog: v0.5.3...v0.5.4b
v0.5.3
What's Changed
- Add clueweb12 diversity task datasets by @grodino in #198
- Istella22 by @seanmacavaney in #199
- trec-dl-2022 topics and scoreddocs by @seanmacavaney in #200
New Contributors
Full Changelog: v0.5.2...v0.5.3
v0.5.2
New Datasets
- TREC Clinical Trials 2022
- TREC Fair Ranking 2022
- CODEC
Features / Bugfixes
- Fix TREC Genomics Track 2005 description
- Allow downloads to resume for all MSMARCO dataset resources larger than 500MB
- For format support for disks45
Full Changelog: v0.5.1...v0.5.2
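The resumable downloads noted above are commonly built on HTTP range requests. The following is a minimal sketch of that general technique, not the package's actual implementation; the helper name `range_header` and the partial-file path are hypothetical.

```python
import os


def range_header(partial_path):
    """Build request headers asking only for the bytes not yet on disk.

    Hypothetical helper for illustration; not part of ir_datasets.
    """
    try:
        done = os.path.getsize(partial_path)
    except OSError:
        done = 0  # no partial file yet, so nothing to resume
    # An empty dict means "download from the beginning".
    return {"Range": f"bytes={done}-"} if done > 0 else {}
```

A client would pass these headers with its GET request and append the response body to the partial file, provided the server answers with status 206 (Partial Content).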
v0.5.1
What's Changed
- [MINOR FIX / TYPO] Update trec-robust04.yaml by @cakiki in #137
- .z compression support for robust04 by @seanmacavaney in #139
- moving msmarco-passage scoreddocs around by @seanmacavaney in #142
- mmarco updates (files hosted elsewhere & new version of some sources) by @seanmacavaney in #145
- new data available for mmarco (scoreddocs, docpairs, and dev/small) by @seanmacavaney in #146
- added tripclick/train/hofstaetter-triples by @seanmacavaney in #147
- additional versions of msmarco-passage triples by @seanmacavaney in #149
- mMARCO v2 by @seanmacavaney in #150
- Anchor Text for msmarco-document and msmarco-document-v2 by @seanmacavaney in #155
- mmarco source files renamed by @seanmacavaney in #153
- TREC CAsT 2019, 2020 by @seanmacavaney in #156
- HC4 by @eugene-yang in #158
- LoTTE dataset by @seanmacavaney in #159
- kilt by @seanmacavaney in #161
- some trec 2021 qrels released by @seanmacavaney in #162
- some trec 2021 qrels released by @seanmacavaney in #171
- CODEC by @seanmacavaney in #172
- improved HTML/XML parser, TREC 7 and 8 by @seanmacavaney in #173
- fixed and tested issue affecting some clueweb lookups by @seanmacavaney in #174
- cache hc4 topics/qrels by @seanmacavaney in #176
- wikiclir by @seanmacavaney in #178
- NeuCLIR Collection 1 (documents and HC4-filtered subset) by @eugene-yang in #179
- neuMARCO by @seanmacavaney in #181
New Contributors
- @cakiki made their first contribution in #137
- @eugene-yang made their first contribution in #158
Full Changelog: v0.5.0...v0.5.1
v0.5.0
New Features:
- Metadata is included for all datasets, including record counts, without needing to download or process the data.
- New entity type (qlogs) for query log records
New datasets:
- argsme & touche (thanks @heinrichreimer!)
- aol-ia dataset
- tripclick logs
- trec-dl-2021 qrels (active participants only for now)
Miscellaneous:
- No longer updates the root logger instance, allowing other applications to easily customise logging output from this package
- Updates to documentation
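Because the root logger is left alone, an application can adjust this package's output on its own. Below is a minimal sketch using only the standard library. It assumes the package logs through a logger named "ir_datasets"; that name is an assumption and is not stated in these notes.

```python
import logging

# Assumption: the package emits its messages via a logger named "ir_datasets".
pkg_logger = logging.getLogger("ir_datasets")

# Only show warnings and above from the package.
pkg_logger.setLevel(logging.WARNING)

# Give the package its own handler and message format.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("[%(name)s] %(levelname)s: %(message)s"))
pkg_logger.addHandler(handler)

# Stop records from also reaching handlers on the root logger.
pkg_logger.propagate = False
```

The application's own root logger configuration is unaffected by these settings.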