Releases: allenai/ir_datasets
v0.4.3
Added:
- trec-fair-2021/eval topics
- clinicaltrials/2021/trec-ct-2021
- c4 and c4/en-noclean-tr/trec-misinfo-2021
- wikir/en78k and wikir/ens78k
- msmarco-passage-v2/trec-dl-2021 and msmarco-document-v2/trec-dl-2021
- mr-tydi
- mmarco
Misc:
- some minor changes to clean command
- msmarco-passage-v2 lookups now performed by ID instead of lz4
- file linking info not shown when downloading small files
- fixed cord19/fulltext
- other minor fixes
v0.4.2
Adds the following datasets:
- MS MARCO Passage version 2
- TREC Fair Ranking 2021
A few other minor improvements:
- Progress bars: units + totals in a few more places
- Checks for adequate disk space before big downloads (can be disabled with an environment variable)
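The disk-space check described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual implementation: the environment variable name `EXAMPLE_SKIP_DISK_CHECK` and the function `has_enough_space` are hypothetical, chosen only for this example.

```python
import os
import shutil

# Hypothetical variable name, used for illustration only; it is not
# taken from the ir_datasets source.
SKIP_ENV_VAR = "EXAMPLE_SKIP_DISK_CHECK"


def has_enough_space(path, required_bytes, free_bytes=None):
    """Return True if a download of `required_bytes` may proceed.

    `free_bytes` can be passed explicitly (e.g. in tests); otherwise the
    free space of the filesystem containing `path` is queried.
    """
    # Let the user opt out of the check through the environment.
    if os.environ.get(SKIP_ENV_VAR, "").lower() in ("1", "true", "yes"):
        return True
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes
```

A caller would run this check before starting a large download and stop with a clear message when it returns False, rather than failing partway through the transfer.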
v0.4.1
- Adds version 2 of the MS MARCO document collection.
- Uses mirror.ir-datasets.com as a fallback for some small files
- More examples in the documentation (the python API is now joined by the CLI and a PyTerrier example)
- Improved BibTeX, including a master bib file that can be imported into papers (e.g., in Overleaf).
- Other minor improvements
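The mirror fallback mentioned above follows a common pattern: try the primary source first, then each alternative in order. The sketch below shows that pattern under stated assumptions. The names `http_get` and `fetch_with_fallback` are hypothetical and do not reflect how ir_datasets itself implements the fallback.

```python
import urllib.request


def http_get(url, timeout=30):
    """Download `url` and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_fallback(urls, fetch=http_get):
    """Try each URL in order and return the first successful payload.

    `fetch` is injectable so the logic can be exercised without network
    access. Network failures from urllib are subclasses of OSError.
    """
    errors = []
    for url in urls:
        try:
            return fetch(url)
        except OSError as exc:
            # Remember the failure and move on to the next source.
            errors.append((url, exc))
    raise OSError(f"all sources failed: {errors}")
```

Passing the original URL first and the mirror URL second keeps the mirror as a last resort, which matches the "fallback" wording in the note.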
v0.4.0
New datasets:
- BEIR suite
- Cranfield
- CLIRMatrix
- DPR-W100
- NQ
- TREC DL Hard
- TREC News
- TripClick
Other:
- Download dashboard
- Improved documentation for non-downloadable datasets
- A beta "more pythonic API"
- Speeding up library load time
- Minor bug fixes, improvements, etc.
v0.3.3
dataset migration bugfix
v0.3.2
v0.3.2 version bump
v0.3.1
bump version for release
v0.3.0
slight updates to documentation code, bump version, rebuild docs
v0.2.0
Now includes language codes for queries and docs
v0.1.7
this should finally work for GH releases to pypi