Collection of WARC files that comply with the WARC-1.0 standard and can be used to run regression tests with Bitextor. This release includes three websites that were crawled between January 25 and 28, 2019. The websites are:
- https://greenpeace.org/canada, which is under Creative Commons Attribution 2.0,
- http://kremlin.ru/, which is under Creative Commons Attribution 4.0,
- https://primeminister.gr/, which is under Creative Commons Attribution-NoDerivatives 4.0.
25/11/2022: Added documents.tar.gz, a file containing the documents needed to test dir2warc.
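
One quick way to spot-check the WARC-1.0 claim for a downloaded file is to read the version line that opens its first record. The following is a minimal sketch using only the Python standard library. The function name `read_warc_version` is hypothetical and not part of Bitextor, and the sketch assumes each file is either plain or gzip-compressed.

```python
import gzip
import sys


def read_warc_version(path):
    """Return the version line of the first record in a WARC file.

    Hypothetical helper for illustration; it only inspects the first line
    and does not validate the rest of the file.
    """
    # Gzip streams start with the magic bytes 0x1f 0x8b.
    with open(path, "rb") as raw:
        magic = raw.read(2)
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(path, "rb") as stream:
        first_line = stream.readline()
    # A WARC record begins with a line such as "WARC/1.0" followed by CRLF.
    return first_line.strip().decode("ascii", errors="replace")


if __name__ == "__main__":
    # Paths are passed on the command line; no file names are assumed here.
    for warc_path in sys.argv[1:]:
        version = read_warc_version(warc_path)
        status = "ok" if version == "WARC/1.0" else "unexpected"
        print(f"{warc_path}: {version} ({status})")
```

This checks only the first record's version line, so it is a sanity check rather than a full validation against the standard.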