This repository has been archived by the owner on Jul 4, 2023. It is now read-only.
>>> train = wmt_dataset(train=True)
tar: Error opening archive: Unrecognized archive format
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/torchnlp/datasets/wmt.py", line 63, in wmt_dataset
    download_file_maybe_extract(
  File "/usr/local/lib/python3.9/site-packages/torchnlp/download.py", line 170, in download_file_maybe_extract
    raise ValueError('[DOWNLOAD FAILED] `*check_files` not found')
ValueError: [DOWNLOAD FAILED] `*check_files` not found
def _download_file_from_drive(filename, url):  # pragma: no cover
    """ Download filename from google drive unless it's already in directory.

    Args:
        filename (str): Name of the file to download to (do nothing if it already exists).
        url (str): URL to download from.
    """
    confirm_token = None

    # Since the file is big, drive will scan it for virus and take it to a
    # warning page. We find the confirm token on this page and append it to the
    # URL to start the download process.
    confirm_token = None
    session = requests.Session()
    response = session.get(url, stream=True)
    for k, v in response.cookies.items():
        if k.startswith("download_warning"):
            confirm_token = v

    if confirm_token:
        url = url + "&confirm=" + confirm_token

    logger.info("Downloading %s to %s" % (url, filename))

    response = session.get(url, stream=True)
    # Now begin the download.
    chunk_size = 16 * 1024
    with open(filename, "wb") as f:
        for chunk in response.iter_content(chunk_size):
            if chunk:
                f.write(chunk)

    # Print newline to clear the carriage return from the download progress
    statinfo = os.stat(filename)
    logger.info("Successfully downloaded %s, %s bytes." % (filename, statinfo.st_size))
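The helper quoted above writes whatever the server returns straight to disk, so a warning page would be saved under the archive's name. A minimal sketch of a guard, assuming the expected payload is a gzip archive: `save_stream_checked` is a hypothetical name, not part of torchnlp, and `chunks` stands in for `response.iter_content(chunk_size)`.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def save_stream_checked(chunks, filename):
    """Write an iterable of byte chunks to filename, refusing non-gzip data.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    iterator = iter(chunks)
    head = b""
    # Collect chunks until there are enough bytes to inspect the magic number.
    for chunk in iterator:
        head += chunk
        if len(head) >= len(GZIP_MAGIC):
            break
    if not head.startswith(GZIP_MAGIC):
        # Nothing has been written yet, so no bogus archive is left on disk.
        raise ValueError("response does not look like a gzip archive: %r" % head[:32])
    with open(filename, "wb") as f:
        f.write(head)
        # Continue with the rest of the same iterator.
        for chunk in iterator:
            if chunk:
                f.write(chunk)
    return os.path.getsize(filename)
```

With a check like this, an HTML response would fail at download time with a message showing the first bytes received, instead of failing later inside tar.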
I checked which of the *check_files were reported as not found.
Result
data/wmt16_en_de/train.tok.clean.bpe.32000.en
Extracting data/wmt16_en_de/wmt16_en_de.tar.gz
tar: Error opening archive: Unrecognized archive format
data/wmt16_en_de/train.tok.clean.bpe.32000.en
The format of 'data/wmt16_en_de/wmt16_en_de.tar.gz' is HTML document text, ASCII text, so the downloaded file is an HTML page rather than a gzip archive.
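The same check can be done from Python without external tools. This is a rough sketch that classifies a file by its leading bytes. `sniff_download` is a hypothetical helper, and the HTML test is a simple heuristic (a leading doctype or html tag), not a full content-type detector.

```python
def sniff_download(path):
    """Classify a file by its leading bytes: 'gzip', 'html', or 'unknown'."""
    with open(path, "rb") as f:
        head = f.read(512)
    # gzip streams always begin with the two bytes 0x1f 0x8b.
    if head.startswith(b"\x1f\x8b"):
        return "gzip"
    # Heuristic: an HTML page usually opens with a doctype or an <html> tag.
    text = head.lstrip().lower()
    if text.startswith(b"<!doctype html") or text.startswith(b"<html"):
        return "html"
    return "unknown"
```

Calling `sniff_download("data/wmt16_en_de/wmt16_en_de.tar.gz")` on the file from this report should return something other than "gzip" if the download really is a web page.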
Expected Behavior
Actual Behavior
Steps to Reproduce the Problem
from torchnlp.datasets import wmt_dataset
train = wmt_dataset(train=True)