The dataset loading code takes too long. It downloads entire huge datasets (70G wiki, etc.) just to use a handful of examples. Setting split="train[0:2000]" does not help, since slicing happens only after the full download.
Suggestions:
Download just the first files of each dataset.
Replace c4 with allenai/c4: load_dataset("allenai/c4", "allenai--c4", data_files={"train": "en/c4-train.00000-of-01024.json.gz"}, split="train")
Replace wiki with wikitext2: load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
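A minimal sketch of the first two suggestions, assuming the Hugging Face `datasets` package is installed. The helper names `c4_shard`, `c4_kwargs`, and `load_samples` are hypothetical and not part of the existing code; the dataset path, config name, and shard file name are copied from the suggestion above, and the config name may differ between dataset revisions.

```python
from typing import Any, Dict


def c4_shard(index: int, total: int = 1024, split: str = "train") -> str:
    """Relative path of one C4 shard, following the file name quoted above."""
    return f"en/c4-{split}.{index:05d}-of-{total:05d}.json.gz"


def c4_kwargs(num_shards: int = 1) -> Dict[str, Any]:
    """Arguments for load_dataset that limit the download to the first shards."""
    return {
        "path": "allenai/c4",
        # Config name taken from the suggestion; it may vary by dataset revision.
        "name": "allenai--c4",
        "data_files": {"train": [c4_shard(i) for i in range(num_shards)]},
        "split": "train",
    }


def load_samples(num_examples: int = 2000):
    """Fetch a single shard and keep at most num_examples rows from it."""
    # Imported lazily so the helpers above work without the `datasets` package.
    from datasets import load_dataset

    ds = load_dataset(**c4_kwargs(num_shards=1))
    # Selecting rows is cheap here because only one shard was downloaded.
    return ds.select(range(min(num_examples, len(ds))))
```

Restricting `data_files` limits what is downloaded, whereas a split slice such as "train[0:2000]" only limits what is kept after the download, which is the behavior reported above.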