Cannot use cached dataset without Internet connection (or when servers are down) #6837
Comments
There are 2 workarounds, tho:
Second solution also shows where to find the bug. I suggest that the hashing functions should always use only original parameter |
Hi! You need to set the |
Met a pretty similar issue here, as I manually load the dataset into ~/.cache and try to let |
same here! |
Same issue here. In my case, I need to download the dataset on the login node and run the jobs on a compute node that has no Internet access; however, the |
Describe the bug
I want to be able to use a cached dataset from HuggingFace even when I have no Internet connection (or when HuggingFace servers are down, or my company has network issues).
The reason I can't use it: the data_files argument of the datasets.load_dataset() function is updated from the server before the hash used for caching is calculated. As a result, when I run the same code with and without an Internet connection, I get a different dataset configuration directory name.
Steps to reproduce the bug
Expected behavior
When running without an Internet connection, the loader should be able to get the dataset from the cache.
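The mismatch described under "Describe the bug" can be illustrated with a toy sketch. It does not use the real hashing code of the datasets library. The names config_hash, resolve_online and resolve_offline are hypothetical, and the way resolution changes the value is an assumption made only to show why hashing the resolved data_files can give different directory names online and offline, while hashing the original argument would not.

```python
import hashlib
import json


def config_hash(params: dict) -> str:
    """Return a short, deterministic hash of a parameter dict."""
    payload = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def resolve_online(data_files: list[str]) -> list[str]:
    # Hypothetical: online resolution attaches server-side revision info.
    return [f"{path}@rev-abc123" for path in data_files]


def resolve_offline(data_files: list[str]) -> list[str]:
    # Hypothetical: without a connection, no revision info is available.
    return list(data_files)


original = {"data_files": ["data/train.csv"]}

# Hashing the resolved value: the name depends on connectivity.
online_dir = config_hash({"data_files": resolve_online(original["data_files"])})
offline_dir = config_hash({"data_files": resolve_offline(original["data_files"])})

# Hashing only the original argument: the name is the same in both cases.
stable_online = config_hash(original)
stable_offline = config_hash(original)

print("resolved, online :", online_dir)
print("resolved, offline:", offline_dir)
print("original, online :", stable_online)
print("original, offline:", stable_offline)
```

In this sketch the first two names differ and the last two match, which is the behavior the expected-behavior statement asks for: a cache key that does not change when the server cannot be reached.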
Environment info
datasets version: 2.19.0
huggingface_hub version: 0.22.2
fsspec version: 2023.12.2