Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pin fsspec to use default expand_path #1681

Merged
merged 5 commits into from
Sep 21, 2023
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 30 additions & 28 deletions src/huggingface_hub/hf_file_system.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
from urllib.parse import quote, unquote

import fsspec
from packaging import version

from ._commit_api import CommitOperationCopy, CommitOperationDelete
from .constants import DEFAULT_REVISION, ENDPOINT, REPO_TYPE_MODEL, REPO_TYPES_MAPPING, REPO_TYPES_URL_PREFIXES
Expand Down Expand Up @@ -375,34 +376,35 @@ def info(self, path: str, **kwargs) -> Dict[str, Any]:
return {"name": name, "size": 0, "type": "directory"}
return super().info(path, **kwargs)

def expand_path(
self, path: Union[str, List[str]], recursive: bool = False, maxdepth: Optional[int] = None, **kwargs
) -> List[str]:
# The default implementation does not allow passing custom kwargs (e.g., we use these kwargs to propagate the `revision`)
if maxdepth is not None and maxdepth < 1:
raise ValueError("maxdepth must be at least 1")

if isinstance(path, str):
return self.expand_path([path], recursive, maxdepth)

out = set()
path = [self._strip_protocol(p) for p in path]
for p in path:
if has_magic(p):
bit = set(self.glob(p, **kwargs))
out |= bit
if recursive:
out |= set(self.expand_path(list(bit), recursive=recursive, maxdepth=maxdepth, **kwargs))
continue
elif recursive:
rec = set(self.find(p, maxdepth=maxdepth, withdirs=True, detail=False, **kwargs))
out |= rec
if p not in out and (recursive is False or self.exists(p)):
# should only check once, for the root
out.add(p)
if not out:
raise FileNotFoundError(path)
return list(sorted(out))
if version.parse(fsspec.__version__) < version.parse("2023.5.0"):
# The default implementation in fsspec<2023.5.0 does not allow passing custom kwargs (e.g., we use these kwargs to propagate the `revision`)
def expand_path(
self, path: Union[str, List[str]], recursive: bool = False, maxdepth: Optional[int] = None, **kwargs
) -> List[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens then on recent versions (fsspec>=2023.5.0)? It defaults back to the expand_path implemented in fsspec directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, expand_path allows passing **kwargs in versions >=2023.5.0 (thanks to fsspec/filesystem_spec@834115f), so there is no need to override it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, so the implementation is just a copy of the official fsspec one.
In your opinion, could we pin fsspec>=2023.5.0 in our dependencies to avoid having to support older versions?
Otherwise if you feel strongly against it, PR looks good to me. I'd just add a comment linking to https://github.com/fsspec/filesystem_spec/blob/4f0eb488f26299592df5cc6494a5ebd8cb8cca4f/fsspec/spec.py#L1139 to say that it's a duplicate (at first I understood that we has a custom implementation here, in addition of the kwargs).

Copy link
Contributor Author

@mariosasko mariosasko Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinning the minimal version sounds good to me :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance, Colab builds the env with version 2023.6.0, so this should have little impact on the user experience.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know! Thanks for making the change.

if maxdepth is not None and maxdepth < 1:
raise ValueError("maxdepth must be at least 1")

if isinstance(path, str):
return self.expand_path([path], recursive, maxdepth)

out = set()
path = [self._strip_protocol(p) for p in path]
for p in path:
if has_magic(p):
bit = set(self.glob(p, **kwargs))
out |= bit
if recursive:
out |= set(self.expand_path(list(bit), recursive=recursive, maxdepth=maxdepth, **kwargs))
continue
elif recursive:
rec = set(self.find(p, maxdepth=maxdepth, withdirs=True, detail=False, **kwargs))
out |= rec
if p not in out and (recursive is False or self.exists(p)):
# should only check once, for the root
out.add(p)
if not out:
raise FileNotFoundError(path)
return list(sorted(out))


class HfFileSystemFile(fsspec.spec.AbstractBufferedFile):
Expand Down
Loading