-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tree method to display tree-like structure of the filesystem #1750
base: master
Are you sure you want to change the base?
Conversation
>>> fs.tree(path='/start/folder', display_size=True) /start/folder ├── folder1 │ ├── file1.txt (1.234MB) │ └── file2.txt (0.567MB) └── folder2 └── file3.txt (2.345MB)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nice addition!
I have some suggestions, and this code could really use a test, probably using memoryFS (fixture m
).
fsspec/spec.py
Outdated
more_message = f"{remaining_count} more item(s) not displayed." | ||
print(f"{prefix}{'└── ' if is_last else '├── '}{more_message}") | ||
|
||
except FileNotFoundError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these error message add anything, I would rather not catch the exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood!
fsspec/spec.py
Outdated
display_size: bool = False, | ||
prefix: str = "", | ||
is_last: bool = True, | ||
first: bool = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it hard to allow for str output, rather than calling print
directly? The easy way to change it from here would be to print to an io.StringIO maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll work on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify what you mean? If I understand correctly, the tree
method would output a str
directly, but it would be displayed as a 'flattened' string, showing the \n
characters and losing the tree's visual structure. May I allow the user to choose between printing each line on the fly and returning the complete tree as a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you return a string, the user may wish to print that:
print(fs.tree(...))
but could equally push it to some file, console/log or other output. For example, the string output would make a lot of sense in a text UI component.
fsspec/spec.py
Outdated
new_prefix = prefix + (" " if is_last_item else "│ ") | ||
|
||
name = os.path.basename(item.get('name', '')) | ||
size = f" ({item.get('size', 0) / 2**20:.3f}Mb)" if display_size and item.get('type') == 'file' else "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A helper function to choose between kb, Mb, Gb would be nice.
Also, we might know the total size of directories (in bytes and number of children).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, a function that choses the smartest unit for every displayed line ?
Also, I agree for the total size, I'll work on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Displaying the total size of directories would require a complete exploration of the folder tree, going beyond the recursion limit, which could be slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Showing the number of immediate children might be an option; but if the recursion limit is to be indefinite, or you know you have exhausted the tree depth, then no extra listings would be needed.
@@ -1567,6 +1567,79 @@ def modified(self, path): | |||
"""Return the modified timestamp of a file as a datetime.datetime""" | |||
raise NotImplementedError | |||
|
|||
def tree( | |||
self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some format options would be nice, like indent size.
@martindurant I went along with your proposals. This can be used as follows: from fsspec import filesystem
fs = filesystem('ftp', host='test.rebex.net', user='demo', password='password')
tree = fs.tree(display_size=True, recursion_limit=3, indent_size=8, max_display=10)
print(tree)
/
├────── pub (1 subfolder)
│ └────── example (16 files)
│ ├────── KeyGenerator.png (35.81 kb)
│ ├────── KeyGeneratorSmall.png (23.47 kb)
│ ├────── ResumableTransfer.png (11.28 kb)
│ ├────── WinFormClient.png (78.12 kb)
│ ├────── WinFormClientSmall.png (17.49 kb)
│ ├────── imap-console-client.png (18.71 kb)
│ ├────── mail-editor.png (16.08 kb)
│ ├────── mail-send-winforms.png (34.58 kb)
│ ├────── mime-explorer.png (47.86 kb)
│ ├────── pocketftp.png (56.66 kb)
│ └────── 6 more item(s) not displayed.
└────── readme.txt (379B) Or: fs = filesystem('gs')
tree = fs.tree(path='gs://gcp-public-data-arco-era5', display_size=True, recursion_limit=2, indent_size=8, max_display=5)
print(tree)
gs://gcp-public-data-arco-era5
├────── ar (19 subfolders)
│ ├────── 1959-2022-1h-240x121_equiangular_with_poles_conservative.zarr (3 files, 35 subfolders)
│ ├────── 1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr (3 files, 35 subfolders)
│ ├────── 1959-2022-6h-128x64_equiangular_conservative.zarr (3 files, 42 subfolders)
│ ├────── 1959-2022-6h-128x64_equiangular_with_poles_conservative.zarr (3 files, 42 subfolders)
│ ├────── 1959-2022-6h-1440x721.zarr (3 files, 42 subfolders)
│ └────── 14 more item(s) not displayed.
├────── co (10 subfolders)
│ ├────── model-level-moisture.zarr (3 files, 13 subfolders)
│ ├────── model-level-moisture.zarr-v2 (3 files, 13 subfolders)
│ ├────── model-level-wind.zarr (3 files, 8 subfolders)
│ ├────── model-level-wind.zarr-v2 (3 files, 8 subfolders)
│ ├────── single-level-forecast.zarr (3 files, 28 subfolders)
│ └────── 5 more item(s) not displayed.
└────── raw (4 subfolders)
├────── ERA5GRIB (1 subfolder)
├────── date-variable-pressure_level (85 subfolders)
├────── date-variable-single_level (85 subfolders)
└────── date-variable-static (1 subfolder) Is this what you had in mind ? |
Looks good! |
It would still be nice to create some minimal test for this, based on files in memoryFS |
/start/folder
├── folder1
│ ├── file1.txt (1.234MB)
│ └── file2.txt (0.567MB)
└── folder2
└── file3.txt (2.345MB)