Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tree method to display tree-like structure of the filesystem #1750

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

CyrilJl
Copy link

@CyrilJl CyrilJl commented Nov 14, 2024

fs.tree(path='/start/folder', display_size=True)

/start/folder
├── folder1
│ ├── file1.txt (1.234MB)
│ └── file2.txt (0.567MB)
└── folder2
└── file3.txt (2.345MB)

>>> fs.tree(path='/start/folder', display_size=True)

/start/folder
├── folder1
│   ├── file1.txt (1.234MB)
│   └── file2.txt (0.567MB)
└── folder2
    └── file3.txt (2.345MB)
@CyrilJl CyrilJl marked this pull request as ready for review November 14, 2024 14:32
Copy link
Member

@martindurant martindurant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nice addition!

I have some suggestions, and this code could really use a test, probably using memoryFS (fixture m).

fsspec/spec.py Outdated
more_message = f"{remaining_count} more item(s) not displayed."
print(f"{prefix}{'└── ' if is_last else '├── '}{more_message}")

except FileNotFoundError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these error message add anything, I would rather not catch the exceptions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood!

fsspec/spec.py Outdated
display_size: bool = False,
prefix: str = "",
is_last: bool = True,
first: bool = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it hard to allow for str output, rather than calling print directly? The easy way to change it from here would be to print to an io.StringIO maybe.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll work on that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify what you mean? If I understand correctly, the tree method would output a str directly, but it would be displayed as a 'flattened' string, showing the \n characters and losing the tree's visual structure. May I allow the user to choose between printing each line on the fly and returning the complete tree as a string?

Copy link
Member

@martindurant martindurant Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you return a string, the user may wish to print that:

print(fs.tree(...))

but could equally push it to some file, console/log or other output. For example, the string output would make a lot of sense in a text UI component.

fsspec/spec.py Outdated
new_prefix = prefix + (" " if is_last_item else "│ ")

name = os.path.basename(item.get('name', ''))
size = f" ({item.get('size', 0) / 2**20:.3f}Mb)" if display_size and item.get('type') == 'file' else ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A helper function to choose between kb, Mb, Gb would be nice.

Also, we might know the total size of directories (in bytes and number of children).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, a function that choses the smartest unit for every displayed line ?
Also, I agree for the total size, I'll work on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Displaying the total size of directories would require a complete exploration of the folder tree, going beyond the recursion limit, which could be slow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Showing the number of immediate children might be an option; but if the recursion limit is to be indefinite, or you know you have exhausted the tree depth, then no extra listings would be needed.

@@ -1567,6 +1567,79 @@ def modified(self, path):
"""Return the modified timestamp of a file as a datetime.datetime"""
raise NotImplementedError

def tree(
self,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some format options would be nice, like indent size.

@CyrilJl
Copy link
Author

CyrilJl commented Nov 15, 2024

@martindurant I went along with your proposals. This can be used as follows:

from fsspec import filesystem

fs = filesystem('ftp', host='test.rebex.net', user='demo', password='password')
tree = fs.tree(display_size=True, recursion_limit=3, indent_size=8, max_display=10)
print(tree)

/
├────── pub (1 subfolder)
│       └────── example (16 files)
│               ├────── KeyGenerator.png (35.81 kb)
│               ├────── KeyGeneratorSmall.png (23.47 kb)
│               ├────── ResumableTransfer.png (11.28 kb)
│               ├────── WinFormClient.png (78.12 kb)
│               ├────── WinFormClientSmall.png (17.49 kb)
│               ├────── imap-console-client.png (18.71 kb)
│               ├────── mail-editor.png (16.08 kb)
│               ├────── mail-send-winforms.png (34.58 kb)
│               ├────── mime-explorer.png (47.86 kb)
│               ├────── pocketftp.png (56.66 kb)
│               └────── 6 more item(s) not displayed.
└────── readme.txt (379B)

Or:

fs = filesystem('gs')
tree = fs.tree(path='gs://gcp-public-data-arco-era5', display_size=True, recursion_limit=2, indent_size=8, max_display=5)
print(tree)

gs://gcp-public-data-arco-era5
├────── ar (19 subfolders)
│       ├────── 1959-2022-1h-240x121_equiangular_with_poles_conservative.zarr (3 files, 35 subfolders)
│       ├────── 1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr (3 files, 35 subfolders)
│       ├────── 1959-2022-6h-128x64_equiangular_conservative.zarr (3 files, 42 subfolders)
│       ├────── 1959-2022-6h-128x64_equiangular_with_poles_conservative.zarr (3 files, 42 subfolders)
│       ├────── 1959-2022-6h-1440x721.zarr (3 files, 42 subfolders)
│       └────── 14 more item(s) not displayed.
├────── co (10 subfolders)
│       ├────── model-level-moisture.zarr (3 files, 13 subfolders)
│       ├────── model-level-moisture.zarr-v2 (3 files, 13 subfolders)
│       ├────── model-level-wind.zarr (3 files, 8 subfolders)
│       ├────── model-level-wind.zarr-v2 (3 files, 8 subfolders)
│       ├────── single-level-forecast.zarr (3 files, 28 subfolders)
│       └────── 5 more item(s) not displayed.
└────── raw (4 subfolders)
        ├────── ERA5GRIB (1 subfolder)
        ├────── date-variable-pressure_level (85 subfolders)
        ├────── date-variable-single_level (85 subfolders)
        └────── date-variable-static (1 subfolder)

Is this what you had in mind ?

@martindurant
Copy link
Member

Looks good!

@martindurant
Copy link
Member

It would still be nice to create some minimal test for this, based on files in memoryFS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants