Tracking issues of opendalfs 0.1 release #6

Open
2 of 19 tasks
Xuanwo opened this issue Jul 26, 2024 · 16 comments

Comments

@Xuanwo
Collaborator

Xuanwo commented Jul 26, 2024

This issue is used to track the progress of the opendalfs 0.1 release. You are welcome to join the development by leaving your comments here.

  • The 0.1 release might not cover all supported services. We will provide at least memory, fs, s3, azblob, gcs, and oss.
  • The 0.1 release will only have a blocking API at first. An async API is on our roadmap.

Tasks

  • Figure out which fsspec APIs need to be implemented (possibly referring to s3fs and ossfs)
  • Implement OpendalFileSystem APIs
    • fsid
    • mkdir
    • mkdirs
    • rmdir
    • ls
    • info
    • rm_file
    • _open
    • created
    • modified
  • Implement OpendalBufferedFile APIs
    • _upload_chunk
    • _initiate_upload
    • _fetch_range
  • Figure out how to perform releases
    • I expect to have a separate package for each service.
  • Figure out how to test against different services
  • Add docs for each service.
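To make the checklist above more concrete, here is a minimal, stdlib-only sketch of the return shapes fsspec expects from a few of these methods: `info(path)` returns a dict with at least `name`, `size`, and `type` (`"file"` or `"directory"`), and `ls(path, detail=True)` returns a list of such dicts (or only the names with `detail=False`). `ToyFileSystem` and its dict-based storage are hypothetical stand-ins for an OpenDAL-backed operator; the class deliberately does not subclass fsspec's `AbstractFileSystem`, so it runs without fsspec installed.

```python
from typing import Dict, List, Set, Union


class ToyFileSystem:
    """Hypothetical in-memory stand-in for an OpenDAL-backed filesystem.

    Files live in a path -> bytes dict and directories in a set. Paths are
    "/"-separated without a leading slash (a simplifying assumption).
    """

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.dirs: Set[str] = {""}  # "" is the root and always exists

    def mkdir(self, path: str, create_parents: bool = True) -> None:
        parts = path.strip("/").split("/")
        parent = "/".join(parts[:-1])
        if not create_parents and parent not in self.dirs:
            raise FileNotFoundError(parent)
        # Register the directory together with every ancestor.
        for i in range(1, len(parts) + 1):
            self.dirs.add("/".join(parts[:i]))

    def pipe_file(self, path: str, value: bytes) -> None:
        path = path.strip("/")
        self.mkdir(path.rpartition("/")[0])  # ensure the parent exists
        self.files[path] = value

    def info(self, path: str) -> dict:
        path = path.strip("/")
        if path in self.files:
            return {"name": path, "size": len(self.files[path]), "type": "file"}
        if path in self.dirs:
            return {"name": path, "size": 0, "type": "directory"}
        raise FileNotFoundError(path)

    def ls(self, path: str, detail: bool = True) -> List[Union[dict, str]]:
        path = path.strip("/")
        if path not in self.dirs:
            raise FileNotFoundError(path)
        prefix = f"{path}/" if path else ""
        # Direct children only: entries under the prefix with no further "/".
        children = {
            name
            for name in (*self.files, *self.dirs)
            if name and name.startswith(prefix) and "/" not in name[len(prefix):]
        }
        entries = [self.info(name) for name in sorted(children)]
        return entries if detail else [entry["name"] for entry in entries]

    def rm_file(self, path: str) -> None:
        path = path.strip("/")
        if path not in self.files:
            raise FileNotFoundError(path)
        del self.files[path]

    def rmdir(self, path: str) -> None:
        # Like fsspec's rmdir, only empty directories may be removed.
        if self.ls(path, detail=False):
            raise OSError(f"directory not empty: {path}")
        self.dirs.discard(path.strip("/"))


fs = ToyFileSystem()
fs.mkdir("bucket/data")
fs.pipe_file("bucket/data/a.txt", b"hello")
print(fs.ls("bucket/data", detail=False))  # ['bucket/data/a.txt']
print(fs.info("bucket/data/a.txt"))  # {'name': 'bucket/data/a.txt', 'size': 5, 'type': 'file'}
```

In the real package these dicts would be filled from OpenDAL's metadata rather than from a Python dict, but the shapes are what fsspec's generic helpers (`exists`, `isdir`, `walk`, and so on) rely on.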
@Xuanwo
Collaborator Author

Xuanwo commented Jul 26, 2024

cc @wey-gu and @BeautyyuYanli, are there any features you'd like included in our initial release?

@Xuanwo Xuanwo pinned this issue Jul 26, 2024
@BeautyyuYanli
Collaborator

I think there is no need to have a package for every different service, since opendal ships all services in one package.

@BeautyyuYanli
Collaborator

Some methods are already implemented in the abstract class. The others we can implement via opendal-python or directly via a Rust binding. Maybe the package will not depend on opendal-python at all, but will become a new Python binding instead.

@Xuanwo
Collaborator Author

Xuanwo commented Jul 27, 2024

> I think there is no need to have a package for every different service, since opendal ships all services in one package.

Hi, first of all, thanks for joining the discussion.

Please allow me to provide some context before going deeper:

  • The OpenDAL Rust core has all services in a single crate (a package, in Rust terms). Users can enable only the services they need, rather than pulling in unnecessary ones.
  • The Python binding for OpenDAL has opted to include as many services as possible. However, this decision has not been well received by the community because it results in a very large Python package: users must install the entire package even if they only need access to S3.
  • To further complicate matters, some services require additional dynamic libraries to function. For instance, sqlite requires libsqlite, while hdfs needs both libhdfs and libjvm. We must decide whether to discontinue support for these services or require all users to install the necessary libraries.

In opendalfs, I plan to split the services into distinct packages, allowing users to install only the services they need, e.g. with pip install "opendalfs[s3,azblob]". The implementation details are still being researched, but I personally believe this is the better approach.

> Maybe the package will not depend on opendal-python at all, but will become a new Python binding instead.

I believe we can build directly on the OpenDAL Rust core to align better with fsspec's behavior, without an additional abstraction layer.
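One possible runtime shape for the per-service packaging plan above: resolve a service's module lazily and, if its extra is not installed, tell the user which extra to install. This is only a sketch under assumptions: the module naming scheme `opendalfs_<service>` and the helpers `load_service` and `installed_services` are hypothetical and are not part of the current project layout.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import List

# Hypothetical naming scheme: each service extra ships a module "opendalfs_<service>".
MODULE_TEMPLATE = "opendalfs_{service}"


def load_service(service: str) -> ModuleType:
    """Import the (hypothetical) module backing `service`, or explain how to get it."""
    module_name = MODULE_TEMPLATE.format(service=service)
    if importlib.util.find_spec(module_name) is None:
        # Quote the requirement so shells do not split or glob the extras list.
        raise ImportError(
            f"service {service!r} is not installed; "
            f'try: pip install "opendalfs[{service}]"'
        )
    return importlib.import_module(module_name)


def installed_services(candidates: List[str]) -> List[str]:
    """Return the subset of `candidates` whose service module is importable."""
    return [
        service
        for service in candidates
        if importlib.util.find_spec(MODULE_TEMPLATE.format(service=service)) is not None
    ]
```

With this shape, `import opendalfs` stays cheap, and a missing backend only fails when it is actually requested, with an actionable message instead of a bare `ModuleNotFoundError`.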

@Xuanwo
Collaborator Author

Xuanwo commented Jul 27, 2024

I have established the project layout. Adding a service should be as easy as adding a simple config: #11

@Xuanwo
Collaborator Author

Xuanwo commented Jul 28, 2024

@BeautyyuYanli, I have updated the list of APIs that we need to implement and added placeholders for them. Feel free to take a look.

@martindurant
Member

I would like to add to your list:

  • decide on async or blocking API (this might already be done)
  • pick a protocol string, and think about how users should express the inner protocol. For example, "opendal://gdrive/shared_folder/path/file" might be a reasonable way to refer to a path which is to be handled by DAL, but resides in gdrive (or s3, or ...). Or you could introduce separate protocols for each, like "dal_gdrive", or you might want to "register" your implementation to overwrite the known protocols in fsspec, such that "gdrive" refers to DAL's implementation for the rest of the session.
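The first two options in the list above differ mainly in where the inner service name lives. The stdlib-only sketch below parses both forms; `split_opendal_url` and the `dal_` prefix are hypothetical illustrations, not an agreed opendalfs convention. Whichever scheme is chosen, registering it with fsspec would go through `fsspec.register_implementation` (with `clobber=True` for the third option, which overrides an existing protocol such as "gdrive").

```python
from typing import Tuple


def split_opendal_url(url: str) -> Tuple[str, str]:
    """Split a URL into (service, path) under two hypothetical conventions.

    - "opendal://<service>/<path>": the service is the first path segment.
    - "dal_<service>://<path>": the service is encoded in the protocol name.
    """
    protocol, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"missing protocol in {url!r}")
    if protocol == "opendal":
        service, _, path = rest.partition("/")
        if not service:
            raise ValueError(f"missing service in {url!r}")
        return service, path
    if protocol.startswith("dal_") and len(protocol) > len("dal_"):
        return protocol[len("dal_"):], rest
    raise ValueError(f"unsupported protocol {protocol!r}")


print(split_opendal_url("opendal://gdrive/shared_folder/path/file"))  # ('gdrive', 'shared_folder/path/file')
print(split_opendal_url("dal_gdrive://shared_folder/path/file"))  # ('gdrive', 'shared_folder/path/file')
```

The third option needs no parsing at all, since the protocol itself is the service name; the trade-off is that it silently replaces whatever implementation fsspec would otherwise load for that protocol during the session.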

@dongshunyao
Contributor

Hello! I would like to contribute to this project.

I have experience with Python and C++, but I am not yet familiar with Rust. I am very willing to learn it for this project.

I am still a beginner in open source. @Xuanwo Could you please assign me some simple and basic tasks to familiarize me with the whole process? I could get started and learn from them.

Thanks to @wey-gu for the guidance!

@Xuanwo
Collaborator Author

Xuanwo commented Sep 24, 2024

Hi, @dongshunyao, nice to meet you! I think you could start by implementing info or mkdir, the same way we did for ls.

@dongshunyao
Contributor

Thank you! I'd prefer to implement mkdir first. Should I write it in Rust like fs.rs#L18 and in Python like fs.py#L39, and add the corresponding tests?

@Xuanwo
Collaborator Author

Xuanwo commented Sep 24, 2024

Exactly!

@dongshunyao
Contributor

Hi! I'm very sorry for the delay on PR #18, which was caused by health issues and some unexpected circumstances. I implemented info instead, because PR #17 already implemented mkdir. Please feel free to look it over and comment. Thank you!

wey-gu added a commit to wey-gu/opendalfs that referenced this issue Dec 28, 2024
Implements core fsspec APIs and adds a comprehensive test suite.

Core Implementation Changes:
- Converted sync operations to async due to S3 service limitations:
  * S3 backend doesn't implement blocking operations
  * Moved to tokio Runtime for async operation handling
  * Affected methods: ls(), mkdir(), rmdir(), info(), exists()

Implemented APIs:
- OpendalFileSystem:
  * mkdir(), mkdirs(), rmdir()
  * ls(), info()
  * rm_file()
  * _open()
  * modified()
  * created() (raises NotImplementedError - OpenDAL limitation)
- OpendalBufferedFile (partial):
  * _fetch_range()
  * Basic _upload_chunk()

Test Organization:
- Core functionality tests (tests/core/):
  * Basic operations (test_basic.py)
  * IO operations (test_io.py)
  * Metadata operations (test_metadata.py)
- Backend-specific tests (tests/backends/):
  * S3 virtual directories
  * S3 bucket operations
  * S3 path handling
- Centralized test utilities (utils/s3.py)

Remaining TODOs:
1. Implement fsid()
2. Complete OpendalBufferedFile implementation:
   * Proper _initiate_upload() implementation
   * Enhanced _upload_chunk() with chunking support
   * Add specific tests for buffered operations
3. Add documentation for services
4. Expand test coverage for different services
5. Add service-specific documentation

This PR implements most core filesystem operations but needs additional work
on buffered file operations and their testing.

References: fsspec#6
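The remaining `_initiate_upload()`/`_upload_chunk()` work follows the write path of fsspec's `AbstractBufferedFile`: writes accumulate in `self.buffer`, a flush is triggered once the buffer reaches `blocksize`, the first real flush calls `_initiate_upload()`, and closing the file sends the last chunk with `final=True`. The stdlib-only sketch below imitates that flow in simplified form; `ChunkedWriter` and its `parts` list are hypothetical stand-ins for an OpenDAL writer, not the opendalfs implementation.

```python
import io
from typing import List, Optional


class ChunkedWriter:
    """Simplified imitation of fsspec's buffered write flow (hypothetical class)."""

    def __init__(self, blocksize: int = 4) -> None:
        self.blocksize = blocksize
        self.buffer = io.BytesIO()
        self.offset: Optional[int] = None  # None until the upload is initiated
        self.parts: List[bytes] = []  # stands in for parts sent to the backend
        self.committed = False

    def _initiate_upload(self) -> None:
        # A real backend would open a writer / start a multipart upload here.
        self.parts = []

    def _upload_chunk(self, final: bool = False) -> bool:
        data = self.buffer.getvalue()
        if data:
            self.parts.append(data)  # one part per flush in this sketch
        if final:
            self.committed = True  # a real backend would complete the upload here
        return True  # not False: the caller may advance offset and reset the buffer

    def write(self, data: bytes) -> int:
        written = self.buffer.write(data)
        if self.buffer.tell() >= self.blocksize:
            self.flush()
        return written

    def flush(self, force: bool = False) -> None:
        if not force and self.buffer.tell() < self.blocksize:
            return  # defer small writes until a full block is buffered
        if self.offset is None:
            self.offset = 0
            self._initiate_upload()
        if self._upload_chunk(final=force) is not False:
            self.offset += self.buffer.seek(0, 2)
            self.buffer = io.BytesIO()

    def close(self) -> None:
        self.flush(force=True)
```

For example, with `blocksize=4`, writing `b"ab"` and then `b"cdef"` produces one part `b"abcdef"`, and a following `write(b"g")` plus `close()` sends `b"g"` as the final part. As far as we understand fsspec's `flush`, returning `False` from `_upload_chunk` leaves the buffer untouched, which is one way a real implementation could hold back data until it meets a backend's minimum part size.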
@wey-gu
Collaborator

wey-gu commented Dec 28, 2024

Sorry @martindurant, I missed your collaborator invitation. Could you or @Xuanwo please resend it?

I spent some free time making some contributions with the help of LLMs (I'm not a Rust coder :-p), to heal myself and find some peace of mind!

Thanks, and please help review! #19

@wey-gu
Collaborator

wey-gu commented Jan 4, 2025

> I would like to add to your list:
>
>   • decide on async or blocking API (this might already be done)

@martindurant

We implement both async and blocking APIs, following the method-naming convention of s3fs: async methods start with _, and the blocking methods are provided through fsspec's asyn sync wrappers.

Under the hood, we wire the Rust async methods, bridged to Python awaitables via PyO3, to the _foo methods.

This is in the recent PR #19 :) What do you think?

cc @Xuanwo
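The async/blocking naming convention described in the comment above can be sketched without fsspec: coroutine methods carry a leading underscore (`_ls`), and a blocking twin (`ls`) is generated that submits the coroutine to an event loop running in a background thread and waits for the result. This is a simplified stdlib imitation under assumptions; in fsspec the real pieces are `AsyncFileSystem`, `fsspec.asyn.sync_wrapper`, and `mirror_sync_methods`, while `ToyAsyncFS`, `run_sync`, and `mirror_blocking` are hypothetical names used only here.

```python
import asyncio
import inspect
import threading
from typing import Any, Coroutine, List

# One event loop running in a daemon thread, similar in spirit to fsspec's IO loop.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, name="io-loop", daemon=True).start()


def run_sync(coro: Coroutine[Any, Any, Any], timeout: float = 30.0) -> Any:
    """Block the calling thread until `coro` finishes on the background loop."""
    return asyncio.run_coroutine_threadsafe(coro, _loop).result(timeout)


def _blocking_version(async_name: str):
    def blocking(self, *args, **kwargs):
        # Call the coroutine method and wait for it on the background loop.
        return run_sync(getattr(self, async_name)(*args, **kwargs))

    blocking.__name__ = async_name[1:]
    return blocking


def mirror_blocking(cls: type) -> type:
    """Add a blocking `foo` for every coroutine method `_foo` that lacks one."""
    for name, attr in list(vars(cls).items()):
        public = name[1:]
        if (
            name.startswith("_")
            and not name.startswith("__")
            and inspect.iscoroutinefunction(attr)
            and not hasattr(cls, public)
        ):
            setattr(cls, public, _blocking_version(name))
    return cls


@mirror_blocking
class ToyAsyncFS:
    """Hypothetical filesystem exposing async `_foo` methods only."""

    def __init__(self) -> None:
        self._store = {"data/a.txt": b"hello", "data/b.txt": b"world"}

    async def _ls(self, prefix: str) -> List[str]:
        await asyncio.sleep(0)  # stands in for awaiting a PyO3-bridged Rust future
        return sorted(key for key in self._store if key.startswith(prefix))

    async def _cat_file(self, path: str) -> bytes:
        await asyncio.sleep(0)
        return self._store[path]


fs = ToyAsyncFS()
print(fs.ls("data/"))  # ['data/a.txt', 'data/b.txt']
print(fs.cat_file("data/a.txt"))  # b'hello'
```

One design consequence worth keeping in mind: calling a blocking method from a coroutine already running on that same loop would deadlock, which is why fsspec's `sync` raises an error when it is called from within its own running loop.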

@wey-gu
Collaborator

wey-gu commented Jan 5, 2025

#20 got CI working, and added the build and publish steps for releases.

@martindurant
Member

I re-sent your invite, @wey-gu, and made @Xuanwo an admin, so that they can add or change other permissions.

5 participants