-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support public s3 files automatically #67
Support public s3 files automatically #67
Conversation
fe11a3f
to
8e2917a
Compare
example_file = tmp_path / "temp-example.txt" | ||
example_file.write_text("just some example text here") | ||
return example_file | ||
yield example_file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to main changeset, this is just to make the tests faster by only creating the file once since they seemed slow
@pytest.fixture | ||
def dummy_plugin() -> typing.Generator[str, None, None]: | ||
with InstallPackage(package_path=DUMMY_PLUGIN_PATH, package_name=DUMMY_PLUGIN_NAME): | ||
yield DUMMY_PLUGIN_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to main changeset, this is just to make the tests faster by only installing the plugin once and re-using this fixture since they seemed slow
image, | ||
reader, | ||
use_plugin_cache, | ||
fs_kwargs | {"anon": True}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the implication here, then, is that if you passed fs_kwargs={anon:True} in the initial bioImage constructor then s3 reads do work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah 👍 (and I believe were we to always pass True
in the initial constructor then users that need to auth like for private buckets would fail, haven't tested that explicitly yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I do wonder if we can handle the exception at an even lower level to avoid needing to run determine_plugin twice, but it's probably not a big deal for performance.
|
Yeah I couldn't find a better place other than inside each individual plug-in and I thought it would be nicer to take the hit here for S3 files rather than be forced to do in in each plugin. Totally down to try something else out though if someone has an idea of where this could go it does seem like a performance bummer to have to straight up retry the read. |
one thought: Can we tell if we need the anon key by checking early for "s3://" in the path and looking at what fs_kwargs were passed in? If anon is omitted, I assume the s3fs just looks for environment variables with credentials or aws credential files. So maybe it's hard for us to make a good guess. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks for fixing up the dummy reader stuff :)
I believe this is true, I think it looks at the same cred file |
Thanks @SeanLeRoy! So glad to see this taken care of |
Link to Relevant Issue
This pull request resolves #61
Description of Changes
When reading from an S3 protocol our current file system handlers require either explicit credentials passed in or an explicit declaration of an "anonymous" reader. This change makes it so we try to read a file given the current
fs_kwargs
(used by the readers for S3) and if that fails we try to read again as an explicitly "anonymous" reader.Additionally I made it so we only create a text file once and only install the dummy-plugin once because I felt running the tests was rather slow and this sped it up IMO.
Testing
Added a unit test, unable to get this to work the S3 file I was provideds3://allencell/aics/emt_timelapse_dataset/data/3500005548_25_all_cells_mask.ome.zarr
bioio-ome-zarr has unreleased changes, when using the newest version of bioio-ome-zarr that is unreleased it works, so I am working on releasing that now but should be GTG after that