Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use list_objects_v2 to check isdir #265

Merged
merged 8 commits into from
Aug 7, 2022
Merged

Use list_objects_v2 to check isdir #265

merged 8 commits into from
Aug 7, 2022

Conversation

fchorney
Copy link
Contributor

@fchorney fchorney commented Aug 2, 2022

isdir for an s3 directory currently calls s3_list_buckets to check if a top level bucket is a directory. I ran into an issue where the role I was using didn't have permissions to call list_buckets.

This change essentially calls S3.list_objects_v2 with max-keys set to 0 as we don't actually need to retrieve any keys. If the bucket exists and you have permissions for it, it will return with no error, but if the bucket doesn't exist we return false or raise an exception if some other error occurred.

Not too sure if this is exactly how we want to go about this (re: the exceptions), so if you have any suggestions let me know. I don't think there should be any issues for using S3.list_objects_v2 instead of s3_list_buckets.

src/s3path.jl Outdated Show resolved Hide resolved
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@rofinn rofinn self-requested a review August 3, 2022 16:03
@rofinn
Copy link
Member

rofinn commented Aug 3, 2022

Since this is just changing the behaviour for root bucket paths I think it's fine. You should add tests for each of the conditions though:

  1. Normal isdir (probably already exists)
  2. No such bucket error
  3. Any other error

@mattBrzezinski
Copy link
Member

I think these changes are fine, but don't really solve the root problem here, which is to add the permissions needed to the role? We're now requiring users to have permissions to ListObjects on a bucket, instead of ListBuckets.

However, I do think those are the more correct permissions to require. I would also tag this as a 0.10.0 release instead of 0.9.X as it could be breaking for some users.

@fchorney
Copy link
Contributor Author

fchorney commented Aug 3, 2022

I think these changes are fine, but don't really solve the root problem here, which is to add the permissions needed to the role? We're now requiring users to have permissions to ListObjects on a bucket, instead of ListBuckets.

However, I do think those are the more correct permissions to require. I would also tag this as a 0.10.0 release instead of 0.9.X as it could be breaking for some users.

You already need ListObjects to check exists(fp) which happens on a non root bucket, so in this case you would theoretically already have ListObjects if you're calling readdir or isdir

@mattBrzezinski
Copy link
Member

You already need ListObjects to check exists(fp) which happens on a non root bucket, so in this case you would theoretically already have ListObjects if you're calling readdir or isdir

Oh rad, I missed that! Just needs test cases and it should be good to go!

Side thought, made we should include in our documentation the permissions users would need to run these functions?

src/s3path.jl Outdated Show resolved Hide resolved
test/runtests.jl Show resolved Hide resolved
test/s3path.jl Outdated
@test isdir(path) == true

# No Such Bucket
path = S3Path("s3://some_non_existent_bucket_7051649213"; config=config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make this an actual AWSException w/ the NoSuchBucket error so we're not dependent on someone not making this an actual bucket.

@fchorney
Copy link
Contributor Author

fchorney commented Aug 4, 2022

bors try

bors bot added a commit that referenced this pull request Aug 4, 2022
@bors
Copy link
Contributor

bors bot commented Aug 4, 2022

test/s3path.jl Outdated Show resolved Hide resolved
test/s3path.jl Outdated
Comment on lines 427 to 431
code,
"",
nothing,
AWS.HTTP.Exceptions.StatusError(404, "", "", ""),
nothing,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
code,
"",
nothing,
AWS.HTTP.Exceptions.StatusError(404, "", "", ""),
nothing,
code, "", nothing, AWS.HTTP.Exceptions.StatusError(404, "", "", ""), nothing

test/s3path.jl Outdated
Comment on lines 444 to 446
patch = @patch AWSS3.S3.list_objects_v2(
args...; kwargs...
) = throw(test_exception)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
patch = @patch AWSS3.S3.list_objects_v2(
args...; kwargs...
) = throw(test_exception)
patch = @patch function AWSS3.S3.list_objects_v2(args...; kwargs...)
throw(test_exception)
end

test/s3path.jl Outdated
Comment on lines 455 to 457
patch = @patch AWSS3.S3.list_objects_v2(
args...; kwargs...
) = throw(test_exception)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
patch = @patch AWSS3.S3.list_objects_v2(
args...; kwargs...
) = throw(test_exception)
patch = @patch function AWSS3.S3.list_objects_v2(args...; kwargs...)
throw(test_exception)
end

Copy link
Member

@rofinn rofinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me. I do think we should bump the minor release though.

Project.toml Outdated
@@ -1,6 +1,6 @@
name = "AWSS3"
uuid = "1c724243-ef5b-51ab-93f4-b0a88ac62a95"
version = "0.9.8"
version = "0.9.9"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we're changing behaviour that folks could depend on?

Suggested change
version = "0.9.9"
version = "0.10.0"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat unsure if that's the case, what behaviour could someone depend on? Theoretically you already need list_objects permissions to do an isdir on a non root bucket. I guess this could potentially be an issue for someone that is only doing an isdir on a root bucket somehow? I'm fine to bump the minor if you still think that should be the case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bumped the version regardless

@fchorney
Copy link
Contributor Author

fchorney commented Aug 4, 2022

bors r+

bors bot added a commit that referenced this pull request Aug 4, 2022
265: Use list_objects_v2 to check isdir r=fchorney a=fchorney

isdir for an s3 directory currently calls `s3_list_buckets` to check if a top level bucket is a directory. I ran into an issue where the role I was using didn't have permissions to call `list_buckets`.

This change essentially calls `S3.list_objects_v2` with `max-keys` set to `0` as we don't actually need to retrieve any keys. If the bucket exists and you have permissions for it, it will return with no error, but if the bucket doesn't exist we return false or raise an exception if some other error occurred. 

Not too sure if this is exactly how we want to go about this (re: the exceptions), so if you have any suggestions let me know. I don't think there should be any issues for using `S3.list_objects_v2` instead of `s3_list_buckets`. 

Co-authored-by: Fernando Chorney <[email protected]>
Co-authored-by: Fernando Chorney <[email protected]>
@bors
Copy link
Contributor

bors bot commented Aug 4, 2022

Build failed:

@fchorney
Copy link
Contributor Author

fchorney commented Aug 4, 2022

This is frustrating. AWSS3.jl depends on Minio.jl for tests, but Minio.jl depends on AWSS3.jl and its deps don't allow for anything above 0.9

@ExpandingMan
Copy link
Contributor

I would have to not only push the change to master but actually tag the commit for this to resolve the issue here. Can I get some assurances that this is what will be tagged as 0.10 before I do this? I don't think this is the typical procedure for how this is done, but none are ideal so I will go along with it, I just don't want to have Minio.jl sitting around with 0.10 and then somebody tag a different 0.10 where it breaks.

@ExpandingMan
Copy link
Contributor

ExpandingMan commented Aug 4, 2022

Also, FYI, I am getting my own CI/CD failures on Minio.jl right now and I have no clue why. I have successful tests locally with multiple different binaries so I don't know what's going on.

I'm going to try updating the package to use minio_jll.

@fchorney
Copy link
Contributor Author

fchorney commented Aug 5, 2022

Minio.jl sitting around with 0.10 and then somebody tag a different 0.10 where it breaks

This needs to get in ASAP for something I am working on, so I think it makes sense that this will definitely be the 0.10.0 release.

@mattBrzezinski
Copy link
Member

Minio.jl sitting around with 0.10 and then somebody tag a different 0.10 where it breaks

This needs to get in ASAP for something I am working on, so I think it makes sense that this will definitely be the 0.10.0 release.

Tagging this as a 0.10.0 release means we're stuck in this awkward situation when we want to do a 0.11.0 release. I'm not sure what people think, but I'm fine with moving forward to a 1.0.0 release here. The package is stable enough, and we've agreed upon being fine with doing major version bumps more frequently with breaking changes.

This would punt the need for this awkward version bump except in the case of a 2.0.0 release, but I do not foresee anyone having the time to do a major re-write of this package any time soon.

Any thoughts? @fchorney @rofinn @ExpandingMan

@fchorney
Copy link
Contributor Author

fchorney commented Aug 5, 2022

sure I'm fine with going to 1.0.0

@rofinn
Copy link
Member

rofinn commented Aug 5, 2022

I'm not sure what people think, but I'm fine with moving forward to a 1.0.0 release here.

Yeah, I think the core behaviour hasn't changed much in the last couple years and I don't think we have a roadmap for major breaking changes. If we ever get around to doing the FilePathsBase redesign we've been discussing for a while that can just be a 2.0 release.

https://gitlab.com/ExpandingMan/FilePaths2.jl

awkward situation when we want to do a 0.11.0 release

I don't think the awkward dependency situation should guide our decision on tagging a 1.0 release though. We should probably plan to resolve the package interdependencies regardless of which release we do. We should either merge minio into a module in AWSS3 or limit AWSS3 to a test dependency for minio?

@fchorney
Copy link
Contributor Author

fchorney commented Aug 5, 2022

So in that case, we'd need minio.jl to allow AWSS3.jl v1 and then the tests here can pass and be merged in.

@ExpandingMan
Copy link
Contributor

ExpandingMan commented Aug 5, 2022

Minio.jl

I have just tagged Minio.jl 0.2.0 which nominally includes support for AWSS3.jl. Unfortunately it seems that the minio devs are on a warpath against single-node use cases which drastically limits its usefulness for unit testing. Besides what was mentioned in that thread, I have also noticed that minio is now rather idiosyncratic about what directory it wants to mount, even when empty; when trying to rewrite the unit tests I notice that it freaks out if you give it absolute paths like /tmp/somepath. I don't think any of the AWSS3.jl tests use pre-existing directories, but there may be some subtle changes needed somewhere to get tests passing again.

Sadly at this point I would not be totally surprised if minio ultimately became totally useless for us here, but there are not many alternatives. On the other hand, they have hit 1.0 so it'll likely be a while before they make further breaking changes.

tagging of AWSS3.jl

I strongly recommend we wait on tagging a 1.0 of this. In my opinion this package still has major issues that are unresolved, in particular we still lack a file path abstraction which is appropriate for all use cases. Some of my previous comments on that can be found here. In my opinion we still very much need a new or completely overhauled file paths package at which point the many "gotchas" in the weirdly idiosyncratic AWSS3.jl code can be eliminated.

By the way, I have not been working on FilePaths2.jl (either there or as an overhaul of FilePathsBase.jl). I'm satisfied I have figured out what is basically the "correct" approach, however there are still many engineering subtleties to be resolved, in particular syncing (i.e. between remote file systems and local path object). If there is a broad consensus that we really do need to get this done I might consider picking it up again, I was rather hesitant to put in a ton of effort just to wind up splitting the ecosystem in 2 or 3 pieces.

@fchorney
Copy link
Contributor Author

fchorney commented Aug 5, 2022

bors try

bors bot added a commit that referenced this pull request Aug 5, 2022
@bors
Copy link
Contributor

bors bot commented Aug 5, 2022

@fchorney
Copy link
Contributor Author

fchorney commented Aug 5, 2022

since I need this to go in, is everyone fine with just keeping this 0.10 and we can do a 1.0 release later?

@fchorney
Copy link
Contributor Author

fchorney commented Aug 7, 2022

bors r+

@bors
Copy link
Contributor

bors bot commented Aug 7, 2022

@bors bors bot merged commit 89493b5 into master Aug 7, 2022
@bors bors bot deleted the fc/readdir branch August 7, 2022 00:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants