Include bucket region in domain name of S3 URLs #1853

Closed
jwodder opened this issue Feb 2, 2024 · 4 comments

@jwodder
Member

jwodder commented Feb 2, 2024

(This is a low-priority request, but I thought I'd get it out there anyway.)

Currently, the S3 URLs in assets' contentUrl metadata fields have domains of the form {bucket}.s3.amazonaws.com, known as the "legacy global endpoint." However, certain S3 SDKs (such as the official Rust one) require supplying an S3 region in order to query an S3 bucket. While a bucket's region can be found via a HEAD request to https://{bucket}.s3.amazonaws.com, it would be more efficient if this weren't required, i.e., if our S3 URLs had domain names of the form {bucket}.s3.{region}.amazonaws.com as seems to currently be preferred by S3. See https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html for more information.
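
For illustration, here is a minimal Python sketch of the two URL forms discussed above. It rewrites a legacy global endpoint URL into the regional virtual-hosted form and reads the region back out of a regional URL. The helper names `to_regional_url` and `region_from_url` and the bucket name `example-bucket` are hypothetical, and the sketch assumes plain unsigned URLs (changing the host of a presigned URL would presumably invalidate its signature).

```python
import re
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Host suffix of the "legacy global endpoint" form: {bucket}.s3.amazonaws.com
LEGACY_SUFFIX = ".s3.amazonaws.com"

# Host pattern of the regional form: {bucket}.s3.{region}.amazonaws.com
REGIONAL_HOST = re.compile(
    r"^(?P<bucket>.+)\.s3\.(?P<region>[a-z0-9-]+)\.amazonaws\.com$"
)


def to_regional_url(url: str, region: str) -> str:
    """Rewrite a legacy global endpoint URL to the regional form.

    URLs whose host is not of the legacy form are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith(LEGACY_SUFFIX):
        return url
    bucket = host[: -len(LEGACY_SUFFIX)]
    regional_host = f"{bucket}.s3.{region}.amazonaws.com"
    return urlunsplit(parts._replace(netloc=regional_host))


def region_from_url(url: str) -> Optional[str]:
    """Return the region embedded in a regional URL, or None if there is none."""
    host = urlsplit(url).hostname or ""
    match = REGIONAL_HOST.match(host)
    return match.group("region") if match else None
```

With URLs of the regional form, a client could call `region_from_url(content_url)` and skip the extra HEAD request; with the legacy form it gets `None` and has to find the region some other way.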

@waxlamp
Member

waxlamp commented Feb 6, 2024

Thanks for the issue report. Can you say why you need the region name?

@jwodder
Member Author

jwodder commented Feb 6, 2024

@waxlamp dandidav needs to query S3 to get details on folders & entries in Zarrs, and, as I stated above:

[C]ertain S3 SDKs (such as the official Rust one) require supplying an S3 region in order to query an S3 bucket.

@yarikoptic
Member

Good to know! I thought that it was some generic S3 functionality to redirect to underlying region, and possibly to have buckets replicated across regions etc... didn't know that it is specific to "old" regions.

I wonder whether changing it this late in the game would cause us some disturbance, as we already have a good number of such URLs "dumped" in a good number of places (e.g., dandiset manifests on S3, datalad dandisets).

@waxlamp
Member

waxlamp commented Feb 8, 2024

I don't think there's much we can do about this--we use django-storages with the S3 backend (which in turn uses boto3) to manage our bucket usage, and we directly use the URLs provided by boto3, which are of the form that is suboptimal for your use case, @jwodder.

Suggested workarounds:

  • make a single HEAD call when the service starts up (or maybe once per endpoint call) to determine the bucket location, then "cache" that
  • just stick us-east-2 in an environment variable, since that could be considered part of the static configuration of DANDI
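
A rough Python sketch of how the two workarounds could be combined: check an environment variable first, otherwise make one HEAD request per bucket and cache the answer for the life of the process. The variable name `DANDI_S3_REGION` and both function names are made up for illustration. The sketch also assumes that S3 reports the bucket's region in an `x-amz-bucket-region` response header, including on error responses such as 403; treat that as an assumption to verify against the S3 documentation.

```python
import os
import urllib.error
import urllib.request
from functools import lru_cache

# Assumed name of the response header in which S3 reports a bucket's region.
REGION_HEADER = "x-amz-bucket-region"


def head_bucket_region(bucket: str) -> str:
    """Find a bucket's region with a single HEAD request to the global endpoint."""
    url = f"https://{bucket}.s3.amazonaws.com"
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            headers = response.headers
    except urllib.error.HTTPError as err:
        # Error responses (e.g. 403 for a private bucket) still carry headers.
        headers = err.headers
    region = headers.get(REGION_HEADER)
    if region is None:
        raise RuntimeError(f"no {REGION_HEADER} header returned for {bucket!r}")
    return region


@lru_cache(maxsize=None)
def bucket_region(bucket, env_var="DANDI_S3_REGION", lookup=head_bucket_region):
    """Return the bucket's region, preferring static configuration.

    The result is cached per set of arguments, so the HEAD request is made
    at most once per bucket for the life of the process.
    """
    configured = os.environ.get(env_var)
    if configured:
        return configured
    return lookup(bucket)
```

The `lookup` parameter is only there so that the network call can be swapped out in tests. Note that, because of the cache, a change to the environment variable after the first call for a given bucket is not picked up.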

For some extra background info: I did read that article, and then I did some experimentation of my own with buckets in us-east-2 and also in a us-west zone. I was never able to get boto3 to provide me with presigned URLs that had the region name in the URL. I don't know what to make of that, except:

I thought that it was some generic S3 functionality to redirect to underlying region

afaict, this is in fact what is happening. John's linked article mentions that URLs without a region are sent to us-east, and if necessary a 307 response can then redirect you to the correct region, but in my testing I was never able to get a redirect from boto3's URLs. And

if our S3 URLs had domain names of the form {bucket}.s3.{region}.amazonaws.com as seems to currently be preferred by S3

I suppose S3 might "prefer" these URLs in some mild, inferred way, but according to my reading of that article, S3 is happy to respond properly to all forms of these URLs. Fortunately (for us, and unfortunately for the S3 folks) I doubt they will deprecate these other forms of the URLs anytime soon. And if they do, we would likely be relying on boto3 to give us better URLs.

waxlamp closed this as not planned on Feb 8, 2024