Matching access method and scheme of returned access URLs #361

Open
hannes-ucsc opened this issue Jul 20, 2021 · 6 comments

Comments


hannes-ucsc commented Jul 20, 2021

I was surprised that I could not find any mention in the DRS specification of constraints on the scheme of the access URLs returned by a DRS server.

Should there be constraints on the contents of the access_url property in each AccessMethod item of a GET /objects/{object_id} response? For example, should the access_url in an AccessMethod with "type": "https" be required to start with https://? Similarly, should the access_url in an AccessMethod with "type": "gs" be required to start with gs://?

Likewise, should there be similar constraints on the url property of an AccessUrl response to GET /objects/{object_id}/access/{access_id}? For example, if a particular access_id was taken from an AccessMethod with "type": "gs", should the url property of the resulting AccessUrl be required to start with gs://?

As currently written, the specification imposes no such constraints. A server could therefore return a file: URL for an access method of type s3. This significantly complicates client implementations, which, I assume, are written with the specific goal of obtaining the bytes using a particular protocol. No client I can think of would look for an S3 access method and then dynamically switch to using the local file system to access the bytes.
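
A minimal Python sketch of the check a client might want to rely on is shown here. The `EXPECTED_SCHEME` mapping and the `scheme_matches` helper are hypothetical: the DRS specification defines neither, and treating each access method type as identical to its URL scheme is precisely the assumption under discussion.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from AccessMethod type to the URL scheme a client
# would expect. The DRS spec does not define this mapping; it is the
# constraint being proposed in this issue.
EXPECTED_SCHEME = {"s3": "s3", "gs": "gs", "https": "https", "file": "file"}


def scheme_matches(access_type: str, url: str) -> bool:
    """Return True if the URL's scheme is the one expected for access_type."""
    expected = EXPECTED_SCHEME.get(access_type)
    if expected is None:
        # Unknown type: this sketch makes no claim about it, so reject.
        return False
    return urlsplit(url).scheme == expected


print(scheme_matches("gs", "gs://bucket/object"))   # True
print(scheme_matches("s3", "file:///tmp/object"))   # False
```

The same check would apply to the url property of an AccessUrl response, using the type of the AccessMethod from which the access_id was taken.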


ctb commented Oct 22, 2021

hello, side note, we have a similar question in re access_url - at least one client system requires that the access_url be an HTTP URL.

Member

susheel commented Oct 22, 2021

@hannes-ucsc @ctb I think this is because the current OpenAPIv3 spec doesn't provide an easy way to apply data model constraints.

OpenAPIv3 only extends the JSON-Schema DRAFT-05 spec, so some of the useful DRAFT-07 keywords (if-then-else) are unavailable in OpenAPIv3, which makes this kind of constraint difficult to model and apply.

With the JSON-Schema DRAFT-07 spec you could apply this contract/constraint using:

"if": {
  "properties": { "type": { "const": "s3" } }
},
"then": {
  "properties": { "access_url": { "pattern": "^s3:\/\/.*$" } }
},

I admit this will get very convoluted with loads of nested if-then-else statements, but it allows you to have schema independence if required. In the AccessMethod and AccessURL case, I feel this independence is not really required, as they are both very much dependent on each other.

So working within the DRAFT-05 and OpenAPIv3 spec a clean way to achieve this would be with the following:

...
 "anyOf": [
    { "properties": { "type": { "type": "string", "pattern": "^s3$" }, "access_url": { "pattern": "^s3:\/\/.*$" }, ...} },
    { "properties": { "type": { "type": "string", "pattern": "^gs$" }, "access_url": { "pattern": "^gs:\/\/.*$" }, ... } },
   ...
  ]
...

The const keyword is not part of the OpenAPIv3 spec (it was only introduced in JSON-Schema DRAFT-06), hence the use of a pattern for AccessMethod.type.

For this to work, both AccessMethod and AccessURL need to be part of the same object.
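
The effect of the anyOf approach can be emulated with a short Python sketch. It is only an illustration under stated assumptions: `BRANCHES` and `satisfies_any_of` are hypothetical names, only the s3 and gs branches from the fragment above are included, the type patterns are anchored, and both properties are assumed to be present and to be plain strings. A real schema would also need `required`, because `properties` alone does not reject a missing key.

```python
import re

# Each branch pairs a pattern for `type` with a pattern for `access_url`,
# mirroring the anyOf branches in the schema fragment. Anchoring the type
# patterns (^...$) is an assumption of this sketch.
BRANCHES = [
    (re.compile(r"^s3$"), re.compile(r"^s3://.*$")),
    (re.compile(r"^gs$"), re.compile(r"^gs://.*$")),
]


def satisfies_any_of(obj: dict) -> bool:
    """Return True if at least one branch accepts both properties of obj.

    Assumes obj has string-valued "type" and "access_url" keys.
    JSON Schema `pattern` uses search semantics, hence re.search.
    """
    return any(
        type_re.search(obj["type"]) is not None
        and url_re.search(obj["access_url"]) is not None
        for type_re, url_re in BRANCHES
    )


print(satisfies_any_of({"type": "s3", "access_url": "s3://bucket/key"}))  # True
print(satisfies_any_of({"type": "s3", "access_url": "gs://bucket/key"}))  # False
```

Note that any type without its own branch is rejected outright, so every supported access method would have to be listed.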

Hope this helps.


ctb commented Oct 22, 2021

thanks, @susheel. My question: is there a list of AccessMethod and/or URI schemes that must or should be supported for full compatibility? Is there any official guidance on this?

(I'm happy to make this a new issue if you prefer.)

Member

susheel commented Oct 23, 2021

@ctb Good point, and I agree that having a minimum compliance list of AccessMethods would make sense.

That is one for the maintainers of the standard, I'm afraid. I'm just one of the original contributors to the standard. I would suggest splitting this out into a separate issue, as the minimum supported AccessMethods could be provided via the /service-info endpoint. The main question for the maintainers and community would be what the minimum set looks like, possibly determined via a survey.

@hannes-ucsc
Author

@susheel I don't think we need to necessarily express the constraint in the schema, but the reference documentation should be updated. If people agree that this is a desirable constraint to add, that is. I know one prominent server implementation that currently returns https URLs for the gs access method and it really makes my client implementation hacky.
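
The kind of workaround a client ends up with might look like the following sketch. `pick_transport` is a hypothetical helper, the example URL is made up, and comparing the scheme directly with the type string assumes the two share a vocabulary, which the specification does not currently promise.

```python
from urllib.parse import urlsplit


def pick_transport(access_type: str, url: str) -> tuple[str, bool]:
    """Choose a transport from the URL actually returned by the server.

    Returns the URL's scheme and a flag that is True when the scheme
    differs from the access method type the client asked for.
    """
    scheme = urlsplit(url).scheme
    return scheme, scheme != access_type


# A server answering a "gs" access method with an https URL forces the
# client to switch protocols at run time.
print(pick_transport("gs", "https://example.org/bucket/object"))  # ('https', True)
print(pick_transport("gs", "gs://bucket/object"))                 # ('gs', False)
```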

@jb-adams
Member

Happy to see this conversation taking place. I agree there should be alignment between the type and the scheme of the access_url. If these two attributes must always match, however, that may indicate that type is redundant and could be removed.

We can look at this issue at a future Cloud work stream call if there's a PR. Submitting a PR will trigger a docs build with the proposed changes, making it easier for us to review.
