Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A discussion on needed
/paths
properties and metadata #1851base: master
Are you sure you want to change the base?
A discussion on needed
/paths
properties and metadata #1851Changes from 3 commits
c87cc8c
24e1188
e3c4500
9994f2c
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what for do we need blob_id if we have contentUrl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ThecontentUrl
returned in.../assets/paths/
responses is a plain S3 URL, and serving it to the user in a web browser would result in the file being downloaded and named with the blob ID. In order for the downloaded file to be named with the asset's filename instead, we need a signed S3 URL, which is obtained via the API download URL in thecontentUrl
field of the metadata.EDIT: Sorry, I misinterpreted the question. For blob ID, the code just checks whether
blob_id
orzarr_id
is non-None in order to determine whether the asset is a blob or a Zarr.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so we the actually need
result_type
(Folder, AssetBlob, AssetZarr) as I suggested in #1837 (comment) .There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess
zarr_id
is needed for traversal etc? or could contentUrl version be used? (probably we do not want to rely on parsing it).So it feels that we might ask for asset record to include
blob_id
(although not sure what for yet exactly) andzarr_id
, henceThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dandidav
does parse S3contentUrl
s, as that's necessary in order to discover the name of the S3 bucket without hardcoding it or requiring it to be passed on the command line.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in principle we can just parse out bucket, zarr_id and blob_id from contenturl, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I'd rather dandidav not have to make too many assumptions about how our S3 URLs are structured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree. We would only need to be able to construct zarr "http s3" URLs from manifests so it would need to know that , right?
For blobs -- contentUrl should be good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that asset metadata lists two
contentUrl
s, an unsigned S3 URL and an API download URL of the formhttps://api.dandiarchive.org/api/assets/{asset_id}/download/
.For Zarr assets,
dandidav
needs the S3 URL in order to know how to query the Zarr's entries from S3.However, for blob assets,
dandidav
currently uses the API download URL, as it provides a better UX. Specifically, ifdandidav
were to redirectGET
requests for a blob to the blob's unsigned S3 URL, then the browser would save the response to a file named after the last component of the S3 URL, which is (always?) the blob ID. In order to get the downloaded files to be named after the filename portion of the asset's path instead, a signed S3 URL with a Content-Disposition has to be used, and that is acquired by redirecting to the API download URL, which in turn redirects to such a signed S3 URL.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what this one for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Etag (reported in
PROPFIND
responses)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in principle could be any other checksum I guess , right?
note: I thought that besides ETag functionality a commonly known checksum could then be used by receiving end to validate the download if that would be the target purpose for such a listing.
for versioned zarrs we would need
"dandi:dandi-zarr-checksum"
to use in conjunction with thezarr_id
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the
dandi:dandi-etag
is the same as the ETag header reported by S3 when downloading a blob asset. In contrast, there's no single file that has adandi:dandi-zarr-checksum
as an ETag.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that we can use
dandi:dandi-zarr-checksum
as a value for ETag on zarr collections (folders).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it makes sense to apply etags to collections, as
GET
ing a collection won't give you a response with that etag. (Cf. dandi/dandidav#26)