Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

404 for a legit subject "folder" #39

Closed
yarikoptic opened this issue Jan 31, 2024 · 13 comments
Closed

404 for a legit subject "folder" #39

yarikoptic opened this issue Jan 31, 2024 · 13 comments
Labels
blocked Cannot be implemented until something else happens bug Something isn't working

Comments

@yarikoptic
Copy link
Member

yarikoptic commented Jan 31, 2024

For http://localhost:8080/dandisets/000108/draft/sub-MITU01/ which should correspond to https://dandiarchive.org/dandiset/000108/draft/files?location=sub-MITU01&page=1

image

@yarikoptic yarikoptic changed the title 404 for a legit 404 for a legit subject "folder" Jan 31, 2024
@jwodder
Copy link
Member

jwodder commented Jan 31, 2024

This is because the optimization that dandidav takes to quickly determine whether something is a folder or asset is broken by dandi/dandi-archive#1839.

@jwodder jwodder added bug Something isn't working blocked Cannot be implemented until something else happens labels Jan 31, 2024
@yarikoptic
Copy link
Member Author

interesting -- could you point to that optimization?

@jwodder
Copy link
Member

jwodder commented Jan 31, 2024

@yarikoptic Here:

dandidav/src/dandi/mod.rs

Lines 318 to 327 in 49d49f3

while let Some(asset) = stream.try_next().await? {
if asset.path() == path {
return Ok(AtAssetPath::Asset(asset));
} else if asset.path().is_strictly_under(&dirpath) {
return Ok(AtAssetPath::Folder(AssetFolder { path: dirpath }));
} else if **asset.path() > *dirpath {
break;
}
}
Err(DandiError::NotFound { url })

When the DANDI client needs to determine whether a given path points to an asset or an asset folder, it makes a request to /dandisets/:id/versions/:id/assets/?path={path}&order=path. If it finds an asset with the given path, it returns that. If it finds an asset whose path starts with {path}/, then it knows the path is a folder. The optimization is: If it finds an asset whose path comes lexicographically after {path}/, then (based on the assumption that order=path orders paths lexicographically), the client concludes that it's gone past where the asset or folder would be if it existed, and so it must not exist, and so it can return early without checking everything in the returned list.

@yarikoptic
Copy link
Member Author

yarikoptic commented Jan 31, 2024

oh, you are using that end point , for which we also have

Why don't you use the /dandisets/:id/versions/:id/assets/paths/ endpoint for listing a specific immediate folder?

then if 200 with count - 0 -- it is a file (we do not contain empty folders). If 404 -- unknown path. If count >= 1 -- folder, e.g.

❯ curl -X 'GET' \
  'https://api.dandiarchive.org/api/dandisets/000029/versions/draft/assets/paths/?path_prefix=bar-renamed-3334'
{"count":0,"next":null,"previous":null,"results":[]}%                                               
❯ curl -X 'GET' 'https://api.dandiarchive.org/api/dandisets/000029/versions/draft/assets/paths/?path_prefix=sub-RAT123'
{"count":1,"next":null,"previous":null,"results":[{"path":"sub-RAT123/sub-RAT123.nwb","version":419,"aggregate_files":1,"aggregate_size":18792,"asset":{"asset_id":"356b20b7-ae80-4d42-9715-075492eb025d","url":"https://dandiarchive.s3.amazonaws.com/blobs/2db/af0/2dbaf0fd-5003-4a0a-b4c0-bc8cdbdb3826"}}]}%                                                                                                 
❯ curl -X 'GET' 'https://api.dandiarchive.org/api/dandisets/000029/versions/draft/assets/paths/?path_prefix=sub-unknown'
{"detail":"Specified path not found."}%  
❯ curl -X 'GET' 'https://api.dandiarchive.org/api/dandisets/000029/versions/draft/assets/paths/?path_prefix=sub-'
{"detail":"Specified path not found."}%    

?

@jwodder
Copy link
Member

jwodder commented Jan 31, 2024

@yarikoptic I also use that endpoint, but for a different purpose. The key difference is that, while .../assets/paths/ can provide a list of assets & subfolders directly within a folder, it doesn't provide any asset properties (other than ID) or metadata. When dandidav receives a request for a path, it first queries the .../assets/ endpoint to determine whether than path is an asset (and get its properties & metadata) or a folder; if it's a folder (and the request wasn't a PROPFIND request with "Depth: 0"), then dandidav makes another request to .../assets/paths/ to get the contents of the folder, and for each asset found there, it has to make one more request each to get the assets' properties & metadata.

See also dandi/dandi-archive#1837, where I requested a dedicated endpoint for doing all of this at once.

@yarikoptic
Copy link
Member Author

I wonder if this is the same issue. After establishing rclone connection to the dandidav with following configuration

yoh@typhon:~/proj/dandi/zarr-benchmarking/tools$ cat ~/.config/rclone/rclone.conf
[DANDI-WEBDAV]
type = webdav
url = https://dandi.centerforopenneuroscience.org
vendor = other

I get

yoh@typhon:~$ rclone ls DANDI-WEBDAV:dandisets/000108/sub-U01hm15x/
2024/02/12 14:58:39 ERROR : : error listing: directory not found
2024/02/12 14:58:39 Failed to ls with 2 errors: last error was: directory not found

where on server side it is

2024-02-12T20:02:02.519964Z DEBUG request{method=PROPFIND uri=/dandisets/000108/sub-U01hm15x/ version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-02-12T20:02:02.520345Z DEBUG request{method=PROPFIND uri=/dandisets/000108/sub-U01hm15x/ version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=0 ms status=404

but in the browser (visiting https://dandi.centerforopenneuroscience.org/dandisets/000108/draft/sub-U01hm15x/) it looks different -- seems to not use PROPFIND but goes to GET:

2024-02-12T20:04:07.450524Z DEBUG request{method=GET uri=/dandisets/000108/draft/sub-U01hm15x/ version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-02-12T20:04:07.451088Z TRACE request{method=GET uri=/dandisets/000108/draft/sub-U01hm15x/ version=HTTP/1.1}: hyper::client::pool: take? ("https", api.dandiarchive.org): expiration = Some(90s)
2024-02-12T20:04:07.451202Z DEBUG request{method=GET uri=/dandisets/000108/draft/sub-U01hm15x/ version=HTTP/1.1}: hyper::client::pool: reuse idle connection for ("https", api.dandiarchive.org)

in any case (the same or not) let's for now, until dandi-archive is fixed/released (it is being looked at but who knows when fix gets released etc), let's sort ourselves to workaround - I bet Rust has some sorted() ;-)

@jwodder
Copy link
Member

jwodder commented Feb 12, 2024

@yarikoptic Trying to work around the problem via sorting would require first fetching every page of https://api.dandiarchive.org/api/dandisets/000108/versions/draft/assets/?path=sub-MITU01&order=path&metadata=1, which means 69 requests to get 6850 items.

@yarikoptic
Copy link
Member Author

isn't it required now anyways to fetch half of those pages "on average" while considering paths underneath? in other words that it is already as expensive (O(N)) and we are to work out solution for the

@jwodder
Copy link
Member

jwodder commented Feb 12, 2024

@yarikoptic

isn't it required now anyways to fetch half of those pages "on average" while considering paths underneath?

I don't know what you're referring to.

@yarikoptic
Copy link
Member Author

my bad, I re-reviewed your description of optimization - I remembered it wrong.

But is it the same issue I described above?

@jwodder
Copy link
Member

jwodder commented Feb 12, 2024

@yarikoptic For this specific case, I believe the reason it's not working with rclone is because you left out the "draft" component after "000108", i.e., you should have run rclone ls DANDI-WEBDAV:dandisets/000108/draft/sub-U01hm15x/. If it had been the same issue, viewing the URL (which does contain "draft") in the web browser would have returned 404.

@yarikoptic
Copy link
Member Author

doh ... sorry for the "typo" and thanks for catching it, so it was a human bug indeed.

@jwodder
Copy link
Member

jwodder commented Mar 7, 2024

Fixed by dandi/dandi-archive#1839.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Cannot be implemented until something else happens bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants