mtdata downloader #10
Conversation
This reverts commit 2be4bc4. Turns out it was working beforehand.
mtdata-stuff.py (Outdated)
@@ -53,3 +94,10 @@ def read_dataset(did: str):
@app.get("/datasets/{did}/sample")
This is duplicated with the function above. I assume that this is part of the JavaScript interface. @jelmervdl is that how it's supposed to work?
I propose we merge this for now so that it's in the tree and it can be picked up when we have a downloader interface.
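For context, the duplication being discussed is roughly of the following shape. This is a hypothetical sketch only, assuming both routes wrap the same read_dataset() logic; the handler names and placeholder body are not taken from mtdata-stuff.py:

```python
# Hypothetical sketch (not the actual mtdata-stuff.py code): two FastAPI routes
# that both call the same read_dataset() helper, illustrating the kind of
# duplication the comment above refers to.
from fastapi import FastAPI

app = FastAPI()


def read_dataset(did: str) -> dict:
    # Placeholder for the real dataset-reading logic.
    return {"did": did, "lines": []}


@app.get("/datasets/{did}")
def get_dataset(did: str):
    return read_dataset(did)


@app.get("/datasets/{did}/sample")
def get_dataset_sample(did: str):
    # Nearly identical to the route above; presumably the JavaScript
    # front-end expects a separate /sample endpoint, hence the duplication.
    return read_dataset(did)
```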
Stand-alone this doesn't add much actual functionality. I'll take this pull request over, and add some sort of minimal interface for it at least before merging it.
Looks good to me, pending the answer to thammegowda/mtdata#129. Ideally I'd like to see how much I am downloading before starting to download (and we also had the issue with the Mozilla pipeline that downloads would fail because we were throttled, so we should be able to limit the number of parallel downloads).
Also, crash on exit:
Downloads are currently limited to two concurrent downloads. Maybe I can get it to do a HEAD request to get the size (or Content-Length really) of the download. It would be infeasible to do this for all datasets that are listed, but it should be doable for the ones in your "shopping list" at least.
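As a rough illustration of both points (size via HEAD, and a cap of two parallel downloads), a sketch along these lines could work. It assumes plain requests plus a thread pool; the URLs and helper names are made up and this is not the code in this pull request:

```python
# Hypothetical sketch: read Content-Length with a HEAD request before
# downloading, and limit downloads to two at a time.
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

import requests


def download_size(url: str) -> Optional[int]:
    """Return the Content-Length reported by a HEAD request, if any."""
    resp = requests.head(url, allow_redirects=True, timeout=30)
    length = resp.headers.get("Content-Length")
    return int(length) if length is not None else None


def download(url: str, dest: str) -> str:
    """Stream a single URL to disk."""
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=1 << 16):
                fh.write(chunk)
    return dest


if __name__ == "__main__":
    # Example URLs are placeholders for whatever is in the "shopping list".
    urls = [
        "https://example.org/corpus-a.tsv.gz",
        "https://example.org/corpus-b.tsv.gz",
    ]
    for url in urls:
        print(url, download_size(url))  # size check only for selected datasets

    # max_workers=2 mirrors the current two-concurrent-downloads limit.
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(download, u, u.rsplit("/", 1)[-1]) for u in urls]
        for f in futures:
            print("finished", f.result())
```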
This is just a draft as I am not exactly sure how to hook it up to the GUI, but I have coded the backend bits necessary for it.
I imagine it should be some "discover datasets" tab, where we get a list of them and can manually exclude some or label them clean/medium/dirty. The downloader automatically splits train/test/dev based on the dataset id provided by mtdata.
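A minimal sketch of that split-by-dataset-id idea might look like the following; the heuristic and the example id strings are assumptions for illustration, not the actual implementation:

```python
# Rough sketch, not the actual downloader code: assign an mtdata dataset to
# train/dev/test by looking at its dataset id. Assumes ids that mention
# "dev" or "test" (e.g. newsdev/newstest sets) are held out; everything else
# is treated as training data.
def split_for_dataset(did: str) -> str:
    name = did.lower()
    if "dev" in name:
        return "dev"
    if "test" in name:
        return "test"
    return "train"


# Illustrative id strings only; real mtdata ids may be formatted differently.
assert split_for_dataset("Statmt-newsdev-2019-eng-lit") == "dev"
assert split_for_dataset("Statmt-newstest-2019-deu-eng") == "test"
assert split_for_dataset("Statmt-europarl-10-deu-eng") == "train"
```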
This should fix #6 eventually.