Publish md5 hashes of datasets #647

Open
adefazio opened this issue Feb 20, 2024 · 3 comments

Comments

@adefazio

Description

Is it possible to publish file hashes and directory layouts for all datasets after processing? I would like to run some checks to make sure the data my team has downloaded and processed matches the reference copy.

@priyakasimbeg
Contributor

The dataset layouts and final sizes are documented in datasets/README.md, under the dropdown items that say "The final directory structure should look like this:".

@adefazio
Author

Thanks, that's useful. Would it be possible to publish hashes of the files as well?

@priyakasimbeg
Contributor

@chandramouli-sastry could you help close this request?
I have all the data from the setup scripts downloaded in kasimbeg-8 in /home/kasimbeg/data.
The remaining work is to:

  1. Check the README for data setup to make sure the file structure matches and there are no additional files left from the download.
  2. Get the hashes for all of the datasets and add them to the data setup README.
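
For reference, the two steps above could be sketched roughly as follows. This is only a minimal illustration, not a script from the repository: the helper names (`md5_of_file`, `build_manifest`, `diff_manifests`, `format_manifest`) are hypothetical, and the two-space `digest  path` output format is an assumption chosen to resemble what the `md5sum` tool prints.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a POSIX-style relative path) to its MD5 digest."""
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(expected: dict[str, str], actual: dict[str, str]):
    """Return (missing, extra, mismatched) lists of relative paths.

    missing:    listed in expected but absent from actual
    extra:      present in actual but not listed in expected
    mismatched: present in both, with different digests
    """
    missing = sorted(set(expected) - set(actual))
    extra = sorted(set(actual) - set(expected))
    mismatched = sorted(
        k for k in expected.keys() & actual.keys() if expected[k] != actual[k]
    )
    return missing, extra, mismatched


def format_manifest(manifest: dict[str, str]) -> str:
    """Render one 'digest  relative/path' line per file, sorted by path."""
    return "\n".join(
        f"{digest}  {rel_path}" for rel_path, digest in sorted(manifest.items())
    )
```

Running `build_manifest` on the reference data directory and pasting the output of `format_manifest` into the README would cover step 2. Anyone with their own copy could then call `diff_manifests` against the published manifest; a non-empty `extra` list would also flag leftover download files, which is the check in step 1.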
