Replies: 1 comment
I would lean towards representing the specific branch, tag, or commit in the URL (or in separate dedicated fields), and not just in hash fields like cr:md5, cr:sha1 or sc:sha256. Those fields are intended to provide a checksum to verify content, not to serve as a mechanism for addressing content, as git does with hashes. On the exact representation:
Context
Croissant supports git-based repositories (e.g., GitHub or Hugging Face). However, we already define a way to extract either from the default branch or from a ref.
Proposal
- Reference a specific branch or a tag. We could encode this in the URL just as we encoded the ref. This encoding would be specific to each repository.
  - https://github.com/<username>/<repository_name>/tree/<branch_name>, where branch_name is URL-encoded (e.g., feature/new updates -> feature%2Fnew%20updates).
  - https://huggingface.co/datasets/<dataset_id>/tree/<branch_name>.
- Reference a specific commit.
  - cr:sha1 (just like we added cr:md5).
  - https://github.com/<username>/<repository_name>/commit/<commit>. The drawback of this method is that we lose the information about the branch or the ref.
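To make the URL encoding above concrete, here is a minimal Python sketch. It assumes every reserved character in the branch name, including "/", is percent-encoded, as in the feature/new updates example. The helper names (encode_ref, github_tree_url, huggingface_tree_url, github_commit_url) are hypothetical and not part of Croissant or any existing library.

```python
from urllib.parse import quote


def encode_ref(ref: str) -> str:
    """Percent-encode a branch or tag name; safe="" also encodes "/"."""
    return quote(ref, safe="")


def github_tree_url(username: str, repository_name: str, branch_name: str) -> str:
    """Hypothetical helper: GitHub URL pointing at a branch or tag."""
    return (
        f"https://github.com/{username}/{repository_name}"
        f"/tree/{encode_ref(branch_name)}"
    )


def huggingface_tree_url(dataset_id: str, branch_name: str) -> str:
    """Hypothetical helper: Hugging Face dataset URL pointing at a branch.

    dataset_id is inserted unchanged (assumption: it is already URL-safe).
    """
    return (
        f"https://huggingface.co/datasets/{dataset_id}"
        f"/tree/{encode_ref(branch_name)}"
    )


def github_commit_url(username: str, repository_name: str, commit: str) -> str:
    """Hypothetical helper: GitHub URL pointing at a specific commit."""
    return f"https://github.com/{username}/{repository_name}/commit/{commit}"


if __name__ == "__main__":
    # "user" and "repo" are placeholder values for illustration only.
    print(github_tree_url("user", "repo", "feature/new updates"))
```

Encoding "/" in the branch name keeps it from being confused with a path separator, at the cost of a less readable URL.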