OCR0065: Create repos on Hugging Face for all of our OCR datasets #4
Comments
@ta4tsering kindly reach out to @gangagyatso4364 regarding how to combine the Uchen dataset into one Hugging Face dataset.
Kindly add the URL of the dataset.
https://huggingface.co/datasets/openpecha/OCR-Norbuketaka
I wasn't able to do this for the Derge Tenjur. Fixing the issue is taking far too long: the images are missing from the zip because the archive gets corrupted when it is downloaded from Hugging Face, and since the images are not on S3 I can't use their URLs either.
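One way to narrow down a zip problem like this is to check the downloaded archive locally before anything else. The following is only a minimal sketch using the Python standard library; the function name `inspect_zip` and the list of image suffixes are assumptions for illustration, not part of the existing tooling.

```python
import zipfile
from pathlib import Path

# Assumed image file types; adjust to whatever the dataset actually contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def inspect_zip(zip_path):
    """Return (first_corrupt_member_or_None, number_of_image_members)."""
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a readable zip archive")
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() reads every member and returns the first one whose
        # CRC or header is bad, or None if all members are intact.
        corrupt = archive.testzip()
        images = [
            name
            for name in archive.namelist()
            if Path(name).suffix.lower() in IMAGE_SUFFIXES
        ]
    return corrupt, len(images)
```

If `inspect_zip` raises or reports a corrupt member, the download itself is probably broken; if it reports zero images in an otherwise valid archive, the images were likely never added when the zip was built.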
Description:
We currently have a lot of annotated OCR data. The images are stored on S3, each image as a single object, and the CSV files are on S3 as well. To make the datasets easy to use and access, I will create zip files containing all the images and upload them to the Hugging Face repo together with the transcriptions, using the data split (data distribution) that Eric used.
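The packaging step described above could look roughly like the sketch below. It assumes the images have already been downloaded from S3 into a local directory and that the CSV has `filename` and `split` columns; those column names, the helper names, and the repo layout are hypothetical. The upload helper uses `HfApi.upload_file` from `huggingface_hub` and is not called here, since it needs credentials and network access.

```python
import csv
import zipfile
from collections import defaultdict
from pathlib import Path


def rows_by_split(csv_path, split_column="split"):
    """Group CSV rows by the value of their split column (assumed name)."""
    groups = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            groups[row[split_column]].append(row)
    return dict(groups)


def zip_images(image_dir, zip_path, filenames):
    """Write the named images from image_dir into a single zip archive."""
    # ZIP_STORED skips compression; JPEG/PNG data barely compresses anyway.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for name in filenames:
            archive.write(Path(image_dir) / name, arcname=name)
    return zip_path


def upload_to_hub(local_path, repo_id, path_in_repo):
    """Upload one file to a Hugging Face dataset repo (requires a login token)."""
    from huggingface_hub import HfApi  # imported lazily; optional dependency

    HfApi().upload_file(
        path_or_fileobj=str(local_path),
        path_in_repo=path_in_repo,
        repo_id=repo_id,
        repo_type="dataset",
    )
```

A typical run would call `rows_by_split` on the transcription CSV, build one zip per split with `zip_images`, and then pass each zip and the CSV to `upload_to_hub`.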
Completion Criteria:
All the Tibetan OCR data is uploaded to OpenPecha on Hugging Face.
Subtasks:
note:
For Norbuketaka and Google Books, we already have a Hugging Face repo, but without the data distribution. I am using that repo to create a new repo on the OpenPecha Hugging Face with the data distribution but without the zipped image file.
Card Reviewer: