Bibl records not included in API Download from Zotero #1191
Comments
@wsalesky I have downloaded the JSON of the missing Zotero records; they are currently in a separate repo: https://github.com/wlpotter/zotcsv/tree/main/data. There are 1,546 of them, chunked into files of 100 items each. I believe we can add these to the existing folder of data downloaded from Zotero and re-run the transform? I also want to flag something for when we re-run the transform: can we update the bibl URI base to use "/cbss/" rather than "/bibl/"? So currently,
The update includes the missing records and the change to idno.
Updates here: #1196
The updates to the idno format look good. I am still a bit confused about the number of records. We have 29,331 TEI XML bibls now but should only have 27,545. Are we still transforming notes or are they getting filtered out? I think we may still have some notes getting transformed, e.g. https://github.com/srophe/syriaca-data/blob/cbssDataUpdate8-8-24/data/bibl/tei/75RBT6SK.xml which is a note (cf. https://www.zotero.org/groups/4861694/a_comprehensive_bibliography_on_syriac_studies/items/RUT3P26M/note/75RBT6SK/library)
@wlpotter Okay. I will take another look; maybe there is another way to filter. Don't merge this. I will make a new branch with the updated data.
@wsalesky I did a bit more digging after #1186 and think I've figured out which records are missing.
First, we have the same number of XML files and the exact same item keys as the records from the Zotero data dump that we got back in March/April. It does look like 1,797 of those items are notes, however. This means:
"itemType": "note"
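As a rough illustration of how those note items could be separated from the bibliographic records before the transform, here is a minimal sketch. It assumes each record is a dict that carries `itemType` either under a nested `data` object (as in Zotero API JSON) or at the top level; the helper names `item_type` and `split_notes` are hypothetical and not part of the existing transform.

```python
def item_type(item):
    """Return the item's type.

    Assumption: API-style records nest their fields under "data";
    flat records keep "itemType" at the top level.
    """
    data = item.get("data", item)
    return data.get("itemType")


def split_notes(items):
    """Split a list of records into (notes, everything else)."""
    notes = [i for i in items if item_type(i) == "note"]
    others = [i for i in items if item_type(i) != "note"]
    return notes, others
```

Running `split_notes` over the full data dump and checking `len(notes)` would be one way to confirm the 1,797 figure, and passing only `others` to the transform would keep notes out of the TEI output.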
So I next compared the item keys from an export from the Zotero desktop client and found that 1,546 item keys from Zotero do not appear in the TEI XML files or the API data dump.
(note that we did add new records starting last week, but I filtered those out based on the "dateAdded" field)
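The comparison described above amounts to a set difference with a date cutoff. A minimal sketch, assuming each exported record has been loaded as a dict with `key` and `dateAdded` fields, and that `dateAdded` is an ISO 8601 UTC string (so plain string comparison orders the dates correctly); the function name `missing_keys` and the cutoff value in the usage note are hypothetical.

```python
def missing_keys(export_items, known_keys, added_before):
    """Return sorted keys that are in the export but not in known_keys.

    Records whose dateAdded is on or after `added_before` are skipped,
    so recently added items are not reported as missing.
    Assumption: dateAdded values share one ISO 8601 UTC format.
    """
    known = set(known_keys)
    return sorted(
        item["key"]
        for item in export_items
        if item["key"] not in known and item["dateAdded"] < added_before
    )
```

For example, `missing_keys(export_items, tei_keys, "2024-08-01T00:00:00Z")` would list the keys that exist in the desktop export but have no TEI file, ignoring anything added from that (illustrative) date onward.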
I have a list of those item keys, so I could write a quick script to query the API for just those item keys and download the JSON to run through the transform.
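A sketch of what such a script might look like, using only the standard library. It relies on the Zotero Web API's `itemKey` parameter, which accepts a comma-separated list of up to 50 keys per request. The group ID 4861694 is taken from the group library URL elsewhere in this thread; the function names are hypothetical, and whether an API key is required depends on the group's access settings, which is an assumption left to the caller.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.zotero.org"
MAX_KEYS_PER_REQUEST = 50  # limit on the itemKey parameter


def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def build_items_url(group_id, keys):
    """Build an items request URL for a specific set of item keys."""
    query = urlencode({"itemKey": ",".join(keys), "format": "json"})
    return f"{API_BASE}/groups/{group_id}/items?{query}"


def fetch_items(group_id, keys, api_key=None):
    """Download the JSON records for `keys`, one request per 50 keys."""
    items = []
    for batch in chunked(list(keys), MAX_KEYS_PER_REQUEST):
        request = urllib.request.Request(build_items_url(group_id, batch))
        request.add_header("Zotero-API-Version", "3")
        if api_key:  # only needed if the library is not publicly readable
            request.add_header("Zotero-API-Key", api_key)
        with urllib.request.urlopen(request) as response:
            items.extend(json.load(response))
    return items
```

Calling `fetch_items(4861694, keys)` with the list of missing keys would return the records as a single list, which could then be written out with `json.dump` in whatever chunk size the transform expects.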