Bibl records not included in API Download from Zotero #1191

Open
wlpotter opened this issue Jul 2, 2024 · 4 comments

wlpotter commented Jul 2, 2024

@wsalesky I did a bit more digging after #1186 and think I've figured out which records are missing.

First, we have the same number of XML files and the exact same item keys as the records from the Zotero data dump that we got back in March/April. It does look like 1,797 of those items are notes, however. This means:

  1. I think we should revert the transform so that it filters out items with "itemType": "note"
  2. We appear to be missing about 1,500 records from Zotero
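For item 1, here is a minimal sketch of the note filter. It assumes the dump uses the Zotero Web API JSON shape, where item fields sit under a `data` object. The helper names `item_type` and `drop_notes` and the sample keys are hypothetical, not taken from the existing transform.

```python
def item_type(item):
    """Return the item's type, reading from "data" if present (assumed API shape)."""
    return item.get("data", item).get("itemType")


def drop_notes(items):
    """Keep every item whose itemType is not "note"."""
    return [item for item in items if item_type(item) != "note"]


# Placeholder records, not real library data.
sample = [
    {"key": "AAAA0001", "data": {"itemType": "book"}},
    {"key": "AAAA0002", "data": {"itemType": "note"}},
    {"key": "AAAA0003", "data": {"itemType": "journalArticle"}},
]
kept = drop_notes(sample)
print([item["key"] for item in kept])
```

Attachments or other child item types would need their own check if they also appear in the dump.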

So I next compared the item keys from an export from the Zotero desktop client and found that 1,546 item keys from Zotero do not appear in the TEI XML files or the API data dump.
(Note that we did add new records starting last week, but I filtered those out based on the "dateAdded" field.)
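The comparison can be sketched roughly as below. This assumes each desktop-export record is a flat dict with `key` and `dateAdded` fields and that `dateAdded` is an ISO 8601 UTC timestamp, so plain string comparison orders it correctly. The function name `missing_keys`, the cutoff value, and the sample keys are placeholders.

```python
def missing_keys(desktop_items, known_keys, cutoff):
    """Keys in the desktop export that are absent from known_keys.

    Items added on or after `cutoff` are skipped, since they postdate the dump.
    """
    missing = set()
    for item in desktop_items:
        if item["dateAdded"] >= cutoff:
            continue  # recently added record, not expected in the dump
        if item["key"] not in known_keys:
            missing.add(item["key"])
    return missing


# Placeholder data: one known record, one missing, one added after the cutoff.
desktop = [
    {"key": "AAAA0001", "dateAdded": "2024-03-01T00:00:00Z"},
    {"key": "AAAA0002", "dateAdded": "2024-03-02T00:00:00Z"},
    {"key": "AAAA0003", "dateAdded": "2024-06-28T00:00:00Z"},
]
known = {"AAAA0001"}  # keys found in the TEI XML files or the API dump
result = missing_keys(desktop, known, cutoff="2024-06-24T00:00:00Z")
print(sorted(result))
```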

I have a list of those item keys, so I could write a quick script that queries the API for just those item keys and downloads the JSON to run through the transform.
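A rough sketch of such a script follows. It assumes the Zotero Web API v3 `itemKey` parameter (a comma-separated list of keys) and a cap of 50 keys per request, which should be checked against the API documentation. The group ID is the one in the library's zotero.org URL; the function names and the optional `api_key` argument are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.zotero.org"
GROUP_ID = 4861694  # group ID as it appears in the library's zotero.org URL
BATCH_SIZE = 50     # assumed per-request cap on itemKey; verify in the API docs


def batches(keys, size=BATCH_SIZE):
    """Split the keys into sorted batches of at most `size`."""
    keys = sorted(keys)
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def items_url(keys, group_id=GROUP_ID):
    """Build the items URL for one batch of keys."""
    query = urlencode(
        {"itemKey": ",".join(keys), "format": "json", "limit": len(keys)},
        safe=",",
    )
    return f"{API_BASE}/groups/{group_id}/items?{query}"


def fetch_items(keys, api_key=None):
    """Download the JSON for every key, one request per batch (needs network)."""
    headers = {"Zotero-API-Version": "3"}
    if api_key:  # only needed if the library is not publicly readable
        headers["Zotero-API-Key"] = api_key
    items = []
    for batch in batches(keys):
        with urlopen(Request(items_url(batch), headers=headers)) as resp:
            items.extend(json.load(resp))
    return items
```

Calling `fetch_items(missing)` with the list of missing keys would return the items as a single list, which could then be written out in whatever chunk size the transform expects.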

@wlpotter

@wsalesky I have downloaded the JSON for the missing Zotero records; it is currently in a separate repo: https://github.com/wlpotter/zotcsv/tree/main/data. There are 1,546 records, chunked into files of 100 items each. I believe we can add these to the existing folder of data downloaded from Zotero and re-run the transform?

One more request: when we re-run the transform, can we update the bibl URI base to use "/cbss/" rather than "/bibl/"? For example, http://syriaca.org/bibl/8VSKN4EE (in Dev: https://dev.syriaca.org/bibl/8VSKN4EE) would become http://syriaca.org/cbss/8VSKN4EE
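To make the requested mapping concrete, here is a small sketch of the rewrite as a plain string operation. The function name `to_cbss_uri` is hypothetical, and the real change would presumably live in the transform itself rather than in a separate script.

```python
OLD_BASE = "http://syriaca.org/bibl/"
NEW_BASE = "http://syriaca.org/cbss/"


def to_cbss_uri(uri):
    """Swap the /bibl/ base for /cbss/; leave any other URI unchanged."""
    if uri.startswith(OLD_BASE):
        return NEW_BASE + uri[len(OLD_BASE):]
    return uri


print(to_cbss_uri("http://syriaca.org/bibl/8VSKN4EE"))
```

URIs that do not start with the old base pass through untouched.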

wsalesky added a commit that referenced this issue Aug 8, 2024
Updates includes missing records and update to idno.

wsalesky commented Aug 8, 2024

Updates here: #1196


wlpotter commented Aug 9, 2024

The updates to the idno format look good. I am still a bit confused about the number of records. We have 29,331 TEI XML bibls now but should only have 27,545. Are we still transforming notes or are they getting filtered out? I think we may still have some notes getting transformed, e.g. https://github.com/srophe/syriaca-data/blob/cbssDataUpdate8-8-24/data/bibl/tei/75RBT6SK.xml which is a note (cf. https://www.zotero.org/groups/4861694/a_comprehensive_bibliography_on_syriac_studies/items/RUT3P26M/note/75RBT6SK/library)


wsalesky commented Aug 9, 2024

@wlpotter Okay, I will take another look; maybe there is another way to filter. Don't merge this. I will make a new branch with the updated data.
