Notebook for bulk downloading of AJCP material #58
I'd like to be able to download sections from the AJCP digitised collection, for instance material from the Miscellaneous Series, London Missionary Society Collection. From here, it would be great to search by three categories: name, date, and geographical location.

The example below shows the general data in the finding aid:

Letters mainly from missionaries in the Society, Hervey and Samoan Islands and also the New Hebrides, Loyalty Islands and Savage Island (Niue), 1862 - 1863 (File Box 29)

But what I'd really like is to harvest files based on the descriptive section of the file, as seen below. For my research, I would target information about Lawes.

The correspondents include Charles Barff (Huahine), P.G. Bird (Savaii, Apia), Stephen M. Creagh (Uea, Lifu), George Drummond (Upolu), Samuel Ella (Aneiteum), John Geddie (Aneiteum), Henry Gee (Apia), W. Wyatt Gill (Mangaia), James L. Green (Taha'a), William Howe (Papeete), John Jones (Mare), Ernst R.W. Krause (Rarotonga), William G. Lawes (Savage Island), Samuel Macfarlane (Lifu), George Morris (Raiatea), Archibald W. Murray (Malua), Henry Nisbet (Malua), George Platt (Raiatea), Thomas Powell (Tutuila), George Pratt (Matautu, Savage Island), Carl Schmidt (Apia), James Sleigh (Lifu) and George Turner (Sydney).
So to break this down:
Is that what you'd like?
Notes to self: Searching within a finding aid fires off a POST request that returns an HTML fragment. The params are something like this:

params = {"faIdentifier":"nla.obj-1126174847","term":"lawes","nuc":"ANL:AJCP","facets":"all","zone":"collection","selectedFacets":[],"pageSize":10,"cursorMark":"AoErc3UyMzcxMDI4Nzk=","start":1,"previous":["*"]}

These are posted as JSON to the search endpoint. The results are HTML, so identifiers would need to be scraped from the HTML for further processing.
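A rough sketch of how that request and the follow-up scraping might look in Python, using only the standard library. Several things here are assumptions rather than anything confirmed in the notes: `SEARCH_URL` is a hypothetical placeholder because the real endpoint isn't recorded, the helper names are invented for illustration, and the scraping step assumes that identifiers in the returned HTML follow the same `nla.obj-<digits>` pattern as the finding aid identifier.

```python
import json
import re
import urllib.request

# Hypothetical placeholder: the real endpoint URL is not recorded in the notes.
SEARCH_URL = "https://example.org/finding-aid-search"


def build_payload(fa_identifier, term, cursor_mark, previous=None,
                  page_size=10, start=1):
    """Build the JSON body, mirroring the parameters recorded in the notes."""
    return {
        "faIdentifier": fa_identifier,
        "term": term,
        "nuc": "ANL:AJCP",
        "facets": "all",
        "zone": "collection",
        "selectedFacets": [],
        "pageSize": page_size,
        "cursorMark": cursor_mark,
        "start": start,
        "previous": previous if previous is not None else ["*"],
    }


def post_search(payload, url=SEARCH_URL):
    """POST the payload as JSON and return the HTML fragment as text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def extract_identifiers(html):
    """Return unique nla.obj identifiers in order of first appearance.

    Assumes identifiers appear in the HTML as 'nla.obj-' followed by digits.
    """
    seen = []
    for match in re.findall(r"nla\.obj-\d+", html):
        if match not in seen:
            seen.append(match)
    return seen
```

With the real endpoint substituted in, usage would be along the lines of `extract_identifiers(post_search(build_payload("nla.obj-1126174847", "lawes", "AoErc3UyMzcxMDI4Nzk=")))`. A regular expression is used instead of a full HTML parser because the structure of the returned fragment isn't known yet.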
Worth noting too that dezoomify (https://dezoomify.ophir.dev/) works a treat in downloading high-resolution versions of pages in the AJCP.
Thanks for the dezoomify link, Tim. Bart mentioned he spoke with you recently and just commented on how good the images are! As for the query above, I think that sounds good!
Some recent notebooks should meet parts of this need. In addition, the 'Images' tutorials in the Trove Data Guide provide detailed instructions on getting data out of a finding aid and then loading it into other tools for analysis or annotation. There's also documentation in the TDG about getting high-res versions of images. I'll think more about the searching part of this issue as I do some more work on Finding Aids as part of wragge/trove-data-guide#157
See: https://twitter.com/MichWatsonOz/status/1521725616735014912