-
Notifications
You must be signed in to change notification settings - Fork 0
Collaboratory members interests
- Academic research interests
- Specific interests in large language models
I use gen AI extensively, both in my public history work and in my PhD research, but it's clear that I have only just scratched the surface of what's possible.
My own research (for which I have just gained my doctorate) explored language use as a means of understanding C18 attitudes to key cultural ideas, my thesis focussing principally on obligation. My main database was a sub-set of Eighteenth Century Collections Online, ECCO-TCP (69m words, 2,188 texts) which I analysed using corpus linguistics and word vector analysis (word2vec).
I currently have two research interests which interconnect. Firstly, within cultural history I have been exploring how to reach safer conclusions about past attitudes by using large corpora rather than the more common approach of relying on select texts which are frequently assumed to exemplify general social behaviours. I have focussed on large-scale language use to demonstrate which historical standpoints are most typical and, whilst my results have often been supportive of previous research, at times I have been able to reveal unnoticed trends or challenge traditional findings.
I have been keen to harness the power of LLMs but lacked the knowledge to do so. I am still at the stage of being slightly in awe of its potential so have only a very general sense of where I would like it to take me. Looking at the early examples in the Notebook, I can see how that prompt engineering can be very valuable in extracting information, but I will be particularly interesting in exploring over the next few weeks how the various techniques can be used in more qualitative work.
[PLACE HOLDER]
My area of interest is in using various AI techniques for my research on American political discourse. I recently finished building this web-based app through which historians can ask questions of the entire corpus of FDR or Reagan's spoken addresses and remarks. It's basically a Retrieval Augmented Generation pipeline, that retrieves and reranks source excerpts and then feeds the top results to the Anthropic API, where a LLM writes a response that is virtually verbatim based on these sources, for accuracy. I'm very interested in how we, as researchers, access these vast digital archives that are being created, and I'm particularly excited by the power of vector search as exhibited in my tool, to allow researchers to have natural language dialogue with their sources.
My area of research interest is centred on the English High Court of Admiralty (HCA) in the period 1570 to 1685.
My interest in large language models takes several forms:
-
As a collaborative tool for MarineLive volunteers to create analytical ontological summaries of machine transcribed HCA depositions at scale. These summaries will be converted into Linked Open Data and potentially injected into IIIF manifests, together with full text and images.
-
As an experimental tool to explore across a wide range of analogue historical research tasks to increase individual and team effectiveness and efficiency.
-
As a platform to support the training of mental health therapists.
[PLACE HOLDER]
My research focuses on the movement of ships and related people and objects in the 17th/18th century and I primarily work with logbooks from the Prize Papers collection but also use examinations, lading lists etc. I'd like to use LLMs to automatically extract places and dates to not only have a start and end point - which is what the Prize Papers project is working with - but to closely track routes and hopefully be able to also extract weather and other events to research how they impacted early modern movement.
[PLACE HOLDER]
A medievalist by training, my narrow research interests focus on perceptions of corruption and oversight of trusted officials and how they shift through time. I particularly utilise digital methods, legal records, and political poetry to unpick this.
I have used Large Language Models to help with coding in both R and Python but I am well aware that there is a much greater sophistication to these tools which I have not yet had the chance to grasp. I am particularly interested in how LLMs can help to speed up and improve the creation and correction of HTR models.
[PLACE HOLDER]
My book project examines the networks of Portuguese Sephardic merchants, with particular attention given to the community in the Caribbean island of Curaçao. I’m interested in their ties to Spanish markets, and argue that many of the New Christian merchants whose American ventures were upended in the turmoil of the 1640s refashioned their operations through ties to family and associates who relocated to Amsterdam and then the Caribbean in the late 1640s and 50s. Blas de la Peña is one such merchant (identified in the High Court of Admiralty Deposition Books: HCA 13/39-HCA 13/79 Google Notebook), and I believe that cases like his demonstrate how Portuguese merchants managed to reinvent themselves to continue and expand their investments in Spanish American markets after the end of the Iberian Union.
Research project: Global and colonial contexts of the Sloane Herbarium
[PLACE HOLDER]
[PLACE HOLDER]
I work on a project "Regionalizing Infrastructures in Chinese History (RegInfra)" under the supervision of Professor Hilde De Weerdt.
The MarineLives project was founded in 2012. It is a volunteer lead collaboration dedicated to the transcription, enrichment and publication of English High Court of Admiralty depositions.
AI assistants and agents. Nov 19, 2024 talk
Analytical ontological summarization prompt
APIs and batch processing - second collaboratory session
APIs and batch processing ‐ learnings from second collaboratory session
Barbary pirate narrative summarization prompt
Barbary pirate deposition identification and narrative summarization prompt
Batch processing of raw HTR for clean up and summarization
Collaboratory members interests
Early Modern English Language Models
Fine-tuning - third oollaboratory session
History domain training data sets
Introduction to machine learning for historians
MarineLives and machine transcription
New skill set for historians? July 19, 2024 talk
Prompt engineering - first collaboratory session
Prompt engineering - learnings from first collaboratory session