Collaboratory members interests
- Academic research interests
- Specific interests in large language models
My dissertation explores the transnational family relationships of Lithuanian Jews, 1899-1949. I’m primarily interested in how Jewish migrants and their relatives in eastern Europe cultivated bonds with one another through family letters. I work as a public historian, currently running a project commemorating the 150th anniversary of a large New York City mental health and social services agency.
I use generative AI in my work in a variety of ways. The most transformative has been as an aid for translating handwritten Yiddish letters (after transcription with Transkribus). I have found GPT-4 to be remarkably good at translating colloquial, highly non-standard Yiddish, far better than any traditional machine translation. I also use generative AI to radically speed up various repetitive research tasks, and for data cleaning. I am particularly interested in learning how to use vector databases with my data.
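As a rough illustration of that translation step, here is a minimal sketch using the OpenAI Python SDK; the file name, model choice, and prompt wording are illustrative assumptions, not the actual workflow.

```python
# Minimal sketch: translating a Transkribus-transcribed Yiddish letter with
# the OpenAI API. File name and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("letter_1923_transcription.txt", encoding="utf-8") as f:
    yiddish_text = f.read()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are a translator of colloquial, non-standard Yiddish. "
                    "Translate the letter into English, preserving tone and "
                    "flagging any uncertain readings in [brackets]."},
        {"role": "user", "content": yiddish_text},
    ],
)
print(response.choices[0].message.content)
```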
My own research (for which I have just gained my doctorate) explored language use as a means of understanding eighteenth-century attitudes to key cultural ideas, my thesis focussing principally on obligation. My main database was a subset of Eighteenth Century Collections Online, ECCO-TCP (69m words, 2,188 texts), which I analysed using corpus linguistics and word-vector analysis (word2vec).
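A minimal sketch of this kind of word2vec analysis, using gensim; the corpus file, preprocessing, and parameters are illustrative assumptions, and real work on eighteenth-century texts would first require spelling normalisation.

```python
# Minimal sketch of a word2vec analysis over a large text corpus, using gensim.
# Corpus path and hyperparameters are illustrative, not those of the thesis.
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Assumes one document per line; eighteenth-century orthography would need
# normalisation before training for reliable results.
with open("ecco_tcp_subset.txt", encoding="utf-8") as f:
    sentences = [simple_preprocess(line) for line in f]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=10, workers=4)

# Words used in contexts most similar to "obligation"
print(model.wv.most_similar("obligation", topn=10))
```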
I currently have two research interests which interconnect. Firstly, within cultural history I have been exploring how to reach safer conclusions about past attitudes by using large corpora rather than the more common approach of relying on select texts which are frequently assumed to exemplify general social behaviours. I have focussed on large-scale language use to demonstrate which historical standpoints are most typical and, whilst my results have often been supportive of previous research, at times I have been able to reveal unnoticed trends or challenge traditional findings.
Secondly, I have been keen to harness the power of LLMs but have lacked the knowledge to do so. I am still at the stage of being slightly in awe of their potential, so have only a very general sense of where I would like them to take me. Looking at the early examples in the Notebook, I can see that prompt engineering can be very valuable for extracting information, but I will be particularly interested in exploring over the next few weeks how the various techniques can be used in more qualitative work.
My research focuses on Barbary pirate attacks on English ships during the 1610s-1620s, specifically through the lens of the High Court of Admiralty (HCA) records. Since piracy in the criminal records of HCA 1 has already been studied, I have chosen to focus on the civil cases of the largely unexplored HCA 13 series. While these depositions generally centre on insurance and payment disputes, the context in which these issues arise offers a compelling glimpse into how piracy was experienced by ship crews.
Upon commencing my research, I encountered a significant challenge: the depositions are written in a mixture of Latin and Early Modern English. What's more, much of the handwriting is nearly illegible, at least to an inexperienced transcriber like me. However, with the use of Large Language Models (LLMs) and the support of Colin Greenstreet, I have not only overcome this problem but also gained access to far more records than I initially thought possible. Transkribus has enabled me to transcribe thousands of depositions, and Claude 3.5 has generated concise summaries of those depositions that are of particular interest to my research. As a result, I've managed to sift through two entire volumes in less than 20 hours. This has opened up entirely new avenues for my research, providing me with the resources to conduct a comprehensive survey of piracy over a much broader period.
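A minimal sketch of the summarisation step, using the Anthropic Python SDK; the model name, file name, and prompt are illustrative assumptions rather than the exact workflow.

```python
# Minimal sketch: summarising a Transkribus-transcribed deposition with Claude.
# File name, model, and prompt are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("hca13_deposition.txt", encoding="utf-8") as f:
    transcript = f.read()

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": "Summarise this High Court of Admiralty deposition in three "
                   "sentences, noting any mention of Barbary pirate attacks:\n\n"
                   + transcript,
    }],
)
print(message.content[0].text)
```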
My area of interest is in using various AI techniques for my research on American political discourse. I recently finished building a web-based app through which historians can ask questions of the entire corpus of FDR's or Reagan's spoken addresses and remarks. It is essentially a Retrieval Augmented Generation (RAG) pipeline that retrieves and reranks source excerpts and then feeds the top results to the Anthropic API, where an LLM writes a response that stays as close to verbatim as possible to those sources, for accuracy. I'm very interested in how we, as researchers, access these vast digital archives that are being created, and I'm particularly excited by the power of vector search, as demonstrated in my tool, to allow researchers to have a natural language dialogue with their sources.
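A highly simplified sketch of such a retrieve, rerank, and generate pipeline; the library choices, model names, and prompt are assumptions for illustration, not the app's actual stack.

```python
# Simplified sketch of a retrieve -> rerank -> generate (RAG) pipeline.
# Library choices, models, and prompt are illustrative assumptions.
from sentence_transformers import SentenceTransformer, CrossEncoder, util
import anthropic

embedder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

corpus = ["...excerpts from the spoken addresses..."]  # placeholder corpus
corpus_emb = embedder.encode(corpus, convert_to_tensor=True)

def answer(question: str) -> str:
    # 1. Vector retrieval: top 20 excerpts by embedding similarity
    q_emb = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, corpus_emb, top_k=20)[0]
    candidates = [corpus[h["corpus_id"]] for h in hits]
    # 2. Rerank with a cross-encoder and keep the best five
    scores = reranker.predict([(question, c) for c in candidates])
    top = [c for _, c in sorted(zip(scores, candidates), reverse=True)[:5]]
    # 3. Generate an answer grounded as closely as possible in the sources
    msg = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=700,
        messages=[{"role": "user",
                   "content": "Answer using only these excerpts, quoting them "
                              "as closely as possible:\n\n" + "\n---\n".join(top)
                              + "\n\nQuestion: " + question}],
    )
    return msg.content[0].text
```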
My area of research interest is centred on the English High Court of Admiralty (HCA) in the period 1570 to 1685. I am particularly interested in the exploration of mariner and shore trade literacy as evidenced in Admiralty Court records and am working on a monograph on this subject.
My interest in large language models takes several forms:
- As a collaborative tool for MarineLives volunteers to clean up raw machine transcriptions prior to further processing (a minimal sketch follows below).
- As a collaborative tool for MarineLives volunteers to create analytical ontological summaries of machine-transcribed HCA depositions at scale. These summaries will be converted into Linked Open Data and potentially injected into IIIF manifests, together with full text and images.
- As an experimental tool to explore a wide range of analogue historical research tasks, to increase individual and team effectiveness and efficiency.
- As a platform to support the training of mental health therapists.
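A minimal sketch of the first of these, batch clean-up of raw HTR output; the folder names, model, and prompt are illustrative assumptions, not the project's actual pipeline.

```python
# Minimal sketch: looping over a folder of raw HTR transcriptions and asking
# Claude to clean each one. Paths, model, and prompt are illustrative.
import pathlib
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

CLEANUP_PROMPT = ("Correct obvious HTR errors in this High Court of Admiralty "
                  "deposition without modernising the spelling; keep the "
                  "original line breaks:\n\n")

for path in pathlib.Path("raw_htr").glob("*.txt"):
    raw = path.read_text(encoding="utf-8")
    msg = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4000,
        messages=[{"role": "user", "content": CLEANUP_PROMPT + raw}],
    )
    out = pathlib.Path("clean_htr") / path.name
    out.parent.mkdir(exist_ok=True)
    out.write_text(msg.content[0].text, encoding="utf-8")
```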
Lucas Hassis' research focusses on 18th-century letter-writing practices and mercantile culture, materiality studies, and praxeological approaches in historiography and global microhistory.
My research focuses on the movement of ships and related people and objects in the 17th and 18th centuries, and I primarily work with logbooks from the Prize Papers collection, but also use examinations, lading lists, etc. I'd like to use LLMs to automatically extract places and dates, not only to establish a start and end point (which is what the Prize Papers project is working with) but to closely track routes, and hopefully also to extract weather and other events in order to research how they impacted early modern movement.
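A minimal sketch of this kind of structured extraction from a logbook entry; the sample entry, JSON schema, and prompt are illustrative assumptions.

```python
# Minimal sketch: extracting places, dates, and weather from a logbook entry
# as JSON. The entry, schema, and prompt are illustrative assumptions.
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

entry = "14 May 1703. Fresh gales at SSW. Saw the Lizard bearing N, 6 leagues."

msg = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=400,
    messages=[{"role": "user",
               "content": "From this logbook entry, return only a JSON object "
                          "with keys 'date', 'places' (list), and 'weather':\n\n"
                          + entry}],
)
# A real pipeline would validate or repair the model's JSON before loading it.
record = json.loads(msg.content[0].text)
print(record["places"])
```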
Thiago Krause is researching the global history of Salvador da Bahia, Brazil, to explore the rise of slavery and capitalism from a Global South perspective. He is particularly interested in leveraging large language models for improving HTR transcriptions, named entity recognition, and analyzing extensive datasets.
A medievalist by training, my narrow research interests focus on perceptions of corruption and oversight of trusted officials and how they shift through time. I particularly utilise digital methods, legal records, and political poetry to unpick this.
I have used Large Language Models to help with coding in both R and Python, but I am well aware that these tools offer a much greater sophistication which I have not yet had the chance to grasp. I am particularly interested in how LLMs can help to speed up and improve the creation and correction of HTR models.
I'm an historian of Anglo-Ottoman trade and early modern money and am interested in exploring ways to use LLMs to construct information on goods, coins, and Levant Company merchants.
Oren Okhovat works on early modern merchant networks and is currently examining Portuguese Jewish merchants on the Caribbean island of Curaçao, with the goal of tracing the continued entanglement of Portuguese and Spanish commercial networks, through the intermediacy of Dutch ones, following the end of the Iberian Union in 1640. He is particularly interested in the impact that emerging capitalistic ventures had on the formation of a regional economic culture in the Caribbean and the broader Atlantic world. His work has benefited from the expansion of HTR AI initiatives that have made cross-referencing material across archives in several languages more efficient. He is also interested in the capability of new AI initiatives to help locate individuals and specific material of interest in large archives, and in using AI as a research assistant to help streamline certain tasks like organizing data, translating, and generating bibliographies.
My main area of research concerns the making and interactions of global plant knowledges, through the lens of the vast herbarium amassed by Hans Sloane (1660-1753). This work encompasses the geographies of collecting and transmission by Europeans, as well as the trans-cultural encounters and knowledges that supported and sustained it, though the latter are infrequently visible in the archive. I am now in my final year of my PhD, part of the work for which has resulted in the creation of an XML representation of the herbarium; a limited tabular version of that data is available from the Natural History Museum in London. I have very little knowledge of LLMs, but am curious to explore how they could be used to analyse structured and unstructured data, especially as a means of uncovering unsuspected relationships, gaps and silences in the sources. A large proportion of the plant specimens in the herbarium depended on maritime transportation at many points in their journeys from the places where they grew to their deposit in London. Some were even acquired from a naval prize. So I am also keen to explore the HCA data as a source for the transmission of natural things. Prior to my PhD, I worked in digital academic publishing for 30 years.
I am an early American historian whose past research has mainly focused on the seventeenth-century Atlantic World, but since 2021 I have been working on a digital project called H-GEAR (Historiographing the Era of the American Revolution), which uses digital methods to analyze and explore large corpora of early American texts. I am particularly interested in using LLMs as experimental tools for textual analysis, but I'm also curious how they might be used in teaching and public outreach. In my own case, I was recently involved in a successful effort to build a chatbot that was trained on a corpus of early American printed texts (Early American HistoriChat).
My interest and passion is in British early modern history. My research focuses on eighteenth-century antiquaries, specifically their networks and how these were used to recover the past. I plan to incorporate AI as both a productivity tool and a potential research methodology, enabling me to achieve outcomes that might otherwise be unattainable.
I am an historian of French global commerce under Louis XIV. In truth, I am ambivalent about LLMs, but wish to learn more about them, their potential functions and how I could use them either for my current projects or for future work.
I work on the project "Regionalizing Infrastructures in Chinese History" (RegInfra) under the supervision of Professor Hilde De Weerdt.
The MarineLives project was founded in 2012. It is a volunteer-led collaboration dedicated to the transcription, enrichment, and publication of English High Court of Admiralty depositions.