Introduction to machine learning for historians

TABLE OF CONTENTS

A. Technology

1. Adoption of technology

2. Digital libraries and Digital archives

3. Large Language Models

4. Vectorbases

5. Knowledge Graphs

B. Environments

1. Hugging Face

2. GitHub

C. Documents and metadata

1. Document characteristics

2. Metadata

3. Linked open data

D. Academic research process

1. Academic research tasks

2. Archival research workflow

E. Techniques

1. Prompt engineering

2. Fine-tuning

3. Retrieval Augmented Generation

F. Text-oriented machine learning tasks (alphabetical)

1. Classification

2. Entity extraction

3. Question answering

4. Semantic search

5. Splitting

6. Text correction

7. Text extraction - OCR

8. Text extraction - HTR

9. Text summarization

10. Text translation

G. Assistants and Agents

1. Assistants - designing and executing text-oriented tasks

2. Assistants - coding

3. Agents

H. Sound and image modalities

1. Image analysis

2. Sound annotation

I. Use cases

1. Creation of a personal doctoral research archive powered by a vectorbase

2. Design and production of analytical summarizations of historical legal depositions

3. Creation of linked data from analytical summarizations

4. Creation of a linked data web browser and visualizer

5. Creation of a knowledge graph from linked data

6. Creation of tailored LLM assistants

7. Devising and running an assistant-supported history simulation

J. Looking ahead

1. Near future

2. MarineLives Collaboratory

General bibliography

Topical bibliography

Appendices

> A. Systems prompts

> B. Support for software, tools and standards that historians use

> C. Working with EEBO

> D. Wish list for 2025

The MarineLives project was founded in 2012. It is a volunteer-led collaboration dedicated to the transcription, enrichment and publication of English High Court of Admiralty depositions.


