The preparation workflow for images and transcriptions consists of the following steps:
- generation of IIIF manifests, see https://github.com/dse-as/i3f
- uploading IIIF images contained in the manifests to Transkribus
- automated transcription in three document collections
- downloading PAGE XML from Transkribus
- transforming PAGE XML from Transkribus to raw TEI
- transforming raw TEI to final format
Auxiliary methods:
To facilitate handling, most scripts can be run directly on GitHub, either by opening an issue (using the appropriate template) or by committing a metadata file to the repository.
Automated upload workflow of IIIF images into a Transkribus collection.
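To illustrate which images such an upload would pick up, here is a minimal sketch that lists the image URLs referenced by a manifest. It assumes the manifest follows the IIIF Presentation API 3.0 layout (canvases, annotation pages, painting annotations); the function name `image_urls_from_manifest` is hypothetical and not part of this repository, and the project's actual upload workflow may work differently.

```python
def image_urls_from_manifest(manifest: dict) -> list[str]:
    """Collect the image URLs painted onto the canvases of a IIIF v3 manifest."""
    urls = []
    for canvas in manifest.get("items", []):            # Canvas objects
        for page in canvas.get("items", []):            # AnnotationPage objects
            for annotation in page.get("items", []):    # Annotation objects
                if annotation.get("motivation") != "painting":
                    continue
                body = annotation.get("body", {})
                # Only plain image bodies are handled; Choice bodies are skipped
                if isinstance(body, dict) and body.get("type") == "Image" and "id" in body:
                    urls.append(body["id"])
    return urls
```

The manifest itself can be loaded with `json.load` from a downloaded file before passing it to the function.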
python scripts/PAGE-from-Transkribus/download_latest_pagexml.py -u 'USERNAME' -p 'PASSWORD' -c 'COLLECTION-ID-1' 'COLLECTION-ID-2' -o 'OUTFOLDER'
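Typing the password directly into the command leaves it in the shell history. As a minimal sketch, the same call could be assembled from environment variables instead. Only the flags `-u`, `-p`, `-c` and `-o` come from the command above; the variable names `TRANSKRIBUS_USER` and `TRANSKRIBUS_PASSWORD` and the helpers `build_download_command` and `run_download` are assumptions made for illustration.

```python
import os
import subprocess
import sys

SCRIPT = "scripts/PAGE-from-Transkribus/download_latest_pagexml.py"

def build_download_command(user, password, collection_ids, outfolder):
    """Assemble the argument list for the download script (flags as in the command above)."""
    if not collection_ids:
        raise ValueError("at least one collection ID is required")
    return [
        sys.executable, SCRIPT,
        "-u", user,
        "-p", password,
        "-c", *[str(c) for c in collection_ids],
        "-o", outfolder,
    ]

def run_download(collection_ids, outfolder):
    """Read credentials from the environment and run the download script."""
    # Hypothetical variable names; the repository does not define them
    user = os.environ["TRANSKRIBUS_USER"]
    password = os.environ["TRANSKRIBUS_PASSWORD"]
    cmd = build_download_command(user, password, collection_ids, outfolder)
    # check=True raises CalledProcessError if the script exits with an error
    subprocess.run(cmd, check=True)
```

A call such as `run_download(["COLLECTION-ID-1", "COLLECTION-ID-2"], "OUTFOLDER")` would then mirror the command line shown above.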
python scripts/PAGE-to-raw-TEI/page2TEI.py -i download -o download_out
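To give an idea of what a PAGE XML to raw TEI step involves, the following sketch maps each text region to an `<ab>` element and each text line to an `<lb/>` followed by the line's text. It is not the logic of `page2TEI.py`: the function name `page_to_raw_tei`, the choice of `<ab>` and `<lb/>`, and the 2013-07-15 PAGE namespace are assumptions, and the TEI namespace and header are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# PAGE XML namespace of the 2013-07-15 schema version (assumed for this sketch)
PAGE_NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

def page_to_raw_tei(page_xml: str) -> str:
    """Convert the text lines of one PAGE XML document into a simplified TEI <body>."""
    root = ET.fromstring(page_xml)
    body = ET.Element("body")
    for region in root.iterfind(".//pc:TextRegion", PAGE_NS):
        ab = ET.SubElement(body, "ab")
        for line in region.iterfind("pc:TextLine", PAGE_NS):
            # Line-level transcription only; word-level TextEquiv elements are ignored
            unicode_el = line.find("pc:TextEquiv/pc:Unicode", PAGE_NS)
            lb = ET.SubElement(ab, "lb")
            lb.tail = (unicode_el.text or "") if unicode_el is not None else ""
    return ET.tostring(body, encoding="unicode")
```

Lines without a transcription still produce an `<lb/>`, so the line count of the page is preserved in the output.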
Workflow overview (document and image identifiers remain stable after initial creation):
- docs.google.com/spreadsheets: tables for small forms and letters, each with ID, metadata and IIIF columns
- commit to dse-as.github.io/i3f: one .toml file per document; generates the IIIF presentation manifest (iiif.annemarie-schwarzenbach.ch/presentation)
- dse-as.github.io/workflow_IIIF-Transkribus-AT: form-based image upload into a Transkribus collection
- app.transkribus.org: text recognition, (rough) structural annotation; 3 Transkribus collections: as-dse_wait, as-dse_work, as-dse_finalised
- script-based export from Transkribus (as-dse_finalised) and data transformation (raw TEI, project TEI): TEI-XML data, 1 file per text
- the TEI-XML data feeds the development of the web presentation (data transformation and (static) backend; index, register; frontend) and the FAIR data repository
The code in this repository is based on