This repository holds the code and data accompanying the following paper:
"Forgotten Books: The Application of Unseen Species Models to the Survival of Culture" [2021], by Mike Kestemont, Folgert Karsdorp, Elisabeth de Bruijn, Matthew Driscoll, Katarzyna A. Kapitan, Pádraig Ó Macháin, Daniel Sawyer, Remco Sleiderink & Anne Chao.
Abstract: The study of ancient cultures is hindered by the incomplete survival of material artefacts, so that we commonly underestimate the diversity of cultural production in historic societies. To correct this survivorship bias, we apply unseen species models from ecology to gauge the loss of narratives from medieval Europe, such as the romances about King Arthur. The estimates obtained are compatible with the scant historic evidence. Besides events like library fires, we identify the original evenness of cultural populations as an overlooked factor in these assemblages’ stability in the face of immaterial loss. We link the elevated evenness in island literatures to analogous accounts of ecological and cultural diversity in insular communities. These analyses call for a wider application of these methods across the heritage sciences.
The Jupyter notebooks under the notebooks folder hold all the Python code which we used for the analysis, including the additional experiments reported in the SI and the R code for reproducing our findings:
- analysis.ipynb: code for the unseen species models;
- analyze_pop.ipynb (and sim_pop.py) for the evenness simulations;
- geolocate.ipynb: code used for plotting the heatmaps;
- Copia.R: the analogous R code to reproduce our findings for the unseen species models.
The code heavily relies on the open-source copia package, co-developed by Mike Kestemont and Folgert Karsdorp, that is available from PyPI:
>>> pip install copia
The copia package is documented here. The code was executed in an Anaconda environment using Python 3.7.10. All dependencies can be installed from the requirements.txt file from the top-level directory in the repository:
>>> pip install -r requirements.txt
- The datasets/master folder contains spreadsheets (.xlsx) with the main work-document counts for the six main medieval vernaculars considered in the paper: Dutch, English, French, German, Icelandic, Irish (as well as Anglo-Norman). The compilation of these and the data format are detailed in the SI.
- The datasets/geolocated folder contains the data that was used for plotting the heatmaps of the document dispersal (four vernaculars), with a latitude-longitude pair for each document. Only documents that we were able to geolocate approximately are included. If a document has multiple signatures (because its remnants are curated across multiple repositories), only the first signature is geolocated.
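To give a feel for what an unseen species model computes from work-document counts, here is a minimal, self-contained sketch of the classic Chao1 lower-bound estimator. It is an illustration only: it does not use the copia API, the function name chao1 and the toy data are hypothetical, and the notebooks should be consulted for the estimators and settings actually used in the paper.

```python
from collections import Counter


def chao1(abundances):
    """Classic Chao1 lower-bound estimate of total richness.

    `abundances` holds one positive count per observed work,
    i.e. the number of surviving documents for that work.
    """
    counts = [c for c in abundances if c > 0]
    s_obs = len(counts)                    # works with at least one surviving document
    f1 = sum(1 for c in counts if c == 1)  # works known from exactly one document
    f2 = sum(1 for c in counts if c == 2)  # works known from exactly two documents
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when no work has exactly two documents
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical toy data: one work label per surviving document
documents = ["A", "B", "B", "C", "D", "D", "E", "E", "E", "E", "E", "F"]
abundances = list(Counter(documents).values())
estimate = chao1(abundances)
print(f"observed works: {len(abundances)}, Chao1 estimate: {estimate:.2f}")
```

In this toy example six works are observed, three of them in a single document and two in exactly two documents, so the estimate is 6 + 3²/(2·2) = 8.25 works, suggesting that at least two works left no surviving document.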
Releases of this repository are sustainably mirrored on Zenodo, ensuring long-term archival access to this material. Please consider citing the accompanying paper if you re-use this code for academic purposes.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.