Living-with-machines/Computer-Vision-for-the-Humanities-workshop

Computer Vision for the Humanities

This workshop provides an introduction to computer vision for humanities applications. In particular, it gives a high-level overview of machine learning-based approaches to computer vision, with a focus on supervised learning. The workshop also includes discussion of working with historical data. The materials are based on in-progress Programming Historian lessons.

Blurb

Over the last 10 years, the field of computer vision, which seeks to gain a high-level understanding of images using computational techniques, has seen rapid innovation. For example, computer vision models can locate and identify people, animals and thousands of objects in images with high levels of accuracy. This technological innovation promises to do for images what the combination of Optical Character Recognition (OCR) and Natural Language Processing (NLP) techniques did for texts. These techniques open up a part of the digital archive for large-scale analysis that has until now remained largely unexplored: the millions of images in digitized books, newspapers, periodicals, and historical documents.

This workshop will:

  • Provide an introduction to deep learning-based computer vision methods for humanities research.
  • Give an overview of the steps involved in training a deep learning model.
  • Discuss some of the specific considerations around using deep learning/computer vision for humanities research.
  • Help you decide whether deep learning might be a useful tool for you.

Materials

The Data

Newspaper Navigator

This workshop makes use of the Newspaper Navigator dataset. This dataset "consists of extracted visual content for 16,358,041 historic newspaper pages in Chronicling America. The visual content was identified using an object detection model trained on annotations of World War 1-era Chronicling America pages, including annotations made by volunteers as part of the Beyond Words crowdsourcing project." (source)


newspaper image

You can also find data derived from the Newspaper Navigator datasets on Zenodo.

You may also want to check out nnanno, which was used to develop these datasets.

Credit: This project, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London.