DARIAH-ERIC · charlottejmc · Jul 11, 2024 · Jul 25, 2024 · Sep 11, 2024
diff --git a/content/posts/corpus-analysis-with-spacy/images/11238783706_e7ca6c0c35_o.jpg b/content/posts/corpus-analysis-with-spacy/images/11238783706_e7ca6c0c35_o.jpg
diff --git a/content/posts/corpus-analysis-with-spacy/index.mdx b/content/posts/corpus-analysis-with-spacy/index.mdx
@@ -0,0 +1,49 @@
+---
+title: Corpus Analysis with spaCy
+lang: en
+date: 2024-07-11T15:20:06.548Z
+version: 1.0.0
+authors:
+  - s-kane-megan
+editors:
+  - ladd-john-r
+tags:
+  - python
+  - big-data
+categories:
+  - programming-historian
+featuredImage: images/11238783706_e7ca6c0c35_o.jpg
+abstract: This lesson demonstrates how to use the Python library spaCy for
+  analysis of large collections of texts. It details the process of
+  using spaCy to enrich a corpus via lemmatization, part-of-speech tagging,
+  dependency parsing, and named entity recognition. Readers will learn how the
+  linguistic annotations produced by spaCy can be analyzed to help researchers
+  explore meaningful trends in language patterns across a set of texts.
+domain: Social Sciences and Humanities
+targetGroup: Domain researchers
+type: training-module
+remote:
+  date: 2023-11-02T16:31:00.000Z
+  url: https://doi.org/10.46430/phen0113
+  publisher: ProgHist Ltd
+licence: ccby-4.0
+toc: false
+draft: false
+uuid: E7Hh84XHeikiofOoQpNW2
+---
+Say you have a big collection of texts. Maybe you’ve gathered speeches from the French Revolution, compiled a bunch of Amazon product reviews, or unearthed a collection of diary entries written during the First World War. In any of these cases, computational analysis can be a good way to complement close reading of your corpus… but where should you start?
+
+One possible way to begin is with spaCy, an industrial-strength library for Natural Language Processing (NLP) in Python. spaCy can process large corpora, generate linguistic annotations such as part-of-speech tags and named entities, and prepare texts for further machine classification. This lesson is a ‘spaCy 101’ of sorts, a primer for researchers who are new to spaCy and want to learn how it can be used for corpus analysis. It may also be useful for those who are curious about natural language processing tools in general and how they can help us answer humanities research questions.
+
+#### Reviewed by:
+- Maria Antoniak
+- William Mattingly
+
+## Learning outcomes
+After completing this lesson, you will be able to:
+- Upload a corpus of texts to a platform for Python analysis (using Google Colaboratory)
+- Use spaCy to enrich the corpus through tokenization, lemmatization, part-of-speech tagging, dependency parsing and chunking, and named entity recognition
+- Conduct frequency analyses using part-of-speech tags and named entities
+- Download an enriched dataset for use in future NLP analyses
+
+<ExternalResource title="Interested in learning more?" subtitle="Check out this lesson on Programming Historian's website" url="https://doi.org/10.46430/phen0113" />