Material for the class Automated Web Data Collection and Text as Data (2023)

cornelius-erfort/web-data-and-text-r

Automated Web Data Collection and Text as Data

Slides and exercises for the BA class Automated Web Data Collection and Text as Data (2023).

Slides

  1. Introduction
  2. R Basics
  3. HTML
  4. Regular expressions
  5. Webscraping
  6. APIs, JSON and other file formats
  7. OCR and PDFs
  8. Recap, and more advanced approaches
  9. Ethics and legal aspects
  10. Text as Data
  11. Text models

Description

The internet has become not only an integral part of everyday life but also an important data source for social science research. Instead of conducting costly surveys or experiments, researchers can answer many questions using openly available data and innovative methods. Access to social media data, for example, allows us to measure the attitudes and issue agendas of political actors by analyzing their communication. Students will acquire valuable research skills, including automated web data collection and quantitative text analysis using the statistical software R. These skills will enable students to generate datasets for their own empirical research projects. Additionally, we will discuss cutting-edge social science applications to showcase the advantages these methods offer for empirical research.

Textbooks

  • Munzert, Simon, Christian Rubba, Peter Meißner, Dominic Nyhuis (2014). Automated Data Collection with R – A Practical Guide to Web Scraping and Text Mining. John Wiley & Sons, Chichester. https://doi.org/10.1002/9781118834732
  • Wickham, Hadley, Mine Çetinkaya-Rundel, Garrett Grolemund (2023). R for Data Science (2nd Edition). O’Reilly Media. https://r4ds.hadley.nz/

Author

  • Cornelius Erfort
    Post-doctoral Researcher
    University of Witten/Herdecke
    Department of Philosophy, Politics, and Economics
    Alfred-Herrhausen-Straße 50, 58455 Witten, Germany
    [email protected]
    ORCID: 0000-0001-8534-7748

This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 390285477/GRK 2458.
