Slides and exercises for the BA course "Automated Web Data Collection and Text as Data" (2023)
- Introduction
- Teaser: Approval for nuclear
- R Basics
- HTML
- Regular expressions
- Webscraping
- APIs, JSON and other file formats
- OCR and PDFs
- Recap and more advanced approaches
- Ethics and legal aspects
- Text as Data
- Text models
The internet has become not only an integral part of everyday life but also an important data source for social science research. Instead of conducting costly surveys or experiments, researchers can answer many questions using openly available data and innovative methods. Access to social media data, for example, allows us to measure the attitudes and issue agendas of political actors by analyzing their communication. Students will acquire valuable research skills, including automated web data collection and quantitative text analysis, using the statistical software R. These skills will enable students to generate datasets for their own empirical research projects. Additionally, we will discuss cutting-edge social science applications of these methods to showcase the advantages they offer for empirical research.
- Munzert, S., Rubba, C., Meißner, P., & Nyhuis, D. (2014). Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. John Wiley & Sons, Chichester. https://doi.org/10.1002/9781118834732
- Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd Edition). O’Reilly Media. https://r4ds.hadley.nz/
- Cornelius Erfort
Post-doctoral Researcher
University of Witten/Herdecke
Department of Philosophy, Politics, and Economics
Alfred-Herrhausen-Straße 50, 58455 Witten, Germany
ORCID: 0000-0001-8534-7748
This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 390285477/GRK 2458.