Data availability

Due to copyright, we are unable to provide the unprocessed scraped data used in this analysis. To ensure reproducability without violating copyright, we provide the word frequencies for each news article, the coreNLP output, all analyzed names with an article identifier, as well as any other associated data used in the analyses such as quotes and citations. We provide all of this data on github. We also provide data descriptions in our github README, under the header "Quick data folder overview".

Code availability

This manuscript was written using Manubot [@doi:10.1371/journal.pcbi.1007128] and is available on github: manuscript repository link. All code and metadata is also available on github, full analysis repository link, under a BSD 3-Clause License. The code to generate all main and supplemental figures are available as R markdown documents within our main analysis github, in the following subfolder: notebooks. We provide a docker image that can re-run the analysis pipeline using intermediate, pre-processed data and produce all the main and supplemental figures. To re-run the entire pipeline (including scraping), the docker image contains all necessary packages and code. Scraping code is available on our main analysis github, in the following subfolder: scraper. The shell scripts to re-run the entire analysis are provided in the README file in the github repository.

Acknowledgements

We would like to thank Jeffrey Perkel for asking thoughtful questions that spurred this line of research, and providing feedback and insight into the news-gathering process during the course of this project.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

06.resources.md

06.resources.md

Data availability

Code availability

Acknowledgements

Files

06.resources.md

Latest commit

History

06.resources.md

File metadata and controls

Data availability

Code availability

Acknowledgements