---
title: "Olympics medalists, not by country, but by NUTS region"
author:
  - name: Giorgio Comai
    url: https://giorgiocomai.eu
    affiliation: OBCT/CCI - EDJNet
    affiliation_url: https://europeandatajournalism.eu/
    orcid_id: 0000-0002-0515-9542
date: "`r Sys.Date()`"
output:
  distill::distill_article:
    toc: true
    toc_depth: 3
    includes:
      in_header: header.html
---
```{r load_packages, message = FALSE, warning = FALSE, echo = FALSE}
source(here::here("functions", "setup.R"))
source(here::here("functions", "extract_medals_from_wikipedia_page.R"))
source(here::here("functions", "get_list_of_lists.R"))
```
## What this is about
The Olympic medal table is famously all about countries and national teams. But what if it were based on the number of medals won by regions, not by countries?
We check this based on the place of birth of all athletes born in one of the European NUTS regions. [NUTS](https://en.wikipedia.org/wiki/Nomenclature_of_Territorial_Units_for_Statistics) (Nomenclature of Territorial Units for Statistics) is a standardised classification of administrative entities defined by the European Union ([here’s the geographic dataset for download](https://ec.europa.eu/eurostat/web/gisco/geodata/statistical-units/territorial-units-statistics)).
We conduct this analysis for the 2024 Summer Olympics, and we checked these data for the 2020 Summer Olympics. In principle, the procedure can be extended to include *all* modern Olympics, all over the world.
A map of all known places of birth (including those of athletes born outside Europe), the corresponding dataset, and all the code used to generate them are [shared in this repository](https://github.com/EDJNet/OlympicsGoNUTS).
## Haven't I seen this before?
This is an updated version of a similar exercise I conducted for the 2020 Summer Olympics.
- [Ranking European regions by Olympics medals](https://www.europeandatajournalism.eu/cp_data_news/ranking-european-regions-by-olympics-medals/)
- [The data you need to win the Olympics if you go NUTS](https://www.europeandatajournalism.eu/cp_data_news/the-data-you-need-to-win-the-olympics-if-you-go-nuts/)
- [repository with data and code](https://github.com/EDJNet/olympics2020nuts) / [web version](https://edjnet.github.io/olympics2020nuts/)
- [interactive full screen map](https://edjnet.github.io/olympics2020nuts/medalists_map.html) with place of birth of all medalists at the 2020 Summer Olympics
## How do we go about it?
The whole data retrieval process is based on Wikipedia and Wikidata. Wikidata provides data in a more machine-readable format, but, at least in the past, the Wikipedia page listing all medalists was more complete and was updated much more quickly (there's a [WikiProject Olympics](https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Olympics), if you're looking for pointers on how to contribute). In brief, we get medalists from the Wikipedia page, then retrieve data about their place of birth from Wikidata, then with some geo-matching we attribute each of them to a NUTS region.
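The geo-matching step can be illustrated with a minimal sketch in base R. It is not the code used in this repository: the region shapes, coordinates and helper names (`point_in_polygon`, `match_region`) are all hypothetical, and a real pipeline would more likely rely on a spatial library and the official NUTS geometries. The sketch treats longitude and latitude as planar coordinates and uses a plain ray-casting test to decide which polygon contains each place of birth.

```r
# Ray-casting test: TRUE if the point (px, py) falls inside the polygon
# described by the vertex vectors poly_x and poly_y.
point_in_polygon <- function(px, py, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Does the edge between vertex j and vertex i straddle the horizontal line y = py?
    crosses <- (poly_y[i] > py) != (poly_y[j] > py)
    if (crosses) {
      x_at_py <- poly_x[i] +
        (py - poly_y[i]) * (poly_x[j] - poly_x[i]) / (poly_y[j] - poly_y[i])
      # Each crossing to the right of the point flips the inside/outside state.
      if (px < x_at_py) inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical regions: two adjacent squares standing in for NUTS polygons.
regions <- list(
  R1 = list(x = c(0, 2, 2, 0), y = c(0, 0, 2, 2)),
  R2 = list(x = c(2, 4, 4, 2), y = c(0, 0, 2, 2))
)

# Return the id of the first region containing the point, or NA if none does.
match_region <- function(lon, lat, regions) {
  for (region_id in names(regions)) {
    r <- regions[[region_id]]
    if (point_in_polygon(lon, lat, r$x, r$y)) return(region_id)
  }
  NA_character_
}

# Hypothetical places of birth; athlete C was born outside all regions.
birthplaces <- data.frame(
  athlete = c("A", "B", "C"),
  lon = c(1, 3, 9),
  lat = c(1, 1, 9)
)
birthplaces$nuts_region <- mapply(
  match_region, birthplaces$lon, birthplaces$lat,
  MoreArgs = list(regions = regions)
)
```

Athletes born outside every region end up with a missing `nuts_region`, which is how non-European-born medalists would drop out of a regional ranking while still appearing on a map.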
We can then combine these data with other statistics about these regions, such as population and economic indicators.
Due to statistical and administrative changes, geographic data about NUTS regions and other indicators such as population or economic activity are not always published at the same time. In particular, the latest NUTS dataset released for 2024 covers [more non-EU regions](https://ec.europa.eu/eurostat/web/nuts/non-eu-regions) in the Balkans, as well as Ukraine, but does not include the UK. In some countries, such as the Netherlands and Portugal, administrative boundaries have changed, so population data are not immediately available for all NUTS 2024 regions. In order to maximise coverage and data availability, we use NUTS 2021 boundaries for the UK, the Netherlands and Portugal, and NUTS 2024 boundaries for all other countries. Population data for some NUTS regions (e.g. in Ukraine and Kosovo) are not available.
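As a minimal sketch of this rule, the choice of NUTS version can be expressed as a lookup by country code. The helper name `nuts_version_for` and the vector `countries_on_nuts_2021` are hypothetical and not taken from this repository; the sketch assumes the two-letter country codes used in NUTS identifiers, where the United Kingdom is `UK`.

```r
# Countries kept on NUTS 2021 boundaries, as described in the paragraph above.
countries_on_nuts_2021 <- c("UK", "NL", "PT")

# Hypothetical helper: NUTS version to use for each country code.
nuts_version_for <- function(country_code) {
  ifelse(country_code %in% countries_on_nuts_2021, 2021L, 2024L)
}

# Example with two countries on NUTS 2021 and two on NUTS 2024.
versions <- nuts_version_for(c("UK", "IT", "PT", "UA"))
```

Keeping the exceptions in a single vector makes it easy to drop a country from the list once population data for its NUTS 2024 regions become available.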
It is worth highlighting that each person receiving a medal is counted as one medal. This is very much unlike the official medal table: in the following table a victory in, e.g., a quadruple sculls rowing competition results in four gold medals, not one; the same is true for all other team sports. This is consistent with the principle of counting medals based on the place of birth of athletes: ultimately, each of them will bring home a shiny medal, not, e.g., a quarter of a medal. In terms of ranking, compared to the official medal table, this favours regions and countries that are strongest in team sports.
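The difference between the two counting rules can be shown with a small, made-up example in base R. The athletes, events and region codes below are hypothetical; only the counting logic reflects the approach described above.

```r
# Hypothetical data: one row per medal-winning athlete.
medalists <- data.frame(
  athlete = c("A", "B", "C", "D", "E"),
  event = c(rep("Quadruple sculls", 4), "100 m"),
  medal = rep("Gold", 5),
  nuts_region = c("R1", "R1", "R2", "R2", "R2"),
  stringsAsFactors = FALSE
)

# Per-athlete count used here: every row is one medal for the region of birth.
medals_per_region <- table(medalists$nuts_region)

# Official-style count: one medal per event and medal type, whatever the crew size.
official_style_total <- nrow(unique(medalists[, c("event", "medal")]))

# Total under the per-athlete rule.
per_athlete_total <- nrow(medalists)
```

With these rows the quadruple sculls crew alone contributes four gold medals, split between regions `R1` and `R2`, whereas an official-style table would record a single gold for the event.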
## Datasets and maps
Even if the main objective of this initiative is to get data about ongoing and recent Summer Olympics, this repository includes data for all Summer Olympics since 1960 __which have a dedicated page with all medal winners on Wikipedia__ ([see a list of all Olympics with full medalist pages available on Wikipedia](https://en.wikipedia.org/wiki/Category:Lists_of_Summer_Olympic_medalists_by_year)). This second criterion is important. __Only from 1988 onwards are all Olympics present__. The data for missing Olympics seem to be available scattered around Wikipedia (and, of course, elsewhere online), but until some patient Wikipedia contributor collates these data in a somewhat standardised page, this automatic approach cannot work. Data on medal winners in Wikidata itself are much more incomplete, even for recent Olympic events; when they *are* available, they are used as a quality check to ensure consistency with data parsed from the medal table on Wikipedia.
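One simple way to run a consistency check of this kind is to compare the two sets of medalist identifiers and look at what appears in only one source. The sketch below is an illustration under assumptions, not the check implemented in this repository: the identifiers are made up, and it assumes both sources have already been reduced to vectors of Wikidata-style item identifiers.

```r
# Hypothetical identifiers of medalists found in each source.
wikipedia_ids <- c("Q1", "Q2", "Q3")
wikidata_ids <- c("Q2", "Q3", "Q4")

# Medalists parsed from the Wikipedia medal table but not recorded as winners on Wikidata.
only_wikipedia <- setdiff(wikipedia_ids, wikidata_ids)

# Medalists recorded on Wikidata but missing from the parsed Wikipedia table.
only_wikidata <- setdiff(wikidata_ids, wikipedia_ids)

# Medalists confirmed by both sources.
in_both <- intersect(wikipedia_ids, wikidata_ids)
```

Since Wikidata is described above as the less complete source, entries in `only_wikipedia` are expected; entries in `only_wikidata` are the ones more likely to point to a parsing problem.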
**Warning**: data have not been thoroughly checked for accuracy, with the partial exception of 2020 and 2024 Olympics!
```{r link_to_all_html, results='asis', echo=FALSE}
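# Collect the years of all Summer Olympics from 1960 onwards
years_v <- list_of_lists_df |>
  dplyr::filter(year >= 1960) |>
  dplyr::pull(year)

# Print one markdown list item per year, linking to that year's page
purrr::walk(years_v, \(x) {
  cat(stringr::str_c("- [", x, " Summer Olympics]", "(", x, ")"))
  cat("\n")
})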
```
## Credits
This dataset has been generated by Giorgio Comai (OBCT/CCI) within the scope of EDJNet, the European Data Journalism Network. It is released under a CC-BY license (Giorgio Comai/OBCT/EDJNet). Please do include a reference to EDJNet - the European Data Journalism Network if you use these data.