-
Notifications
You must be signed in to change notification settings - Fork 4
/
repo_structure.html
180 lines (176 loc) · 6.76 KB
/
repo_structure.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
---
layout: docs
---
<h1> Getting the most recent data </h1>
<p class="section-intro">
If you just browse the site, you’ll find that EveryPolitician simply links to
the most recently updated version of its data files.
</p>
<p>
But if you’re accessing the data programmatically, that might not work nicely
for you because the data is hosted on GitHub, which doesn’t serve the right
Content-type when your application fetches the data. Instead, <a
href="use_the_data.html#rawgit">you should get it from
<code>cdn.rawgit.com</code></a> (while you’re there, you should also read our
notes on <a href="use_the_data.html">using the data</a> in general).
</p>
<p>
<strong>In summary</strong>:
get the latest data file URLs from the
<code>popolo_url</code> and <code>csv_url</code> fields within
<code>countries.json</code>.
</p>
<h2>
The details
</h2>
<p>
EveryPolitician data is frequently updated. Typically you'll be interested in
the most recent data, but if you need to get old versions you can. This is
because all the files are being stored in git.
</p>
<p>
This means that every time we change the data, that change is given a unique
code (called a hash, which looks something like <code>0b47c0c</code>). You
can use this code to uniquely refer to the files as they were at that moment.
</p>
<p>
If you're familiar with git or GitHub, this will seem like second nature. But
if you're not, all you really need to know is that the URLs to the data
always have such a hash in them. This means that EveryPolitician data URLs
are really pointing at a <em>snapshot</em> of the data.
</p>
<p>
So if you want to get the <em>most recent version</em> of the data, make sure
you're using the latest hash in the URL. We put the data URLs in
<code>countries.json</code>, as shown below. So to get the most
recent data programmatically, you need to look in the current version of
<code>countries.json</code> —
possibly also inspecting the value of the <code>lastmod</code> field (which
contains the <abbr
title="number of seconds that have elapsed since January 1, 1970">Unix epoch</abbr>
time of the last time the data was updated) for the legislature you're interested in.
</p>
<h3>
Explaining <code>countries.json</code>
</h3>
<p>
EveryPolitician publishes a list in JSON format of
all the countries we have data for:
<a href="https://raw.githubusercontent.com/everypolitician/everypolitician-data/master/countries.json"><code>countries.json</code></a>.
This file includes metadata about what’s available to download, including the
URLs to use.
</p>
<p>
Key fields that <code>countries.json</code> has for each country include:
</p>
<ul>
<li>
<code>name</code>:
the name of the country to help human-reading of the data
</li>
<li>
<code>slug</code>:
the name with spaces replaced by hyphens and punctuation
stripped, so suitable for use in URLs and directory names
</li>
<li>
<code>code</code>:
the country code (for example, <code>EE</code> for Estonia),
which helps with programmatic lookups. We’re using
<a href="https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2">ISO 3166-1 alpha-2</a>
country codes where possible (with additional ISO 3166-2
three-letter codes where appropriate: for example, <code>GB-WLS</code> for Wales).
</li>
<li>
<code>legislatures</code>:
a list of legislatures within this county. There are lots of
ways governments are organised. Some have a single house, others
have separate upper and lower chambers. Sometimes a country
changes its entire legislative process completely, such as the
Council of Deputies in Libya replaced the General National
Congress. We separate each of these out individually.
</li>
</ul>
<p>
Within the legislatures, in addition to a similar <code>name</code>
and <code>slug</code>, these fields help you identify and locate the
data you want:
</p>
<ul>
<li>
<code>popolo_url</code>:
the fully-qualified URL to the Popolo JSON file
</li>
<li>
<code>popolo</code>:
the path to the Popolo JSON file
</li>
<li>
<code>lastmod</code>:
Unix time when the data was last modified
</li>
<li>
<code>legislative_periods</code>:
a list of each of the terms, or periods. Each one of those has these fields:
<ul>
<li><code>id</code>, <code>name</code>, <code>start_date</code>, and <code>slug</code></li>
<li>
<code>csv_url</code>:
the fully-qualified URL to the CSV file of data for this term
</li>
</ul>
</li>
</ul>
<h3>
Data URLs: <code>popolo_url</code> and <code>csv_url</code>
</h3>
<p>
The <code>countries.json</code> file contains the path and the URL of two types
of data file:
</p>
<ul>
<li>
<code>popolo_url</code> — the Popolo JSON data for the whole legislature
</li>
<li>
<code>csv_url</code> — the comma-separated data for individual terms within the legislature
</li>
</ul>
<p>
This is what an EveryPolitician data URL looks like:
</p>
<p class="url-demo-box">
<code><span class="url-part-domain">https://<em>domain</em></span><br>
/everypolitician/everypolitician-data<br>
/<span class="url-part-sha">commit-SHA</span><br>
/<span class="url-part-file">path-to-data-file</span></code>
</p>
<p>
You might expect the domain to be <code>https://github.com/</code>, but for
your application <a href="/use_the_data.html#rawgit">we recommend you use <code><span class="url-part-domain">https://cdn.rawgit.com</span></code></a>.
</p>
<p>
For example: the URLs for the most recent (at time of writing — and
that’s <em>already out of date</em>, which neatly demonstrates
why you need to know how to determine these SHAs) politicians’
data for the UK’s House of Commons legislature, for the 56th term
(which started 2015-05-08), look like this:
</p>
<ul>
<li>
JSON: <a style="font-size:80%;padding-left:0.5em" href="https://cdn.rawgit.com/everypolitician/everypolitician-data/0b47c0c/data/UK/Commons/ep-popolo-v1.0.json">try it</a>
<br>
<code><span span class="url-part-domain">https://cdn.rawgit.com</span>/everypolitician/everypolitician-data/<span span class="url-part-sha">0b47c0c</span>/<span class="url-part-file">data/UK/Commons/ep-popolo-v1.0.json</span></code>
</li>
</ul>
<ul>
<li>
CSV for the term “56th Parliament of the United Kingdom”:
<a style="font-size:80%;padding-left:0.5em" href="https://cdn.rawgit.com/everypolitician/everypolitician-data/0b47c0c/data/UK/Commons/term-56.csv">try it</a>
<br>
<code><span class="url-part-domain">https://cdn.rawgit.com</span>/everypolitician/everypolitician-data</span>/<span class="url-part-sha">0b47c0c</span>/<span class="url-part-file-alt">data/UK/Commons/term-56.csv</span></code>
</li>
</ul>