-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy path07_Crime_Hotspot_Map.Rmd
260 lines (223 loc) · 10.2 KB
/
07_Crime_Hotspot_Map.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
---
title: "Making a Hotspot Map"
author:
- affiliation: University of Pennsylvania
email: [email protected]
name: Greg Ridgeway
- affiliation: University of Pennsylvania
email: [email protected]
name: Ruth Moyer
- affiliation: University of Pennsylvania
email: [email protected]
name: Li Sian Goh
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
html_document:
css: htmlstyle.css
---
<!-- HTML YAML header Ctrl-Shift-C to comment/uncomment -->
<!-- PDF YAML header Ctrl-Shift-C to comment/uncomment -->
<!-- --- -->
<!-- title: "Making a Hotspot Map" -->
<!-- author: -->
<!-- - Greg Ridgeway ([email protected]) -->
<!-- - Ruth Moyer ([email protected]) -->
<!-- date: "`r format(Sys.time(), '%B %d, %Y')`" -->
<!-- output: -->
<!-- pdf_document: -->
<!-- latex_engine: pdflatex -->
<!-- html_document: default -->
<!-- fontsize: 11pt -->
<!-- fontfamily: mathpazo -->
<!-- --- -->
<!-- Make RMarkdown cache the results -->
```{r echo=FALSE}
knitr::opts_chunk$set(echo=TRUE, cache=TRUE, cache.lazy=FALSE, out.width='100%')
```
<!-- A function for automating the numbering and wording of the exercise questions -->
```{r echo=FALSE}
.counterExercise <- 0
.exerciseQuestions <- NULL
.exNum <- function(.questionText="")
{
.counterExercise <<- .counterExercise+1
.questionText <- gsub("@@", "`", .questionText)
.exerciseQuestions <<- c(.exerciseQuestions, .questionText)
return(paste0(.counterExercise,". ",.questionText))
}
.exQ <- function(i)
{
return( paste0(i,". ",.exerciseQuestions[i]) )
}
```
<!-- This document can't be created after running Lesson #8 SQL part 2. To recreate this document, first rerun Lesson #6 SQL Part 1. -->
# Introduction
Now that we've learned how to work with SQL, we can make a hotspot map with a large dataset. The concept of determining the geographic locations where crime is most intense - crime hotspots - is very important to criminological theory as well as to the very practical question of where police officers should be deployed.
Here, we will make hotspot maps with our Chicago crime data (which requires use of SQL). Of course, if you had a small dataset, you could use these same methods to make a hotspot map using only R.
# Setting up the data
Let's first load up the `sqldf` library and reconnect to our Chicago crime database.
```{r comment="", results='hold', cache=FALSE}
library(sqldf)
con <- dbConnect(SQLite(), dbname="chicagocrime.db")
```
Let's run a SQL query to extract the two columns, `Latitude` and `Longitude` from our crime database, creating a datafame with one row for each crime incident in the dataset.
```{r comment="", results='hold'}
res <- dbSendQuery(con, "
SELECT Latitude, Longitude
FROM crime
WHERE Latitude IS NOT NULL AND
Longitude IS NOT NULL")
dataAllcrime <- fetch(res, n = -1)
dbClearResult(res)
```
How would your query differ if you wanted to do a hotspot map of only assaults?
```{r comment="", results='hold'}
res <- dbSendQuery(con, "
SELECT Latitude,Longitude
FROM crime
WHERE [Primary.Type]='ASSAULT' AND
Latitude IS NOT NULL AND
Longitude IS NOT NULL")
dataAssaults <- fetch(res, n = -1)
dbClearResult(res)
```
# Creating the maps
We'll be using two packages to make our maps, `ggmap` and `hexbin`, so be sure you have those installed.
```{r comment="", results='hold', cache=FALSE}
library(ggmap)
library(hexbin)
```
Next, we'll need to get a map of Chicago. Google Maps used to be readily available through `ggmap`, but as of June 2018 the account requirements became more complicated for Google Maps. We will use other map services instead here. Stamen is a design organization that provides maps of several styles. Here we define the bounding box for the City of Chicago.
```{r comment="", results='hold', message=FALSE}
chicago.map <- ggmap(get_map(c(-88.0, 41.6, -87.5, 42.1), source="stamen"))
chicago.map
```
There are other styles of Stamen maps including `toner`
```{r comment="", results='hold', message=FALSE}
print(ggmap(get_map(c(-88.0, 41.6, -87.5, 42.1), source="stamen", maptype="toner")))
```
and `watercolor`.
```{r comment="", results='hold', message=FALSE}
print(ggmap(get_map(c(-88.0, 41.6, -87.5, 42.1), source="stamen", maptype="watercolor")))
```
Let's plot as points on the Chicago map, a random sample of 100,000 rows from our dataset.
```{r comment="", results='hold'}
i <- sample(1:nrow(dataAllcrime), 100000)
chicago.map +
geom_point(aes(x=Longitude,y=Latitude),
data = dataAllcrime[i,],
alpha = 0.5,
color = "darkred",
size = 1)
```
It doesn't look terribly useful because it is just a giant red splotch. These are all the points where a crime occurred. There's no density to show us the high crime or low crime areas. But, we're making process with getting points onto a map!
Let's instead try plotting hexagonal bins, which the `hexbin` package let us do. Why hexagons? It turns out that squares tend to have problems in the corners. The precision is low there and our eye tends to get drawn to the parallel lines the square grid makes. Ideally, we want to use a shape that has a small perimeter-to-area ratio. Circles have the smallest perimeter-to-area ratio, but we can't use circles to tile over the map. Hexagons fall in between squares and circles.
Note: when you run the following lines of code, you will get the following message: "Coordinate system already present. Adding new coordinate system, which will replace the existing one." Don't panic, this message is supposed to appear.
```{r comment="", results='hold'}
chicago.map +
coord_cartesian() +
stat_binhex(aes(x=Longitude, y=Latitude),
bins = 60,
data = dataAllcrime)
```
Or try the following. Setting `alpha=2/4` makes the bins a little more transparent. Try comparing with `alpha=1/4` or `alpha=3/4`.
```{r comment="", results='hold'}
chicago.map +
coord_cartesian() +
stat_binhex(aes(x=Longitude, y=Latitude),
bins = 60,
alpha = 0.5,
data = dataAllcrime)
```
The previous map was a default monochromatic color (blue). You can change the color gradient. To see what colors are available, type `colors()`.
```{r comment="", results='hold'}
chicago.map +
coord_cartesian() +
stat_binhex(aes(x=Longitude, y=Latitude),
bins = 60,
alpha = 0.5,
data = dataAllcrime) +
scale_fill_gradient('Crime Density',
low="springgreen",
high="hotpink")
```
You can also create a map that shows the high crime areas in a way that is akin to elevation on a topographic map. The more concentrated the concentric areas, the higher the crime (or, in the case of a topographic map, the higher the terrain). Note that a plot using `stat_density2d()` requires a lot of memory. That's why it's a good idea to use a sample of cases.
```{r comment="", results='hold'}
i <- sample(1:nrow(dataAllcrime),100000)
chicago.map +
stat_density2d(aes(x=Longitude,y=Latitude),
bins = 5,
data = dataAllcrime[i,],
geom = 'density2d',
col = 'white')
```
Let's make a hotspot map with a gradient, red for high crime and green for low crime.
```{r comment="", results='hold'}
chicago.map +
stat_density2d(aes(x=Longitude, y=Latitude,
fill=..level..,
alpha=..level..),
data = dataAllcrime[i,],
geom = 'polygon') +
scale_fill_gradient('Crime Density',
low = "green",
high = "red") +
scale_alpha(range = c(.4, .75),
guide = FALSE) +
guides(fill = guide_colorbar(barwidth = 1.5, barheight = 10))
```
We can use `bins` to control the number of levels of colors plotted.
```{r comment="", results='hold'}
chicago.map +
stat_density2d(aes(x=Longitude, y=Latitude,
fill=..level..,
alpha=..level..),
bins = 20,
data = dataAllcrime[i,],
geom = 'polygon') +
scale_fill_gradient('Crime Density',
low = "green",
high = "red") +
scale_alpha(range = c(.4, .75),
guide = FALSE) +
guides(fill = guide_colorbar(barwidth = 1.5, barheight = 10))
```
Let's create hotspot maps with just motor vehicle theft, and let's make a map for each year between 2005 and 2014. Note the line with the SQL function `SUBSTR()`. The `SUBSTR()` command helps us grab a section of a string. Here's an example. Take a look at the `Date` column in our initial dataset.
```{r comment="", results='hold'}
res <- dbSendQuery(con, "SELECT Date
FROM crime
LIMIT 5")
fetch(res, n = -1)
dbClearResult(res)
```
We want only the year part of these dates. The year starts with the 7th character and has a length of 4 characters. So to extract just the year we select `SUBSTR(Date, 7, 4)`.
```{r comment="", results='hold'}
res <- dbSendQuery(con, "
SELECT Latitude,
Longitude,
SUBSTR(Date,7,4) AS year
FROM crime
WHERE [Primary.Type]='MOTOR VEHICLE THEFT' AND
Latitude IS NOT NULL AND
Longitude IS NOT NULL")
dataCarTheft <- fetch(res, n = -1)
dbClearResult(res)
for(year0 in 2005:2017)
{
print(
chicago.map +
stat_density2d(aes(x=Longitude, y=Latitude,
fill=..level..,
alpha=..level..),
size = 2,
bins = 4,
geom = 'polygon',
data = subset(dataCarTheft, year==year0)) +
scale_fill_gradient(year0) +
scale_alpha(range = c(.4, .75), guide = FALSE) +
guides(fill = guide_colorbar(barwidth = 1.5, barheight = 10))
)
# Sys.sleep(5) # uncomment this if you want a 5 second pause between plots
}
```
Many cities other than Chicago have accessible incident-level data so that you can try to make a hotspot map. Cities include Jersey City, Philadelphia, Los Angeles, Seattle, San Francisco, Baltimore, and Washington, DC.