-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathproject2.Rmd
349 lines (256 loc) · 15.1 KB
/
project2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
---
title: "Chapter 2: Customizing graphs in ggplot2"
description: |
Learn how to customize the aesthetics, labels and axes of a graph in ggplot2.
author:
- name: Jewel Johnson
date: 12-04-2021
output:
distill::distill_article:
toc_depth: 3
preview: photos/ggplot2.png
---
```{r setup, include=FALSE}
library(ggplot2)
library(Stat2Data)
data("Hawks")
knitr::opts_chunk$set(echo = TRUE)
```
```{r, xaringanExtra-clipboard, echo=FALSE}
htmltools::tagList(
xaringanExtra::use_clipboard(
button_text = "<i class=\"fa fa-clone fa-2x\" style=\"color: #301e64\"></i>",
success_text = "<i class=\"fa fa-check fa-2x\" style=\"color: #90BE6D\"></i>",
error_text = "<i class=\"fa fa-times fa-2x\" style=\"color: #F94144\"></i>"
),
rmarkdown::html_dependency_font_awesome()
)
```
---
#start editing from here
---
After [Chapter 1](file:///home/jeweljohnson/Work/R_distill_github/jeweljohnson.github.io/docs/project1.html) you must be familiar with the different types of graphs that you can plot using `ggplot2`. So for this tutorial, we will be learning how to customize those ggplot graphs to our liking. We will learn how to tweak the aesthetics, how to change labels and how to modify and change the axes in a graph.
So let us plot a graph from scratch and learn how to use different aesthetics available.
## 1. Setting up the prerequisites
First, we need to install the `ggplot2` package in R as it does not come in the standard distribution of R. For the dataset, we will first download the `Stat2Data` package which houses a lot of cool datasets. For this tutorial let us use the `Hawks` dataset which showcases body measurements from three different species of Hawks. This data was collected by students and faculty at Cornell College in Mount Vernon and the dataset was made available by late Prof. Bob Black at Cornell College.
* To install packages in R we use the command `install.packages()` and to load packages we use the command `library()`. Therefore to install and load `ggplot2` and `Stats2Data` packages we use the following lines of command. Call the `Hawks` data using the `data()` command.
```{r, eval=FALSE}
install.packages("ggplot2")
install.packages("Stat2Data")
library(ggplot2)
library(Stat2Data)
data("Hawks")
```
* Let us look at how the dataset is structured. Use `str()` command
```{r, eval=FALSE}
str(Hawks)
```
```{r, echo=FALSE}
library(magrittr)
library(rmarkdown)
# to display the str() information in the form of a table use the following command below
# https://stackoverflow.com/questions/44200394/show-str-as-table-in-r-markdown (credits: scoa)
paged_table(data.frame(variable = names(Hawks), classes = sapply(Hawks, typeof),
first_values = sapply(Hawks, function(x) paste0(head(x), collapse = ", ")),
row.names = NULL))
```
So there is a lot of information in the dataset which we can use for plotting. So let us try plotting them.
## 2. Building a plot
One thing to remember here is that how ggplot2 builds a graph is by adding layers. Let us start by plotting the basic layer first where the x-axis shows 'weight of the hawks' and the y-axis shows 'wingspan of the hawks'.
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing))
```
Wait a sec! Where are my data points? So right now if we look at the syntax of the ggplot code we can see that we have not told ggplot2 which geometry we want. Do we want a scatter plot or a histogram or any other type of graph? So let us plot a scatter plot first. Use `geom_point()` command. By adding `geom_point()` to the `ggplot()` command is equivalent to adding an extra layer to the already existing layer that we got previously. Let us also use `theme_bw()` for a nice looking theme.
```{r, warning=TRUE}
ggplot(data = Hawks, aes(x = Weight, y = Wing)) + geom_point() +
theme_bw()
```
We got the graph! but we also got a warning message. The warning message tells us that the dataset which we had used to plot the graph had 11 rows of NA values and which could not be plotted into the graph. In real-life cases, we can have datasets with NA values due to various reasons, so this is fine.
Now, this graph even though shows us data points we are not sure which point belongs to which species, as this dataset contains data for three species of Hawks. So let us try giving different colours to the points concerning the different species so that we are able to differentiate them.
### 1. Changing colour
* To change colour of the 'element' with respect to species, we have to add `colour = Species` within the `aes()` of the ggplot command. I use the general term 'element' here to emphasize that the same change in aesthetics will work for most of other types of geometries in ggplot2 (something which you have seen extensively in [Chapter 1](file:///home/jeweljohnson/Work/R_distill_github/jeweljohnson.github.io/docs/project1.html). Like for a line graph, the 'element' would be lines, here our 'element' is point.
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
geom_point() + theme_bw()
```
The species abbreviations are the following: CH=Cooper's, RT=Red-tailed, SS=Sharp-Shinned.
Now, this graph is way better than the previous one.
### 2. Changing point shape.
* Now instead of the colour let us change the shape of the point. Use `shape()` command in `aes()`
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing, shape = Species)) + #instead of colour use shape.
geom_point() + theme_bw()
```
Now we did change the shape of points but it is still hard to make out the difference. Let us try specifying colour along with the shape
* Adding both `colour` and `shape` in aesthetics
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing,
colour = Species, shape = Species)) + geom_point() + theme_bw()
```
This plot is much better than the previous one.
Now let us try specifying `colour` within the `aes()` of the `geom()`
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing)) +
geom_point(aes(colour = Species)) + theme_bw()
```
We got the same graph as before! So what is the difference in specifying `colour` within `aes()` of `ggplot()` compared to the same but within `geom()`. Let us look at another example.
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = "red")) +
geom_point() + theme_bw()
```
I manually changed the colour of the points to red colour. Now let try specifying `colour` to the `aes()` within the `geom()`
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = "red")) +
geom_point(aes(colour = Species)) + theme_bw()
```
You can see that the red colour is overridden by other colours. So the `aes()` mapping (in this case `colour`) within `geom()` will override any `aes()` mapping within `ggplot()`. And whatever `aes()` mapping we give within `ggplot()` will be inherited by all `geom()` layers that will be added along with the `ggplot()` layer.
Let us see another case.
```{r, code_folding=TRUE}
ggplot(data = Hawks, aes(x = Weight, y = Wing)) +
geom_point(colour = "darkred") + theme_bw()
```
```{r, code_folding=TRUE}
ggplot(data = Hawks, aes(x = Weight, y = Wing)) +
geom_point(aes(colour = "darkred")) + theme_bw()
```
If you compare both the codes, the only difference is that the `colour = "darkred"` command was outside `aes()` in the first code and inside `aes()` in the second code. So why didn't the second graph have the same dark-red coloured points as the first one? The reason is that in the first code we are explicitly told to have all data points to be coloured dark-red but that is not the case with the second code. In the second code, since we have specified it inside `aes()`, ggplot is trying to look for a variable called "darkred" inside the dataset and colour it accordingly. This is why the legend that appears in the second graph has listed "darkred" as a category. And ggplot fails to find the variable called "darkred" but it still recognizes the `colour` command line and colour all the points in red. So the bottom line is that R has a pre-determined way of reading a code, so we users should well-understand what each line is expected to do and should not expect R to just fill it in accordingly to what we write.
Now let us try a few other examples;
### 3. Changing size
* Use `size()` in `aes()`
```{r}
ggplot(data = Hawks, aes(x = Species, y = Hallux, size = Culmen)) +
geom_point() + theme_bw()
```
### 4. Changing colour, shape and size manually
* Use `scale_shape_manual()` for changing shape, similarly `scale_color_manual()` for changing colour and `scale_size_manual()` for changing size of the element.
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Hallux, colour = Species,
shape = Species, size = Species)) +
geom_point() +
scale_shape_manual(values=c(1, 2, 3)) +
scale_color_manual(values=c('red','blue', 'green')) +
scale_size_manual(values=c(1,5,10)) + theme_bw()
```
```{r, echo=FALSE}
ggpubr::show_point_shapes()
```
### 5. Changing the opcaity of the elements
* Use `alpha()` within the `geom()` with a numeric value to change the opacity of the elements. This is useful for visualising large datasets such as this.
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
geom_point(alpha = 1/5) + theme_bw()
```
The same commands also work for most of the other types of `geom()`. Now let us see a few other aesthetics in other types of geoms.
### 6. Changing fill colour
* Use `fill()` in `aes()`
```{r}
ggplot(data = Hawks, aes(x = Weight, fill = Species)) +
geom_histogram(bins = 25) + theme_bw()
```
* Use `scale_fill_manual()` to manually change the colours.
```{r}
ggplot(data = Hawks, aes(x = Weight, fill = Species)) +
geom_histogram(bins = 25) + theme_bw() +
scale_fill_manual(values = c("darkred", "darkblue", "darkgreen"))
```
### 7. Changing line type
* Use `linetype` in `aes()`
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing, linetype = Species)) +
geom_line() + theme_bw()
```
* You can manually change line types using `scale_linetype_manual()`
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing, linetype = Species)) +
geom_line() +
scale_linetype_manual(values= c("twodash", "longdash", "dotdash")) +
theme_bw()
```
```{r, echo=FALSE}
ggpubr::show_line_types()
```
Now let us also see how to change the labels in a graph.
### 8. Changing labels in the axes
* Use `xlab()` to change x-axis title, `ylab()` to change y-axis title, `ggtitle()` with `label` and `subtitle` to add title and subtitle respectively.
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
geom_point() + theme_bw() + xlab("Weight (gm)") + ylab("Wing (mm)") +
ggtitle(label = "Weight vs Wing span in three different species of Hawks",
subtitle = "CH=Cooper's, RT=Red-tailed, SS=Sharp-Shinned")
```
* The same result can be obtained by using `labs()` to specify each label in the graph. For renaming the legend title, the command will depend on what is there within the `aes()` or in other words what is the legend based on.
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
geom_point() + theme_bw() + labs(x = "Weight (gm)", y = "Wing (mm)",
title= "Weight vs Wing span in three different species of Hawks",
subtitle = "CH=Cooper's, RT=Red-tailed, SS=Sharp-Shinned",
caption = "Source: Hawk dataset from Stat2Data r-package", #caption for the graph
colour = "Hawk Species", # rename legend title
tag = "A") #figure tag
```
### 9. Tweaking the axes
* Use `xlim()` and `ylim()` for limiting x and y axes respectively.
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
geom_point() + theme_bw() + xlim(c(0,1000)) + ylim(c(200,350))
```
* Use `coord_cartesian()` to zoom in on a particular area in the graph
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
geom_point() + theme_bw() + coord_cartesian(xlim = c(0,1000),
ylim = c(200,350))
```
* Use `coord_flip()` to flip the x and y axes.
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
geom_point() + theme_bw() + coord_flip()
```
* Use `scale_x_continuous()` for tweaking the x-axis. The same command work for the y-axis also. You can include `label()` inside the command to manually label the breaks of the axes.
```{r, code_folding=TRUE}
ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
geom_point() + theme_bw() + scale_x_continuous(breaks = c(0,1000,2000))
```
```{r, code_folding=TRUE}
ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
geom_point() + theme_bw() +
scale_x_continuous(breaks = c(0,1000,2000),label = c("low", "medium", "high"))
```
* Use `scale_y_reverse()` to display the y values in the descending order. Same command applies to x-axis also.
```{r}
ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
geom_point() + theme_bw() + scale_y_reverse()
```
## Summary
In this tutorial, we learned how to modify aesthetic present for different geoms in ggplot2. Then we learned how to modify labels in a graph and finally, we learned how to modify and change the axes elements. This tutorial is in no way exhaustive of the different ways you can modify a graph as there many more methods which are not discussed here. Instead of trying to include everything, this tutorial tries to be a stepping stone to help students of R to learn the basics of tweaking a graph. Try to practice what is covered here using other datasets available in the r-package `Stat2Data`. Have a good day my friend!
---
#bottom buttons to guide to next/previous chapters.
---
<br>
<a href="project3.html" class="btn button_round" style="float: right;">Next chapter:<br> 3. Even more customizations in ggplot2</a>
<a href="project1.html" class="btn button_round" style="float: left;">Previous chapter:<br> 1. Data visualization using ggplot2</a>
<!-- adding share buttons on the right side of the page -->
<!-- AddToAny BEGIN -->
<div class="a2a_kit a2a_kit_size_32 a2a_floating_style a2a_vertical_style" style="right:0px; top:150px; data-a2a-url="https://jeweljohnsonj.github.io/jeweljohnson.github.io/" data-a2a-title="One-carat Blog">
<a class="a2a_button_twitter"></a>
<a class="a2a_button_whatsapp"></a>
<a class="a2a_button_telegram"></a>
<a class="a2a_button_google_gmail"></a>
<a class="a2a_button_pinterest"></a>
<a class="a2a_button_reddit"></a>
<a class="a2a_button_facebook"></a>
<a class="a2a_button_facebook_messenger"></a>
</div>
<script>
var a2a_config = a2a_config || {};
a2a_config.onclick = 1;
</script>
<script async src="https://static.addtoany.com/menu/page.js"></script>
<!-- AddToAny END -->
## References
1. H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. Read more about ggplot2 [here](https://ggplot2.tidyverse.org/). You can also look at the [cheat sheet](https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf) for all the syntax used in `ggplot2`. Also check this [out](https://github.com/erikgahner/awesome-ggplot2).
2. Ann Cannon, George Cobb, Bradley Hartlaub, Julie Legler, Robin Lock, Thomas Moore, Allan Rossman and Jeffrey Witmer (2019).
Stat2Data: Datasets for Stat2. R package version 2.0.0. https://CRAN.R-project.org/package=Stat2Data
## Last updated on {.appendix}
```{r, echo=FALSE}
Sys.time()
```