# Vectors {#vectors .r4ds-section}
## Introduction {#introduction-13 .r4ds-section}
```{r setup,message=FALSE,cache=FALSE}
library("tidyverse")
```
## Vector basics {#vector-basics .r4ds-section}
`r no_exercises()`
## Important types of atomic vector {#important-types-of-atomic-vector .r4ds-section}
### Exercise 20.3.1 {.unnumbered .exercise data-number="20.3.1"}
<div class="question">
Describe the difference between `is.finite(x)` and `!is.infinite(x)`.
</div>
<div class="answer">
To find out, try the functions on a numeric vector that includes at least one number and the four special values (`NA`, `NaN`, `Inf`, `-Inf`).
```{r}
x <- c(0, NA, NaN, Inf, -Inf)
is.finite(x)
!is.infinite(x)
```
The `is.finite()` function considers non-missing numeric values to be finite,
and missing (`NA`), not a number (`NaN`), and positive (`Inf`) and negative infinity (`-Inf`) to not be finite. The `is.infinite()` function behaves slightly differently.
It considers `Inf` and `-Inf` to be infinite, and everything else, including non-missing numbers, `NA`, and `NaN` to not be infinite. See Table \@ref(tab:finite-infinite).
Table: (\#tab:finite-infinite) Results of `is.finite()` and `is.infinite()` for
numeric and special values.
| | `is.finite()` | `is.infinite()` |
|--------|---------------|-----------------|
| `1` | `TRUE` | `FALSE` |
| `NA` | `FALSE` | `FALSE` |
| `NaN` | `FALSE` | `FALSE` |
| `Inf` | `FALSE` | `TRUE` |
| `-Inf` | `FALSE` | `TRUE` |
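
One quick way to see where the two functions disagree is to compare their results element by element. This is only a small sketch; the names `vals` and `disagree` are introduced here for illustration.

```{r}
vals <- c(0, NA, NaN, Inf, -Inf)
# TRUE where is.finite() and !is.infinite() give different answers
disagree <- is.finite(vals) != (!is.infinite(vals))
disagree
# the two functions differ only for NA and NaN
vals[disagree]
```
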
</div>
### Exercise 20.3.2 {.unnumbered .exercise data-number="20.3.2"}
<div class="question">
Read the source code for `dplyr::near()` (Hint: to see the source code, drop the `()`). How does it work?
</div>
<div class="answer">
The source for `dplyr::near` is:
```{r}
dplyr::near
```
Instead of checking for exact equality, it checks that two numbers are within a certain tolerance, `tol`.
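By default the tolerance is set to the square root of `.Machine$double.eps`, where `.Machine$double.eps` is the smallest positive floating point number `x` such that `1 + x != 1`. It is not the smallest number that the computer can represent; it measures the relative precision of doubles.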
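
As a brief illustration, the following sketch compares an exact test with `dplyr::near()` and with the same tolerance check written by hand. The variable `a` is just an example value.

```{r}
a <- sqrt(2)^2
# exact comparison fails because of floating point error
a == 2
# comparison within a tolerance succeeds
dplyr::near(a, 2)
# the same check written by hand, using the default tolerance
abs(a - 2) < .Machine$double.eps^0.5
```
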
</div>
### Exercise 20.3.3 {.unnumbered .exercise data-number="20.3.3"}
<div class="question">
A logical vector can take 3 possible values. How many possible values can an integer vector take? How many possible values can a double take? Use Google to do some research.
</div>
<div class="answer">
For integer vectors, R uses a 32-bit representation. This means that it can represent up to $2^{32}$ different values with integers. One of these values is set aside for `NA_integer_`.
From the help for `integer`:
> Note that current implementations of R use 32-bit integers for integer vectors,
> so the range of representable integers is restricted to about +/-2*10^9: doubles
> can hold much larger integers exactly.
The range of integer values that R can represent in an integer vector is $\pm (2^{31} - 1)$:
```{r}
.Machine$integer.max
```
The maximum integer is $2^{31} - 1$ rather than $2^{32}$ because one bit is used to
represent the sign ($+$, $-$) and one value is used to represent `NA_integer_`.
If you try to represent an integer greater than that value, R will return `NA` with a warning.
```{r warning}
.Machine$integer.max + 1L
```
However, you can represent that value (exactly) with a numeric vector at the cost of
about two times the memory.
```{r}
as.numeric(.Machine$integer.max) + 1
```
The same overflow to `NA` occurs when going below the negative of the integer max.
```{r}
-.Machine$integer.max - 1L
```
For double vectors, R uses a 64-bit representation. This means that they can hold up
to $2^{64}$ values exactly. However, some of those values are allocated to special values
such as `-Inf`, `Inf`, `NA_real_`, and `NaN`. From the help for `double`:
> All R platforms are required to work with values conforming to the IEC 60559
> (also known as IEEE 754) standard. This basically works with a precision of
> 53 bits, and represents to that precision a range of absolute values from
> about 2e-308 to 2e+308. It also has special values `NaN` (many of them),
> plus and minus infinity
> and plus and minus zero (although R acts as if these are the same). There are
> also denormal(ized) (or subnormal) numbers with absolute values above or below
> the range given above but represented to less precision.
The details of floating point representation and arithmetic are complicated, beyond
the scope of this question, and better discussed in the references provided below.
The double can represent numbers in the range of about $\pm 2 \times 10^{308}$, which is
provided in
```{r}
.Machine$double.xmax
```
Many other details for the implementation of the double vectors are given in the `.Machine` variable (and its documentation).
These include the base (radix) of doubles,
```{r}
.Machine$double.base
```
the number of bits used for the significand (mantissa),
```{r}
.Machine$double.digits
```
the number of bits used in the exponent,
```{r}
.Machine$double.exponent
```
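and the machine epsilon values: `.Machine$double.eps` is the smallest positive number `x` such that `1 + x != 1`, and `.Machine$double.neg.eps` is the smallest positive number `x` such that `1 - x != 1`,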
```{r}
.Machine$double.eps
.Machine$double.neg.eps
```
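
To make the meaning of these values concrete, here is a short sketch that checks how R documents `.Machine$double.eps` and compares it with the smallest positive normalized double, `.Machine$double.xmin`. The name `eps` is only a local shorthand.

```{r}
eps <- .Machine$double.eps
# documented as the smallest positive double x such that 1 + x != 1
(1 + eps) != 1
# half of eps is lost when added to 1
(1 + eps / 2) == 1
# the smallest positive normalized double is far smaller than eps
.Machine$double.xmin
.Machine$double.xmin < eps
```
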
- Computerphile, "[Floating Point Numbers](https://www.youtube.com/watch?v=PZRI1IfStY0)"
- <https://en.wikipedia.org/wiki/IEEE_754>
- <https://en.wikipedia.org/wiki/Double-precision_floating-point_format>
- "[Floating Point Numbers: Why floating-point numbers are needed](https://floating-point-gui.de/formats/fp/)"
- Fabien Sanglard, "[Floating Point Numbers: Visually Explained](http://fabiensanglard.net/floating_point_visually_explained/)"
- James Howard, "[How Many Floating Point Numbers are There?](https://jameshoward.us/2015/09/09/how-many-floating-point-numbers-are-there/)"
- GeeksforGeeks, "[Floating Point Representation Basics](https://www.geeksforgeeks.org/floating-point-representation-basics/)"
- Chris Hecker, "[Lets Go to the (Floating) Point](http://chrishecker.com/images/f/fb/Gdmfp.pdf)", *Game Developer*
- Chua Hock-Chuan, [A Tutorial on Data Representation Integers, Floating-point Numbers, and Characters](http://www.ntu.edu.sg/home/ehchua/programming/java/datarepresentation.html)
- John D. Cook, "[Anatomy of a floating point number](https://www.johndcook.com/blog/2009/04/06/anatomy-of-a-floating-point-number/)"
- John D. Cook, "[Five Tips for Floating Point Programming](https://www.codeproject.com/Articles/29637/Five-Tips-for-Floating-Point-Programming)"
</div>
### Exercise 20.3.4 {.unnumbered .exercise data-number="20.3.4"}
<div class="question">
Brainstorm at least four functions that allow you to convert a double to an integer. How do they differ? Be precise.
</div>
<div class="answer">
The functions that convert a double to an integer differ in how they deal with the fractional part of the double.
There are a variety of rules that could be used to do this.
- Round down, towards $-\infty$. This is also called taking the `floor` of a number. This is the method the `floor()` function uses.
- Round up, towards $+\infty$. This is also called taking the `ceiling`. This is the method the `ceiling()` function uses.
- Round towards zero. This is the method that the `trunc()` and `as.integer()` functions use.
- Round away from zero.
- Round to the nearest integer. There are several different methods for handling ties, defined as numbers with a fractional part of 0.5.
- Round half down, towards $-\infty$.
- Round half up, towards $+\infty$.
- Round half towards zero
- Round half away from zero
- Round half towards the even integer. This is the method that the `round()` function uses.
- Round half towards the odd integer.
```{r}
function(x, method) {
  if (method == "round down") {
    floor(x)
  } else if (method == "round up") {
    ceiling(x)
  } else if (method == "round towards zero") {
    trunc(x)
  } else if (method == "round away from zero") {
    sign(x) * ceiling(abs(x))
  } else if (method == "nearest, round half up") {
    floor(x + 0.5)
  } else if (method == "nearest, round half down") {
    ceiling(x - 0.5)
  } else if (method == "nearest, round half towards zero") {
    sign(x) * ceiling(abs(x) - 0.5)
  } else if (method == "nearest, round half away from zero") {
    sign(x) * floor(abs(x) + 0.5)
  } else if (method == "nearest, round half to even") {
    round(x, digits = 0)
  } else if (method == "nearest, round half to odd") {
    case_when(
      # smaller integer is odd - round half down
      floor(x) %% 2 == 1 ~ ceiling(x - 0.5),
      # otherwise, round half up
      TRUE ~ floor(x + 0.5)
    )
  } else if (method == "nearest, round half randomly") {
    # randomly choose, for each element, whether ties go up or down
    round_half_up <- sample(c(TRUE, FALSE), length(x), replace = TRUE)
    y <- x
    y[round_half_up] <- floor(x[round_half_up] + 0.5)
    y[!round_half_up] <- ceiling(x[!round_half_up] - 0.5)
    y
  }
}
```
```{r}
tibble(
x = c(1.8, 1.5, 1.2, 0.8, 0.5, 0.2,
-0.2, -0.5, -0.8, -1.2, -1.5, -1.8),
`Round down` = floor(x),
`Round up` = ceiling(x),
`Round towards zero` = trunc(x),
`Nearest, round half to even` = round(x)
)
```
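
Another precise difference is the type of the result. The sketch below uses an illustrative vector `d` to show that the rounding functions return doubles, while `as.integer()` returns an integer vector.

```{r}
d <- c(1.8, -1.8, 2.5)
# floor(), ceiling(), trunc(), and round() return doubles
typeof(floor(d))
typeof(round(d))
# as.integer() truncates towards zero and returns an integer vector
as.integer(d)
typeof(as.integer(d))
# combine a rounding rule with as.integer() to get an integer result
as.integer(round(d))
```
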
See the Wikipedia articles, [Rounding](https://en.wikipedia.org/wiki/Rounding) and [IEEE floating point](https://en.wikipedia.org/wiki/IEEE_floating_point) for more discussion of these rounding rules.
For rounding, R and many programming languages use the IEEE standard. This method is called "round to nearest, ties to even".[^rounding]
This rule rounds ties, numbers with a remainder of 0.5, to the nearest even number.
In this rule, half the ties are rounded up, and half are rounded down.
For comparison with `round()`, the following function, `round2()`, manually implements the "round half up" rule, in which ties are always rounded towards $+\infty$.
```{r}
round2 <- function(x) {
  # integer part (rounded down) and fractional part of x
  q <- x %/% 1
  r <- x %% 1
  # add one when the fractional part is at least 0.5, so ties round up
  q + (r >= 0.5)
}
x <- c(-12.5, -11.5, 11.5, 12.5)
round(x)
round2(x)
```
This rounding method may differ from the one you learned in grade school, which, at least for me, was to always round ties upwards or, alternatively, away from zero.
Always rounding ties upwards is called the "round half up" rule.
The problem with the "round half up" rule is that it is biased upwards for positive numbers.
Rounding to nearest with ties towards even is not.
Consider this sequence which sums to zero.
```{r}
x <- seq(-100.5, 100.5, by = 1)
x
sum(x)
```
A nice property for a rounding rule would be to preserve that sum.
Using the "ties towards even", the sum is still zero.
However, the "ties towards $+\infty$" produces a non-zero number.
```{r}
sum(x)
sum(round(x))
sum(round2(x))
```
Rounding rules can have real world impacts.
One notable example was that in 1983, the Vancouver stock exchange adjusted its index from 524.811 to 1098.892 to correct for error accumulated by truncating, rather than rounding, the index to three decimal places (see [Vancouver Stock Exchange](https://en.wikipedia.org/wiki/Vancouver_Stock_Exchange)).
This [site](https://web.ma.utexas.edu/users/arbogast/misc/disasters.html) lists several more examples of the dangers of rounding rules.
</div>
### Exercise 20.3.5 {.unnumbered .exercise data-number="20.3.5"}
<div class="question">
What functions from the readr package allow you to turn a string into logical, integer, and double vector?
</div>
<div class="answer">
The function `parse_logical()` parses logical values, which can appear
as variations of TRUE/FALSE or 1/0.
```{r}
parse_logical(c("TRUE", "FALSE", "1", "0", "true", "t", "NA"))
```
The function `parse_integer()` parses integer values.
```{r}
parse_integer(c("1235", "0134", "NA"))
```
However, if there are any non-numeric characters in the string, including
currency symbols, commas, and decimals, `parse_integer()` cannot parse the value: it returns `NA` for that element and reports a parsing failure in a warning.
```{r error=TRUE}
parse_integer(c("1000", "$1,000", "10.00"))
```
The function `parse_number()` parses numeric values.
Unlike `parse_integer()`, the function `parse_number()` is more forgiving about the format of the numbers.
It ignores all non-numeric characters before or after the first number, as with `"$1,000.00"` in the example.
Within the number, `parse_number()` will only ignore grouping marks such as `","`.
This allows it to easily parse numeric fields that include currency symbols and comma separators in number strings without any intervention by the user.
```{r}
parse_number(c("1.0", "3.5", "$1,000.00", "NA", "ABCD12234.90", "1234ABC", "A123B", "A1B2C"))
```
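
The question also asks about doubles specifically. readr provides `parse_double()` for that case; the short sketch below shows it on a few example strings. Unlike `parse_number()`, it expects the whole string to be a number.

```{r}
# parse_double() is strict: each string must be a number or a missing value
readr::parse_double(c("1.5", "2", "1e3", "NA"))
```
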
</div>
## Using atomic vectors {#using-atomic-vectors .r4ds-section}
### Exercise 20.4.1 {.unnumbered .exercise data-number="20.4.1"}
<div class="question">
What does `mean(is.na(x))` tell you about a vector `x`? What about `sum(!is.finite(x))`?
</div>
<div class="answer">
I'll use the numeric vector `x` to compare the behaviors of `is.na()`
and `is.finite()`. It contains numbers (`-1`, `0`, `1`) as
well as all the special numeric values: infinity (`Inf`),
missing (`NA`), and not-a-number (`NaN`).
```{r}
x <- c(-Inf, -1, 0, 1, Inf, NA, NaN)
```
The expression `mean(is.na(x))` calculates the proportion of missing (`NA`) and not-a-number (`NaN`) values in a vector:
```{r}
mean(is.na(x))
```
The result of `r round(2 / 7, 3)` is equal to `2 / 7` as expected.
There are seven elements in the vector `x`, and two elements that are either `NA` or `NaN`.
The expression `sum(!is.finite(x))` calculates the number of elements in the vector that are equal to missing (`NA`), not-a-number (`NaN`), or positive or negative infinity (`Inf`, `-Inf`).
```{r}
sum(!is.finite(x))
```
Review the [Numeric](https://r4ds.had.co.nz/vectors.html#numeric) section for the differences between `is.na()` and `is.finite()`.
</div>
### Exercise 20.4.2 {.unnumbered .exercise data-number="20.4.2"}
<div class="question">
Carefully read the documentation of `is.vector()`. What does it actually test for? Why does `is.atomic()` not agree with the definition of atomic vectors above?
</div>
<div class="answer">
The function `is.vector()` only checks whether the object has no attributes other than names. Thus a `list` is a vector:
```{r}
is.vector(list(a = 1, b = 2))
```
But any object that has an attribute (other than names) is not:
```{r}
x <- 1:10
attr(x, "something") <- TRUE
is.vector(x)
```
The idea behind this is that object oriented classes will include attributes, including, but not limited to `"class"`.
The function `is.atomic()` explicitly checks whether an object is one of the atomic types ("logical", "integer", "numeric", "complex", "character", and "raw"). In versions of R before 4.4.0, it also returns `TRUE` for `NULL`.
```{r}
is.atomic(1:10)
is.atomic(list(a = 1))
```
The function `is.atomic()` will consider objects to be atomic even if they have extra attributes.
```{r}
is.atomic(x)
```
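
A factor makes the contrast between the two functions easy to see. This is a small sketch with an illustrative factor `f`.

```{r}
f <- factor(c("a", "b"))
# a factor is built on an integer vector, so it is atomic
typeof(f)
is.atomic(f)
# but it carries "levels" and "class" attributes, so is.vector() is FALSE
names(attributes(f))
is.vector(f)
```
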
</div>
### Exercise 20.4.3 {.unnumbered .exercise data-number="20.4.3"}
<div class="question">
Compare and contrast `setNames()` with `purrr::set_names()`.
</div>
<div class="answer">
The function `setNames()` takes two arguments, a vector to be named and a vector
of names to apply to its elements.
```{r}
setNames(1:4, c("a", "b", "c", "d"))
```
You can use the values of the vector as its names if the `nm` argument is used.
```{r}
setNames(nm = c("a", "b", "c", "d"))
```
The function `set_names()` has more ways to set the names than `setNames()`.
The names can be specified in the same manner as `setNames()`.
```{r}
purrr::set_names(1:4, c("a", "b", "c", "d"))
```
The names can also be specified as unnamed arguments,
```{r}
purrr::set_names(1:4, "a", "b", "c", "d")
```
The function `set_names()` will name an object with itself if no `nm` argument is
provided (the opposite of `setNames()` behavior).
```{r}
purrr::set_names(c("a", "b", "c", "d"))
```
The biggest difference between `set_names()` and `setNames()` is that `set_names()` allows for using a function or formula to transform the existing names.
```{r}
purrr::set_names(c(a = 1, b = 2, c = 3), toupper)
purrr::set_names(c(a = 1, b = 2, c = 3), ~toupper(.))
```
The `set_names()` function also checks that the length of the names argument is the
same length as the vector that is being named, and will raise an error if it is not.
```{r error=TRUE}
purrr::set_names(1:4, c("a", "b"))
```
The `setNames()` function will allow the names to be shorter than the vector being
named, and will set the missing names to `NA`.
```{r}
setNames(1:4, c("a", "b"))
```
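
To get the effect of the function interface with `setNames()`, the new names have to be computed first. The sketch below, using an illustrative vector `v`, shows both approaches producing the same result.

```{r}
v <- c(a = 1, b = 2, c = 3)
# base R: compute the new names, then pass them to setNames()
setNames(v, toupper(names(v)))
# purrr: pass the function and let set_names() apply it to the names
purrr::set_names(v, toupper)
```
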
</div>
### Exercise 20.4.4 {.unnumbered .exercise data-number="20.4.4"}
<div class="question">
Create functions that take a vector as input and returns:
1. The last value. Should you use `[` or `[[`?
1. The elements at even numbered positions.
1. Every element except the last value.
1. Only even numbers (and no missing values).
</div>
<div class="answer">
The answers to the parts follow.
1. This function finds the last value in a vector.
```{r}
last_value <- function(x) {
  # check for case with no length
  if (length(x)) {
    x[[length(x)]]
  } else {
    x
  }
}
last_value(numeric())
last_value(1)
last_value(1:10)
```
The function uses `[[` in order to extract a single element.
1. This function returns the elements at even-numbered positions.
```{r}
even_indices <- function(x) {
  if (length(x)) {
    x[seq_along(x) %% 2 == 0]
  } else {
    x
  }
}
even_indices(numeric())
even_indices(1)
even_indices(1:10)
# test using case to ensure that values not indices
# are being returned
even_indices(letters)
```
1. This function returns a vector with every element except the last.
```{r not_last}
not_last <- function(x) {
  n <- length(x)
  if (n) {
    x[-n]
  } else {
    # n == 0
    x
  }
}
not_last(1:3)
```
We should also confirm that the function works with some edge cases, like
a vector with one element, and a vector with zero elements.
```{r}
not_last(1)
not_last(numeric())
```
In both these cases, `not_last()` correctly returns an empty vector.
1. This function returns the elements of a vector that are even numbers.
```{r even_numbers}
even_numbers <- function(x) {
x[x %% 2 == 0]
}
even_numbers(-4:4)
```
We could improve this function by handling the special numeric values:
`NA`, `NaN`, `Inf`. However, first we need to decide how to handle them.
Neither `NaN` nor `Inf` are numbers, and so they are neither even nor odd.
In other words, since `NaN` and `Inf` aren't *even* numbers, they aren't *even numbers*.
What about `NA`? Well, we don't know. `NA` is a number, but we don't know its
value. The missing number could be even or odd, but we don't know.
Another reason to return `NA` is that it is consistent with the behavior of other R functions,
which generally return `NA` values instead of dropping them.
```{r even_numbers2}
even_numbers2 <- function(x) {
x[!is.infinite(x) & !is.nan(x) & (x %% 2 == 0)]
}
even_numbers2(c(0:4, NA, NaN, Inf, -Inf))
```
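
If you read the question's "and no missing values" as a requirement to drop `NA` as well, one possible variant is sketched below. The name `even_numbers3` is hypothetical and not part of the original answer.

```{r}
even_numbers3 <- function(x) {
  # is.finite() is FALSE for NA, NaN, Inf, and -Inf, so all four are dropped
  x[is.finite(x) & (x %% 2 == 0)]
}
even_numbers3(c(0:4, NA, NaN, Inf, -Inf))
```
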
</div>
### Exercise 20.4.5 {.unnumbered .exercise data-number="20.4.5"}
<div class="question">
Why is `x[-which(x > 0)]` not the same as `x[x <= 0]`?
</div>
<div class="answer">
These expressions differ in the way that they treat missing values.
Let's test how they work by creating a vector with positive and negative integers,
and special values (`NA`, `NaN`, and `Inf`). These values should encompass
all relevant types of values that these expressions would encounter.
```{r}
x <- c(-1:1, Inf, -Inf, NaN, NA)
x[-which(x > 0)]
x[x <= 0]
```
The expressions `x[-which(x > 0)]` and `x[x <= 0]` return the same values except
for a `NaN` instead of an `NA` in the expression using which.
So what is going on here? Let's work through each part of these expressions and
see where the difference occurs.
Let's start with the expression `x[x <= 0]`.
```{r}
x <= 0
```
Recall how the logical relational operators (`<`, `<=`, `==`, `!=`, `>`, `>=`) treat `NA` values.
Any relational operation that includes a `NA` returns an `NA`.
Is `NA <= 0`? We don't know because it depends on the unknown value of `NA`, so the answer is `NA`.
This same argument applies to `NaN`. Asking whether `NaN <= 0` does not make sense because you can't compare a number to "Not a Number".
Now recall how indexing treats `NA` values.
Indexing can take a logical vector as an input.
When the indexing vector is logical, the output vector includes those elements where the logical vector is `TRUE`, and excludes those elements where the logical vector is `FALSE`.
Logical vectors can also include `NA` values, and it is not clear how they should be treated.
Well, since the value is `NA`, it could be `TRUE` or `FALSE`, we don't know.
Keeping elements with `NA` would treat the `NA` as `TRUE`, and dropping them would treat the `NA` as `FALSE`.
R resolves this by treating `NA` differently from both `TRUE` and `FALSE`: it includes an element wherever the indexing vector is `NA`, but sets the value of that element to `NA`.
Now consider the expression `x[-which(x > 0)]`.
As before, to understand this expression we'll work from the inside out.
Consider `x > 0`.
```{r}
x > 0
```
As with `x <= 0`, it returns `NA` for comparisons involving `NA` and `NaN`.
What does `which()` do?
```{r}
which(x > 0)
```
The `which()` function returns the indexes for which the argument is `TRUE`.
This means that it is not including the indexes for which the argument is `FALSE` or `NA`.
Now consider the full expression `x[-which(x > 0)]`.
The `which()` function returned a vector of integers.
How does indexing treat negative integers?
```{r}
x[1:2]
x[-(1:2)]
```
If indexing gets a vector of positive integers, it will select those indexes;
if it receives a vector of negative integers, it will drop those indexes.
Thus, `x[-which(x > 0)]` ends up dropping the elements for which `x > 0` is true,
and keeps all the other elements and their original values, including `NA` and `NaN`.
There's one other special case that we should consider. How do these two expressions work with
an empty vector?
```{r}
x <- numeric()
x[x <= 0]
x[-which(x > 0)]
```
Thankfully, they both handle empty vectors the same.
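
A related edge case is worth checking: a non-empty vector in which no element is positive. The sketch below uses an illustrative vector `z`; here the two expressions give very different results.

```{r}
z <- c(-2, -1, 0)
# no element is positive, so which() returns integer(0)
which(z > 0)
# negating integer(0) is still integer(0), which selects nothing
z[-which(z > 0)]
# the logical version keeps every element, as intended
z[z <= 0]
```
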
This exercise is a reminder to always test your code. Even though these two expressions looked
equivalent, they are not in practice. And when you do test code, consider both
how it works on typical values as well as special values and edge cases, like a
vector with `NA` or `NaN` or `Inf` values, or an empty vector. These are where
unexpected behavior is most likely to occur.
</div>
### Exercise 20.4.6 {.unnumbered .exercise data-number="20.4.6"}
<div class="question">
What happens when you subset with a positive integer that’s bigger than the length of the vector? What happens when you subset with a name that doesn’t exist?
</div>
<div class="answer">
Let's consider the named vector,
```{r}
x <- c(a = 10, b = 20)
```
If we subset it by an integer larger than its length, it returns a vector of missing values.
```{r}
x[3]
```
This also applies to ranges.
```{r}
x[3:5]
```
If some indexes are larger than the length of the vector, those elements are `NA`.
```{r}
x[1:5]
```
Likewise, when `[` is provided names not in the vector's names, it will return
`NA` for those elements.
```{r}
x["c"]
x[c("c", "d", "e")]
x[c("a", "b", "c")]
```
Though not yet discussed much in this chapter, the `[[` behaves differently.
With an atomic vector, if `[[` is given an index outside the range of the vector or an invalid name, it raises an error.
```{r error=TRUE}
x[["c"]]
```
```{r error=TRUE}
x[[5]]
```
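
Lists behave slightly differently from atomic vectors here. The following sketch, with an illustrative list `l`, shows that `[[` with a name that does not exist returns `NULL`, while an out-of-range position is still an error.

```{r}
l <- list(a = 10, b = 20)
# a missing name returns NULL instead of raising an error
l[["c"]]
# an out-of-range position still raises an error; capture its message
tryCatch(l[[5]], error = function(e) conditionMessage(e))
```
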
</div>
## Recursive vectors (lists) {#lists .r4ds-section}
### Exercise 20.5.1 {.unnumbered .exercise data-number="20.5.1"}
<div class="question">
Draw the following lists as nested sets:
1. `list(a, b, list(c, d), list(e, f))`
1. `list(list(list(list(list(list(a))))))`
</div>
<div class="answer">
There are a variety of ways to draw these graphs.
The original diagrams in *R for Data Science* were produced with [OmniGraffle](https://www.omnigroup.com/omnigraffle).
You could also use various diagramming, drawing, or presentation software, including Adobe Illustrator, Inkscape, PowerPoint, Keynote, and Google Slides.
For these examples, I generated these diagrams programmatically using the
[DiagrammeR](http://rich-iannone.github.io/DiagrammeR/graphviz_and_mermaid.html) R package to render [Graphviz](https://www.graphviz.org/) diagrams.
1. The nested set diagram for
`list(a, b, list(c, d), list(e, f))`
is:[^DiagrammeR]
```{r echo=FALSE,include=FALSE,purl=FALSE}
file_nested_set_1 <- here::here("diagrams", "nested_set_1.dot")
DiagrammeR::grViz(file_nested_set_1)
```
1. The nested set diagram for
`list(list(list(list(list(list(a))))))`
is:
```{r echo=FALSE,cache=FALSE,purl=FALSE}
file_nested_set_2 <- here::here("diagrams", "nested_set_2.dot")
DiagrammeR::grViz(file_nested_set_2)
```
</div>
### Exercise 20.5.2 {.unnumbered .exercise data-number="20.5.2"}
<div class="question">
What happens if you subset a `tibble` as if you’re subsetting a list? What are the key differences between a list and a `tibble`?
</div>
<div class="answer">
Subsetting a `tibble` works the same way as a list; a data frame can be thought of as a list of columns.
The key difference between a list and a `tibble` is that all the elements (columns) of a tibble must have the same length (number of rows).
Lists can have vectors with different lengths as elements.
```{r}
x <- tibble(a = 1:2, b = 3:4)
x[["a"]]
x["a"]
x[1]
x[1, ]
```
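
The following sketch makes the comparison more explicit. The names `tb` and `l2` are illustrative; the last line shows that row indexing, which works on a tibble, is an error for a plain list.

```{r}
tb <- tibble::tibble(a = 1:2, b = 3:4)
# a tibble is stored as a list of columns
typeof(tb)
is.list(tb)
# `[[` and `$` extract a column as a vector; `[` keeps the tibble
identical(tb[["a"]], tb$a)
class(tb["a"])
# a plain list has no rows, so row indexing is an error
l2 <- list(a = 1:2, b = 3:5)
tryCatch(l2[1, ], error = function(e) conditionMessage(e))
```
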
</div>
## Attributes {#attributes .r4ds-section}
`r no_exercises()`
## Augmented vectors {#augmented-vectors .r4ds-section}
### Exercise 20.7.1 {.unnumbered .exercise data-number="20.7.1"}
<div class="question">
What does `hms::hms(3600)` return? How does it print? What primitive type is the augmented vector built on top of? What attributes does it use?
</div>
<div class="answer">
```{r}
x <- hms::hms(3600)
class(x)
x
```
`hms::hms()` returns an object of class `hms` (which also inherits from `difftime`), and prints the time in "%H:%M:%S" format.
The primitive type is a double.
```{r}
typeof(x)
```
The attributes it uses are `"units"` and `"class"`.
```{r}
attributes(x)
```
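
As a small check of the underlying representation, the sketch below builds the same time from hours instead of seconds and then converts it back to a plain number. The name `h` is illustrative.

```{r}
h <- hms::hms(hours = 1)
# the printed representation as a character string
format(h)
# dropping the class leaves the underlying number of seconds
as.numeric(h)
```
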
</div>
### Exercise 20.7.2 {.unnumbered .exercise data-number="20.7.2"}
<div class="question">
Try and make a tibble that has columns with different lengths. What happens?
</div>
<div class="answer">
If I try to create a tibble with a scalar and column of a different length there are no issues, and the scalar is repeated to the length of the longer vector.
```{r}
tibble(x = 1, y = 1:5)
```
However, if I try to create a tibble with two vectors of different lengths (other than one), the `tibble` function throws an error.
```{r error=TRUE}
tibble(x = 1:3, y = 1:4)
```
</div>
### Exercise 20.7.3 {.unnumbered .exercise data-number="20.7.3"}
<div class="question">
Based on the definition above, is it OK to have a list as a column of a tibble?
</div>
<div class="answer">
If I didn't already know the answer, what I would do is try it out.
From the above, the error message was about vectors having different lengths.
But there is nothing that prevents a tibble from having vectors of different types: doubles, character, integers, logical, factor, date.
The latter two (factors and dates) are still atomic, but they have additional attributes.
So, maybe there won't be an issue with a list vector as long as it is the same length.
```{r}
tibble(x = 1:3, y = list("a", 1, list(1:3)))
```
It works! I even used a list with heterogeneous types and there wasn't an issue.
In following chapters we'll see that list vectors can be very useful: for example, when processing many different models.
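
Once a list-column exists, its elements can be extracted like those of any other list. This is a minimal sketch, with `df` as an illustrative name, that also uses `purrr::map_chr()` to show that each element keeps its own type.

```{r}
df <- tibble::tibble(x = 1:3, y = list("a", 1, list(1:3)))
# `$` returns the list-column itself; `[[` then extracts one element
df$y[[2]]
# each element keeps its own type
purrr::map_chr(df$y, typeof)
```
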
</div>
[^DiagrammeR]: These diagrams were created with the [DiagrammeR](https://rich-iannone.github.io/DiagrammeR/) package.
[^rounding]: See the documentation for `.Machine$double.rounding`.