---
title: Exceptions and debugging
layout: default
---
# Debugging, condition handling and defensive programming
What happens when something goes wrong with your R code? What do you do? What tools do you have to apply to the problem? This chapter will teach you how to fix unanticipated problems (debugging), show you how functions can communicate problems and how you can take action based on those communications (condition handling), and teach you how to avoid common problems before they occur (defensive programming).
Debugging is the art and science of fixing unexpected problems in your code. In this section you'll learn tools and techniques that help you get to the root cause of an error when you encounter it. You'll learn general strategies for debugging, and RStudio and R specific tools like `traceback()` and `browser()`.
Not all problems are unexpected. When writing a function, you can often anticipate potential problems (like a file not existing, or the wrong type of input). Communicating these problems back to the user is the job of __conditions__, which include errors, warnings and messages:
* Fatal errors are raised by `stop()` and force all execution to terminate.
Errors are used when there is no way for a function to continue.
* Warnings are generated by `warning()` and are used to display potential
problems, such as when some elements of a vectorised input are invalid,
like `log(-1:2)`.
* Messages are generated by `message()` and are used to give informative output
in a way that can easily be suppressed by the user (`?suppressMessages()`).
I often use messages to let the user know what value the function has chosen
for an important missing argument.
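As a rough illustration of how the three kinds of condition differ in practice, here is a small sketch. The function `report_mean()` and its arguments are hypothetical, invented only for this example.

```r
# Hypothetical helper: each signalling function is used for a different purpose
report_mean <- function(x, digits) {
  if (!is.numeric(x)) {
    # No sensible way to continue: raise an error
    stop("`x` must be numeric")
  }
  if (missing(digits)) {
    # Tell the user which value was chosen; easy to silence
    digits <- 2
    message("Using digits = ", digits)
  }
  if (anyNA(x)) {
    # A potential problem, but the function can still carry on
    warning("Removing missing values")
    x <- x[!is.na(x)]
  }
  round(mean(x), digits)
}

# The message can be suppressed without affecting the result
suppressMessages(report_mean(c(1, 2, 4)))
```

Because the message is a condition rather than printed output, a caller who doesn't want to see it can wrap the call in `suppressMessages()`; the same would not be possible with `cat()`.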
Conditions are usually displayed prominently, in a bold font or coloured red depending on your R interface. You can tell them apart because errors always start with "Error" and warnings with "Warning message". Function authors can also communicate with their users with `print()` or `cat()`, but I think that's a bad idea because it's hard to capture and selectively ignore this sort of output. Printed output is not a condition, so you can't use any of the useful condition handling tools you'll learn about below.
Condition handling tools, like `try()`, `tryCatch()` and `withCallingHandlers()`, allow you to take specific actions when a condition occurs. For example, if you're fitting many models, you might want to continue fitting the others even if one fails to converge. R offers an exceptionally powerful condition handling system based on ideas from Common Lisp, but it's currently not very well documented or often used. This chapter will introduce you to the most important basics, but if you want to learn more, I recommend the following two sources:
* [A prototype of a condition system for R](http://homepage.stat.uiowa.edu/~luke/R/exceptions/simpcond.html) by Robert Gentleman and Luke Tierney. This describes an early version of R's condition system. The implementation has changed somewhat since this was written, but it provides a good overview of how the pieces fit together, and some motivation for the design.
* [Beyond Exception Handling: Conditions and Restarts](http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html) by Peter Seibel. This describes exception handling in Lisp, which happens to be very similar to R's approach. It provides useful motivation and more sophisticated examples. I have provided an R translation of the chapter at <http://adv-r.had.co.nz/beyond-exception-handling.html>.
The chapter concludes with a discussion of "defensive" programming, avoiding common errors before they occur. You'll spend a little more time writing your code, but you'll save time in the long run by reducing errors and providing more informative error messages. The basic principle of defensive programming is to "fail fast", to raise an error as soon as you know there's something wrong, rather than trying to silently struggle through. In R, this has three particular applications: checking that inputs are correct, avoiding non-standard evaluation, and avoiding functions that can return different types of output.
## Debugging techniques
> Finding your bug is a process of confirming the many things
> that you believe are true — until you find one which is not
> true. \
> — Norm Matloff
Debugging code is challenging. Many bugs are subtle and hard to find. Indeed, if a bug were obvious, you probably would've been able to avoid it in the first place. While it's true that with good technique, you can productively debug a problem with just `print()`, there are times when additional help would be welcome. In this section, we'll discuss some useful tools, which R and RStudio provide, and outline a general procedure for debugging.
While the procedure below is by no means foolproof, it will hopefully help you to organise your thoughts when debugging. There are four steps:
1. __Realise that you have a bug__
If you're reading this chapter, you've probably already completed this step.
It is a surprisingly important one: you can't fix a bug until you know it
exists. This is one reason why automated test suites are important when
producing high-quality code. Unfortunately, automated testing is outside the
scope of this book, but you can read more about it at
<http://adv-r.had.co.nz/Testing.html>.
2. __Make it repeatable__
Once you've determined you have a bug, you need to be able to reproduce it
on command. Without this, it becomes extremely difficult to isolate its
cause and to confirm that you've successfully fixed it.
Generally, you will start with a big block of code that you know causes the
error and then slowly whittle it down to get to the smallest possible
snippet that still causes the error. Binary search is particularly useful
for this. To do a binary search, you repeatedly remove half of the code
until you find the bug. This is fast because, with each step, you reduce the
amount of code to look through by half.
If it takes a long time to generate the bug, it's also worthwhile to figure
out how to generate it faster. The quicker you can do this, the quicker you
can figure out the cause.
As you work on creating a minimal example, you'll also discover similar
inputs that don't trigger the bug. Make note of them: they will be helpful
when diagnosing the cause of the bug.
If you're using automated testing, this is also a good time to create an
automated test case. If your existing test coverage is low, take the
opportunity to add some nearby tests to ensure that existing good behaviour
is preserved. This reduces the chances of creating a new bug.
3. __Figure out where it is__
If you're lucky, one of the tools in the following section will help you to
quickly identify the line of code that's causing the bug. Usually, however,
you'll have to think a bit more about the problem. It's a great idea to
adopt the scientific method. Generate hypotheses, design experiments to test
them and record your results. This may seem like a lot of work, but a
systematic approach will end up saving you time. I often waste a lot of time
relying on my intuition to solve a bug ("oh, it must be an off-by-one error,
so I'll just subtract 1 here"), when I would have been better off taking a
systematic approach.
4. __Fix it and test it__
Once you've found the bug, you need to figure out how to fix it and to check
that the fix actually worked. Again, it's very useful to have automated
tests in place. Not only does this help to ensure that you've actually fixed
the bug, it also helps to ensure you haven't introduced any new bugs in the
process. In the absence of automated tests, make sure to carefully record
the correct output, and check against the inputs that previously failed.
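Even without a testing framework, you can record the inputs that used to fail alongside some that always worked. The following is only a sketch: `last_n()` is a hypothetical function standing in for whatever you have just fixed, and the checks use nothing more than `stopifnot()`.

```r
# Hypothetical function under repair: return the last n elements of x
last_n <- function(x, n) {
  n <- min(n, length(x))          # guard against asking for too many elements
  x[seq_len(n) + length(x) - n]
}

# Lightweight regression checks: stopifnot() raises an error if any is FALSE
stopifnot(
  identical(last_n(1:5, 2), 4:5),         # behaviour that always worked
  identical(last_n(1:5, 10), 1:5),        # assumed to be the failing input
  identical(last_n(1:5, 0), integer(0))   # nearby edge case
)
```

If you keep these checks in a script, you can rerun them whenever you change the function.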
## Debugging tools
As well as a broad strategy to follow when debugging code, you also need some specific tools to apply. In this section you'll learn about tools provided with R and the RStudio IDE. RStudio's integrated debugging support makes life easier, but it mostly exposes existing R tools in a user-friendly way. I'll show you both the RStudio way and the regular R way so that you can work with whatever environment you have. You may also want to refer to the official [RStudio debugging documentation](http://www.rstudio.com/ide/docs/debugging/overview), which will always reflect the tools in the latest version of RStudio.
There are three key debugging tools:
* The RStudio error inspector and `traceback()`, which list the sequence of
calls that led to the error.
* RStudio's "Rerun with Debug" tool and `options(error = browser)`, which enter
an interactive session where the error occurred.
* RStudio's breakpoints and `browser()`, which enter an interactive session at
an arbitrary code location.
I'll explain each tool in more detail below.
You shouldn't need to use these tools when writing new functions. If you find yourself using them frequently with new code, you may want to reconsider your approach. Instead of trying to write one big function all at once, work interactively on small pieces. If you start small, you can quickly identify why something doesn't work, rather than struggling to identify the problem in a large function.
### Determining the sequence of calls
The first tool is the __call stack__, the sequence of calls that lead up to an error. Here's a simple example: you can see that `f()` calls `g()`, which calls `h()`, which calls `i()`, which adds together a number and a string, creating an error:
```{r, eval = FALSE}
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) "a" + d
f(10)
```
When we run this code in RStudio we see:
![Initial traceback display](traceback-hidden.png)
If you click "Show traceback" you see:
![Traceback display after clicking "show traceback"](traceback-shown.png)
If you're not using RStudio, you can use `traceback()` to get the same information:
```{r, eval = FALSE}
traceback()
# 4: i(c) at error.R#3
# 3: h(b) at error.R#2
# 2: g(a) at error.R#1
# 1: f(10)
```
You read the call stack from bottom to top: the initial call is `f()`, which eventually calls `i()`, which triggers the error. If you're calling code that you `source()`d into R, the traceback will also display the location of the function, in the form `filename.r#linenumber`. These are clickable in RStudio, and will take you to the corresponding line of code in the editor.
Sometimes this is enough information to let you track down the error and fix it. However, it's usually not: it shows you where the error occurred, but not why. The next useful tool is the interactive debugger, which allows you to pause execution of a function and interactively explore its state.
### Browsing on error
The easiest way to enter the interactive debugger is through RStudio's "Rerun with Debug" tool. This reruns the command that created the error, pausing execution where the error occurred. You're now in an interactive state inside the function, and you can interact with any object defined there. You'll see the corresponding code in the editor (with the statement that will be run next highlighted), objects in the current environment in the "Environment" pane, the call stack in a "Traceback" pane, and you can run arbitrary R code in the console.
As well as any regular R function, there are a few special commands you can use in debug mode. You can access them either with the RStudio toolbar (![](debug-toolbar.png)) or with the keyboard:
* Next, `n`: executes the next step in the function. Be careful if you have a
variable named `n`; to print it you'll need to do `print(n)`.
* Continue, `c`: leaves interactive debugging and continues regular execution
of the function. This is useful if you've fixed the bad state and want to
check that the function proceeds correctly.
* Stop, `Q`: stops debugging, terminates the function and returns to the global
workspace. Use this once you've figured out where the problem is, and you're
ready to fix it and reload the code.
There are two other slightly less useful commands that aren't available in the toolbar:
* Enter: repeats the previous command. I find this too easy to activate
accidentally, so I turn it off using `options(browserNLdisabled = TRUE)`.
* `where`: prints the stack trace of active calls (the interactive equivalent of
`traceback()`).
To enter this style of debugging outside of RStudio, you can use the `error` option, which specifies a function to run when an error occurs. The function most similar to RStudio's debug is `browser()`: this will start an interactive console in the environment where the error occurred. Use `options(error = browser)` to turn it on, re-run the previous command, then use `options(error = NULL)` to return to the default error behaviour. You could automate this with the `browseOnce()` function as defined below:
```{r, eval = FALSE}
browseOnce <- function() {
  old <- getOption("error")
  function() {
    options(error = old)
    browser()
  }
}
options(error = browseOnce())
f <- function() stop("!")
# Enters browser
f()
# Runs normally
f()
```
(You'll learn more about functions that return functions in [Functional programming](#functional-programming).)
There are two other useful functions that you can use with the `error` option:
* `recover` is a step up from `browser`, as it allows you to enter the
environment of any of the calls in the call stack. This is useful because
often the root cause of the error is a number of calls back.
* `dump.frames` is an equivalent to `recover` for non-interactive code. It
creates a `last.dump.rda` file in the current working directory. Then,
in a later interactive R session, you load that file, and use `debugger()`
to enter an interactive debugger with the same interface as `recover()`.
This allows interactive debugging of batch code.
```{r, eval = FALSE}
# In batch R process ----
dump_and_quit <- function() {
  # Save debugging info to file last.dump.rda
  dump.frames(to.file = TRUE)
  # Quit R with error status
  q(status = 1)
}
options(error = dump_and_quit)
# In a later interactive session ----
load("last.dump.rda")
debugger()
```
To reset error behaviour to the default, use `options(error = NULL)`. Then errors will print a message and abort function execution.
### Browsing arbitrary code
As well as entering an interactive console on error, you can enter it at an arbitrary code location by using either an RStudio breakpoint or `browser()`. You can set a breakpoint in RStudio by clicking to the left of the line number, or pressing `Shift + F9`, or equivalently, add `browser()` where you want execution to pause. Breakpoints behave similarly to `browser()` but they are easier to set (one click instead of nine key presses), and you don't run the risk of accidentally including a `browser()` statement in your source code. There are two small downsides to breakpoints:
* There are a few unusual situations in which breakpoints will not work: read [breakpoint troubleshooting](http://www.rstudio.com/ide/docs/debugging/breakpoint-troubleshooting) for more details.
* RStudio currently does not support conditional breakpoints, whereas you can
always put `browser()` inside an `if` statement.
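A conditional breakpoint of this kind might look like the sketch below. The function `scale_values()` and the condition being checked are hypothetical; the `interactive()` guard is an extra precaution I've assumed so that a forgotten `browser()` call can't stall a batch job.

```r
# Hypothetical function: rescale x to the range [0, 1]
scale_values <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # Pause only in the suspicious case (constant input leads to division by
  # zero), and only when someone is at the console to use the debugger
  if (interactive() && diff(rng) == 0) browser()
  (x - rng[1]) / diff(rng)
}

# Ordinary input never triggers the browser
scale_values(c(1, 2, 3))
```

With well-behaved input the function runs straight through; only a constant vector in an interactive session would drop you into the debugger.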
As well as adding `browser()` yourself, there are two functions that will add it to code for you:
* `debug()` inserts a browser statement in the first line of the specified
function. `undebug()` will remove it, or you can use `debugonce()` to browse
only on the next run.
* `utils::setBreakpoint()` works similarly, but instead of taking a function
name, it takes a file name and line number and finds the appropriate function
for you.
These two functions are both special cases of `trace()`, which inserts arbitrary code at any position in an existing function. `trace()` is occasionally useful when you're debugging code that you don't have the source for. To remove tracing from a function, use `untrace()`. You can only perform one trace per function, but that one trace can call multiple functions.
### The call stack: `traceback()`, `where` and `recover()`
Unfortunately the call stacks printed by `traceback()`, `browser()` + `where` and `recover()` are not consistent. The following table shows how the call stacks from a simple nested set of calls are displayed by the three tools.

`traceback()`    `where`                 `recover()`
---------------- ----------------------- ------------
4: stop("Error") where 1: stop("Error")  1: f()
3: h(x)          where 2: h(x)           2: g(x)
2: g(x)          where 3: g(x)           3: h(x)
1: f()           where 4: f()

Note that numbering is different between `traceback()` and `where`, and `recover()` displays calls in the opposite order, and omits the call to `stop()`. RStudio displays calls in the same order as `traceback()` but omits the numbers.
```{r, eval = FALSE, echo = FALSE}
f <- function(x) g(x)
g <- function(x) h(x)
h <- function(x) stop("Error")
f(); traceback()
options(error = browser); f()
options(error = recover); f()
options(error = NULL)
```
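If you want to look at the call stack from inside running code, rather than after an error, `sys.calls()` returns the active calls as a list. The sketch below uses three hypothetical functions; the calls are deparsed to strings only to make them easy to compare.

```r
# Hypothetical nested calls; the innermost one captures the stack
outer_fn <- function() middle_fn()
middle_fn <- function() inner_fn()
inner_fn <- function() sys.calls()

calls <- outer_fn()

# Convert each call to text; the most recent call comes last
stack <- vapply(as.list(calls), function(cl) deparse(cl)[1], character(1))
tail(stack, 3)
```

Note that the ordering here runs from the oldest call to the newest, which is yet another convention to keep in mind alongside those in the table above.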
### Other types of failure
There are other ways for a function to fail apart from throwing an error or returning an incorrect result.
* A function may generate an unexpected warning. The easiest way to track down
warnings is to convert them into errors with `options(warn = 2)` and use the
regular debugging tools. When you do this you'll see some extra calls
in the call stack, like `doWithOneRestart()`, `withOneRestart()`,
`withRestarts()` and `.signalSimpleWarning()`. Ignore these: they are
internal functions used to turn warnings into errors.
* A function may generate an unexpected message. There's no built in tool to
help solve this problem, but it's possible to create one:
```{r, eval = FALSE}
message2error <- function(code) {
  withCallingHandlers(code, message = function(e) stop(e))
}
f <- function() g()
g <- function() message("Hi!")
g()
# Hi!
message2error(g())
# Error in message("Hi!"): Hi!
traceback()
# 10: stop(e) at #2
# 9: (function (e) stop(e))(list(message = "Hi!\n", call = message("Hi!")))
# 8: signalCondition(cond)
# 7: doWithOneRestart(return(expr), restart)
# 6: withOneRestart(expr, restarts[[1L]])
# 5: withRestarts()
# 4: message("Hi!") at #1
# 3: g()
# 2: withCallingHandlers(code, message = function(e) stop(e)) at #2
# 1: message2error(g())
```
As with warnings, you'll need to ignore some of the calls on the traceback
(i.e. the first two and the last seven).
* A function might never return. This is particularly hard to debug
automatically, but sometimes terminating the function and looking at the
call stack is informative. Otherwise, use the basic debugging strategies
described above.
* The worst scenario is that your code might crash R completely, leaving you
with no way to interactively debug your code. This indicates a bug in
underlying C code and is hard to debug. Sometimes an interactive debugger,
like `gdb`, can be useful, but describing how to use it is beyond the
scope of this book.
If the crash is caused by base R code, post a reproducible example to R-help.
If it's in a package, contact the package maintainer. If it's your own C or
C++ code, you'll need to use numerous `print()` statements to narrow down
the location of the bug, and then you'll need to use many more print
statements to figure out which data structure doesn't have the properties
that you expect.
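For the warning case, changing `options(warn = 2)` globally is easy to forget to undo. One possible approach, sketched here with a hypothetical helper called `with_warnings_as_errors()`, is to set the option only for the duration of a single expression:

```r
# Hypothetical helper: evaluate code with warnings promoted to errors,
# restoring the previous setting afterwards
with_warnings_as_errors <- function(code) {
  old <- options(warn = 2)
  on.exit(options(old), add = TRUE)
  force(code)   # code is evaluated lazily, so this runs after warn is set
}

# as.integer() normally warns about coercion; here it becomes an error
result <- tryCatch(
  with_warnings_as_errors(as.integer("not a number")),
  error = function(e) conditionMessage(e)
)
result
```

This relies on lazy evaluation: the expression passed as `code` is not evaluated until `force()` is called, by which time the option has been changed.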
## Condition handling
Unexpected errors require interactive debugging to figure out what went wrong. Some errors, however, are expected, and you want to handle them automatically. In R, expected errors crop up most frequently when you're fitting many models to different datasets, such as bootstrap replicates. Sometimes the model might fail to fit and throw an error, but you don't want to stop everything; instead you want to fit as many models as possible and then perform diagnostics after the fact.
In R, there are three tools for handling conditions (including errors) programmatically:
* `try()` gives you the ability to continue execution even when an error occurs.
* `tryCatch()` lets you specify __handler__ functions that control what
happens when a condition is signalled.
* `withCallingHandlers()` is a variant of `tryCatch()` that runs its handlers
in a different context. It is rarely needed, but is useful to be aware of.
The following sections describe them in more detail.
### Ignore errors with `try()`
`try()` allows execution to continue even after an error has occurred. For example, normally if you run a function that throws an error, it terminates immediately and doesn't return a value:
```{r, error = TRUE}
f1 <- function(x) {
  log(x)
  10
}
f1("x")
```
However, if you wrap the statement that creates the error in `try()`, the error message will be printed but execution will continue:
```{r, error = TRUE}
f2 <- function(x) {
  try(log(x))
  10
}
f2("x")
```
You can suppress the message with `try(..., silent = TRUE)`.
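Suppressing the message doesn't throw the information away. As a small sketch, the object returned by a failed `try()` carries the original condition in its `"condition"` attribute, so you can inspect it later:

```r
# Nothing is printed, but the failure is still recorded in the result
res <- try(log("x"), silent = TRUE)
inherits(res, "try-error")

# The original error condition travels along as an attribute
cond <- attr(res, "condition")
conditionMessage(cond)
```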
To pass larger blocks of code to `try()`, wrap them in `{}`:
```{r}
try({
  a <- 1
  b <- "x"
  a + b
})
```
You can also capture the output of the `try()` function. If successful, it will be the last result evaluated in the block (just like a function); if unsuccessful it will be an (invisible) object of class "try-error":
```{r}
success <- try(1 + 2)
failure <- try("a" + "b")
str(success)
str(failure)
```
`try()` is particularly useful when you're applying a function to multiple elements in a list:
```{r, error = TRUE}
elements <- list(1:10, c(-1, 10), c(TRUE, FALSE), letters)
results <- lapply(elements, log)
results <- lapply(elements, function(x) try(log(x)))
```
There isn't a built-in function for testing for the "try-error" class, so we'll define one. Then you can easily find the locations of errors with `sapply()` (as discussed in [Functions](#functions)), and extract the successes or look at the inputs that led to failures.
```{r}
is.error <- function(x) inherits(x, "try-error")
succeeded <- !sapply(results, is.error)
# look at successful results
str(results[succeeded])
# look at inputs that failed
str(elements[!succeeded])
```
Another useful `try()` idiom is using a default value if an expression fails. Simply assign the default value outside the try block, and then run the risky code:
```{r, eval = FALSE}
default <- NULL
try(default <- read.csv("possibly-bad-input.csv"), silent = TRUE)
```
There is also `failwith()`, which makes this pattern even easier, as discussed in [Function Operators](#function-operators).
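To give a feel for how the default-value idiom can be packaged up, here is a minimal sketch of a function operator. It is not the `failwith()` mentioned above: `with_default()` is a hypothetical name, and the implementation is just one way you might write it.

```r
# Hypothetical function operator: wrap f so that errors yield a default value
with_default <- function(default, f) {
  force(default)
  force(f)
  function(...) {
    out <- default
    # If f() fails, the assignment never happens and out keeps the default
    try(out <- f(...), silent = TRUE)
    out
  }
}

safe_log <- with_default(NA_real_, log)
safe_log(10)
safe_log("a")
```

The second call returns `NA` instead of throwing an error, which makes `safe_log()` convenient to use inside `lapply()` or `sapply()`.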
### Handle conditions with `tryCatch()`
`tryCatch()` is a general tool for handling conditions: as well as errors you can take different actions for warnings, messages and interrupts. You've seen errors (made by `stop()`), warnings (`warning()`) and messages (`message()`) before, but interrupts are new. They can't be generated directly by the programmer, but are raised when the user attempts to terminate execution by pressing Ctrl + Break, Escape, or Ctrl + C (depending on the platform).
With `tryCatch()` you map conditions to handlers, named functions that are passed the condition as an input. If a condition is signalled, `tryCatch` will call the first handler whose name matches one of the classes of the condition. The only useful built-in names are `error`, `warning`, `message`, `interrupt` and the catch-all `condition`.
A handler function can do anything, but typically it will either return a value or create a more informative error message. For example, the `show_condition()` function below sets up handlers that return the type of condition signalled:
```{r}
show_condition <- function(code) {
  tryCatch(code,
    error = function(c) "error",
    warning = function(c) "warning",
    message = function(c) "message"
  )
}
show_condition(stop("!"))
show_condition(warning("?!"))
show_condition(message("?"))
# If no condition is captured, tryCatch returns the value of the input
show_condition(10)
```
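Handler names are matched against the class vector of the condition, so it can help to see what that vector looks like. The sketch below uses the catch-all `condition` handler to return it; `condition_classes()` is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: return the classes of whatever condition is signalled
condition_classes <- function(code) {
  tryCatch(code, condition = function(c) class(c))
}

condition_classes(stop("!"))
condition_classes(warning("?!"))
condition_classes(message("?"))
```

Each result ends in `"condition"`, which is why a handler named `condition` catches everything.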
You can use `tryCatch()` to implement `try()`. A simple implementation is shown below: the real version is more complicated to make the error message look more like what you'd see if `tryCatch()` wasn't used. Note the use of `conditionMessage()` to extract the message associated with the original error.
```{r}
try2 <- function(code, silent = FALSE) {
  tryCatch(code, error = function(c) {
    msg <- conditionMessage(c)
    if (!silent) message("Error: ", msg)
    invisible(structure(msg, class = "try-error"))
  })
}
try2(1)
try2(stop("Hi"))
try2(stop("Hi"), silent = TRUE)
```
As well as returning default values when a condition is signalled, handlers can be used to make more informative error messages. For example, the following function wraps around `read.csv()` to add the file name to any errors by modifying the message stored in the error condition object:
```{r, error = TRUE}
read.csv2 <- function(file, ...) {
  tryCatch(read.csv(file, ...), error = function(c) {
    c$message <- paste0(c$message, " ( in ", file, ")")
    stop(c)
  })
}
read.csv("code/dummy.csv")
read.csv2("code/dummy.csv")
```
Catching interrupts can be useful if you want to take special action when the user tries to abort running code. But be careful: it's easy to create a loop that you can never escape (unless you kill R)!
```{r, eval = FALSE}
# Don't let the user interrupt the code
i <- 1
while(i < 3) {
  tryCatch({
    Sys.sleep(0.5)
    message("Try to escape")
  }, interrupt = function(x) {
    message("Try again!")
    i <<- i + 1
  })
}
```
`tryCatch()` has one other argument: `finally`, which specifies a block of code (not a function) to run regardless of whether the initial expression succeeds or fails. This can be useful for clean up (e.g. deleting files, closing connections). This is functionally equivalent to using `on.exit()` but it can wrap smaller chunks of code than an entire function.
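Here is a small sketch of `finally` used for clean up. The scenario is made up: some code writes a scratch file, fails partway through, and the file is removed regardless.

```r
tmp <- tempfile()

result <- tryCatch({
  writeLines("scratch data", tmp)
  stop("processing failed")       # simulate a failure after the file exists
},
error = function(e) "recovered",
finally = unlink(tmp))            # runs whether or not an error occurred

result
file.exists(tmp)
```

The handler's return value becomes the value of `tryCatch()`, and the temporary file is gone by the time `tryCatch()` returns.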
### `withCallingHandlers()`
An alternative to `tryCatch()` is `withCallingHandlers()`. There are two main differences between the functions:
* The return value of `tryCatch()` handlers is returned by `tryCatch()`, whereas
the return value of `withCallingHandlers()` handlers is ignored:
```{r, error = TRUE}
f <- function() stop("!")
tryCatch(f(), error = function(e) 1)
withCallingHandlers(f(), error = function(e) 1)
```
* The handlers in `withCallingHandlers()` are called in the context of the
call that generated the condition; the handlers in `tryCatch()` are called
in the context of `tryCatch()`. (`sys.calls()` is the run-time equivalent of
`traceback()`, listing all calls leading to the current function.)
```{r, eval = FALSE}
f <- function() g()
g <- function() h()
h <- function() stop("!")
tryCatch(f(), error = function(e) print(sys.calls()))
# [[1]] tryCatch(f(), error = function(e) print(sys.calls()))
# [[2]] tryCatchList(expr, classes, parentenv, handlers)
# [[3]] tryCatchOne(expr, names, parentenv, handlers[[1L]])
# [[4]] value[[3L]](cond)
withCallingHandlers(f(), error = function(e) print(sys.calls()))
# [[1]] withCallingHandlers(f(), error = function(e) print(sys.calls()))
# [[2]] f()
# [[3]] g()
# [[4]] h()
# [[5]] stop("!")
# [[6]] .handleSimpleError(function (e) print(sys.calls()), "!", quote(h()))
# [[7]] h(simpleError(msg, call))
```
This also affects the order in which `on.exit()` is called.
These subtle differences are rarely useful, except when you're trying to capture exactly what went wrong and pass it on to another function. For most purposes, you should never need to use `withCallingHandlers()`.
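One situation where the calling context does matter is when you want to record a condition and then let the original code carry on. The sketch below collects warnings without interrupting the computation; `collect_warnings()` is a hypothetical helper, and it relies on the `"muffleWarning"` restart that `warning()` establishes.

```r
# Hypothetical helper: run code, gather any warnings, and still return a value
collect_warnings <- function(code) {
  seen <- character()
  value <- withCallingHandlers(
    code,
    warning = function(w) {
      seen <<- c(seen, conditionMessage(w))
      # Resume execution just after the call to warning()
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = seen)
}

res <- collect_warnings({
  warning("first")
  warning("second")
  10
})
str(res)
```

A `tryCatch()` handler could not do this: by the time it runs, the code that signalled the warning has already been abandoned, so only the first warning would be seen and the final value would be lost.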
### Custom signal classes
One of the challenges of error handling in R is that most functions just call `stop()` with a string. That means if you want to figure out if a particular error occurred, you have to look at the text of the error message. This is error prone, not only because the text of the error might change over time, but also because many error messages are translated, so the message might be completely different to what you expect.
R has a little known and little used feature to solve this problem. Conditions are S3 classes, so you can define your own classes if you want to distinguish different types of error. Each condition signalling function, `stop()`, `warning()` and `message()` can be given either a list of strings, or a custom S3 condition object. Custom condition objects are not used very often, but are very useful because they make it possible for the user to respond to different errors in different ways. For example, "expected" errors (like a model failing to converge for some input datasets) can be silently ignored, while unexpected errors (like no disk space available) can be propagated to the user.
R doesn't come with a built-in constructor function for conditions, but we can easily add one. Conditions must contain `message` and `call` components, and may contain other useful components. When creating a new condition, it should always inherit from `condition` and one of `error`, `warning` or `message`.
```{r}
condition <- function(subclass, message, call = sys.call(-1), ...) {
  structure(
    class = c(subclass, "condition"),
    list(message = message, call = call),
    ...
  )
}
is.condition <- function(x) inherits(x, "condition")
```
You can signal an arbitrary condition with `signalCondition()`, but nothing will happen unless you've instantiated a custom signal handler (with `tryCatch()` or `withCallingHandlers()`). Instead, use `stop()`, `warning()` or `message()` as appropriate to trigger the usual handling. R won't complain if the class of your condition doesn't match the function, but you should avoid this in real code.
```{r, eval = FALSE}
c <- condition(c("my_error", "error"), message = "This is an error")
signalCondition(c)
# NULL
stop(c)
# Error: This is an error
warning(c)
# Warning message: This is an error
message(c)
# This is an error
```
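By contrast, once a handler is established, `signalCondition()` does have an effect. The sketch below assumes a hypothetical `my_condition` class and uses `withCallingHandlers()` to record the message before execution continues.

```{r}
# A hand-built condition with a hypothetical class.
cond <- structure(
  class = c("my_condition", "condition"),
  list(message = "Something happened", call = NULL)
)

seen <- character()
result <- withCallingHandlers(
  {
    signalCondition(cond)  # the handler runs, then control returns here
    "finished"
  },
  my_condition = function(cnd) {
    # Record the message in the enclosing environment.
    seen <<- c(seen, conditionMessage(cnd))
  }
)
```

After this chunk runs, `seen` contains the condition's message and `result` is `"finished"`, because a calling handler does not interrupt the code that signalled the condition.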
You can then use `tryCatch()` to take different actions for different types of errors. In this example we make a convenient `custom_stop()` function that allows us to signal error conditions with arbitrary classes. In a real application, it would be better to have individual S3 constructor functions that you could document, describing the error classes in more detail.
```{r}
custom_stop <- function(subclass, message, call = sys.call(-1), ...) {
  c <- condition(c(subclass, "error"), message, call = call, ...)
  stop(c)
}

my_log <- function(x) {
  if (!is.numeric(x))
    custom_stop("invalid_class", "my_log() needs numeric input")
  if (any(x < 0))
    custom_stop("invalid_value", "my_log() needs positive inputs")
  log(x)
}

tryCatch(
  my_log("a"),
  invalid_class = function(c) "class",
  invalid_value = function(c) "value"
)
```
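An individual constructor for a single error class might look like the following sketch. The names `invalid_value_error()` and `my_sqrt()`, and the extra `value` field, are hypothetical; the point is only that a dedicated constructor gives you one place to document the class and the data it carries.

```{r}
# Hypothetical constructor for one documented error class.
# `value` is an extra field carrying the offending input.
invalid_value_error <- function(message, value, call = sys.call(-1)) {
  structure(
    class = c("invalid_value", "error", "condition"),
    list(message = message, call = call, value = value)
  )
}

my_sqrt <- function(x) {
  if (any(x < 0)) {
    cnd <- invalid_value_error("my_sqrt() needs non-negative inputs", value = x)
    stop(cnd)
  }
  sqrt(x)
}

# The handler can use the extra field instead of parsing the message.
bad <- tryCatch(my_sqrt(c(4, -1)), invalid_value = function(cnd) cnd$value)
```

Here `bad` holds the input that triggered the error, which the handler reads directly from the condition object.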
Note that when using `tryCatch()` with multiple handlers and custom classes, the first handler to match any class in the signal's class hierarchy is called, not the best match. For this reason, you need to make sure to put the most specific handlers first:
```{r}
tryCatch(custom_stop("my_error", "!"),
  error = function(c) "error",
  my_error = function(c) "my_error"
)
tryCatch(custom_stop("my_error", "!"),
  my_error = function(c) "my_error",
  error = function(c) "error"
)
```
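Custom classes also make it straightforward to treat "expected" and unexpected errors differently, as described earlier. The following is a minimal sketch under stated assumptions: `fit_model()` and `fit_or_na()` are hypothetical, and the `convergence_error` class stands in for whatever failure you consider expected.

```{r}
# Hypothetical model-fitting function: signals a classed error for the
# "expected" failure and a plain error for anything else.
fit_model <- function(data) {
  if (length(data) == 0) {
    stop("no data supplied")
  }
  if (any(is.na(data))) {
    cnd <- structure(
      class = c("convergence_error", "error", "condition"),
      list(message = "model failed to converge", call = sys.call())
    )
    stop(cnd)
  }
  mean(data)
}

# Swallow only convergence failures; every other error still propagates.
fit_or_na <- function(data) {
  tryCatch(fit_model(data), convergence_error = function(cnd) NA_real_)
}

fits <- vapply(list(c(1, 2, 3), c(1, NA), c(10, 20)), fit_or_na, numeric(1))

# The plain error is not caught by fit_or_na(), so it reaches this handler.
unexpected <- tryCatch(
  fit_or_na(numeric(0)),
  error = function(cnd) conditionMessage(cnd)
)
```

Because `fit_or_na()` only names `convergence_error`, it never needs to inspect message text, and errors it did not anticipate are left for the caller to deal with.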
### Exercises
* Compare the following two implementations of `message2error()`. What is the
main advantage of `withCallingHandlers()` in this scenario? (Hint: look
carefully at the traceback.)
```{r}
message2error <- function(code) {
  withCallingHandlers(code, message = function(e) stop(e))
}

message2error <- function(code) {
  tryCatch(code, message = function(e) stop(e))
}
```
## Defensive programming
Defensive programming is the art of making code fail in a well-defined manner even when something unexpected occurs. A key principle of defensive programming is to "fail fast": as soon as you discover something is wrong, signal an error. This is more work for you as the function author, but will make it easier for the user to debug because they get errors early on, not after unexpected input has passed through several functions.
The principle of "fail fast" has three main applications in R:
* Be strict about what you accept. For example, if your function is not
vectorised in its inputs, but uses functions that are, make sure to check
that the inputs are scalars. You can use `stopifnot()`, the
[assertthat](https://github.com/hadley/assertthat) package or simple `if`
statements and `stop()`.
* Avoid functions that use non-standard evaluation, like `subset`, `transform`,
and `with`. These functions save time when used interactively, but because
they make assumptions to reduce typing, when they fail, they often fail with
uninformative error messages. You can learn more about non-standard evaluation in the
[metaprogramming](#metaprogramming) chapter.
* Avoid functions that return different types of output depending on their
input. The two biggest offenders are `[` and `sapply()`. Whenever
subsetting a data frame in a function, you should always use `drop = FALSE`,
otherwise you will accidentally convert 1-column data frames into vectors.
Similarly, never use `sapply()` inside a function: always use the stricter
`vapply()` which will throw an error if the inputs are incorrect types and
return the correct type of output even for zero-length inputs.
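The first and third points can be illustrated with a short sketch. The helper `scale_by()` is hypothetical; the remaining lines use only base R to show how `[` and `sapply()` change their return type while `drop = FALSE` and `vapply()` do not.

```{r}
# Hypothetical helper that is strict about its inputs: `factor` must be
# a single number, otherwise the call fails immediately.
scale_by <- function(x, factor) {
  stopifnot(is.numeric(x), is.numeric(factor), length(factor) == 1)
  x * factor
}
scaled <- scale_by(1:3, 2)

# `[` simplifies a 1-column selection to a vector unless drop = FALSE.
df <- data.frame(a = 1:3)
dropped <- df[, "a"]
kept <- df[, "a", drop = FALSE]
is.data.frame(dropped)  # FALSE
is.data.frame(kept)     # TRUE

# sapply() falls back to a list for zero-length input;
# vapply() returns the declared type.
loose <- sapply(list(), length)
strict <- vapply(list(), length, integer(1))
class(loose)   # "list"
class(strict)  # "integer"
```

Calling `scale_by(1:3, c(1, 2))` would stop with an error from `stopifnot()` rather than silently recycling `factor`.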
There is a tension between interactive analysis and programming. When you're doing an analysis, you want R to do what you mean, and if it guesses wrong, you'll discover it right away and you can fix it. When you're programming, you want functions with no magic that signal errors if anything is slightly wrong or underspecified. Keep this tension in mind when writing functions: if you're making a function to facilitate interactive data analysis, feel free to guess what the analyst wants and recover from minor misspecifications automatically; if you're making a function to program with, be strict, and never make guesses about what the caller wants.
### Exercises
* The goal of the `col_means()` function defined below is to compute the means
of all numeric columns in a data frame.
```{r}
col_means <- function(df) {
  numeric <- sapply(df, is.numeric)
  numeric_cols <- df[, numeric]
  data.frame(lapply(numeric_cols, mean))
}
```
However, the function is not robust to unusual inputs. Look at
the following results, decide which ones are incorrect, and modify `col_means`
to be more robust. (Hint: there are two function calls in `col_means` that
are particularly prone to problems.)
```{r, eval = FALSE}
col_means(mtcars)
col_means(mtcars[, 0])
col_means(mtcars[0, ])
col_means(mtcars[, "mpg", drop = FALSE])
col_means(1:10)
col_means(as.matrix(mtcars))
col_means(as.list(mtcars))
mtcars2 <- mtcars
mtcars2[-1] <- lapply(mtcars2[-1], as.character)
col_means(mtcars2)
```
* The following function "lags" a vector, returning a version of `x` that is `n`
values behind the original. Improve the function so that (1) it returns a
useful error message if `n` is not a vector, (2) it has reasonable behaviour
when `n` is 0 or longer than `x`.
```{r}
lag <- function(x, n = 1L) {
  xlen <- length(x)
  c(rep(NA, n), x[seq_len(xlen - n)])
}
```