Debugging.Rmd

# Debugging

```{r setup, include = FALSE}
source("common.R")
```

## Introduction

What happens when something goes wrong with your R code? What do you do? What tools do you have to address the problem? This chapter will teach you how to fix unanticipated problems (debugging), show you how functions can communicate problems and how you can take action based on those communications (condition handling), and teach you how to avoid common problems before they occur (defensive programming).

Debugging is the art and science of fixing unexpected problems in your code. In this section you'll learn the tools and techniques that help you get to the root cause of an error. You'll learn general strategies for debugging, useful R functions like `traceback()` and `browser()`, and interactive tools in RStudio.

The chapter concludes with a discussion of "defensive" programming: ways to avoid common errors before they occur. In the short run you'll spend more time writing code, but in the long run you'll save time because error messages will be more informative and will let you narrow in on the root cause more quickly. The basic principle of defensive programming is to "fail fast", to raise an error as soon as something goes wrong. In R, this takes three particular forms: checking that inputs are correct, avoiding non-standard evaluation, and avoiding functions that can return different types of output.

### Quiz {-}

Want to skip this chapter? Go for it, if you can answer the questions below. Find the answers at the end of the chapter in [answers](#debugging-answers).

1. How can you find out where an error occurred?

1. What does `browser()` do? List the five useful single-key commands
   that you can use inside of a `browser()` environment.

### Outline {-}

1. [Debugging techniques](#debugging-techniques) outlines a general
   approach for finding and resolving bugs.
   
1. [Debugging tools](#debugging-tools) introduces you to the R functions
   and RStudio features that help you locate exactly where an error
   occurred.

1. [Defensive programming](#defensive-programming) introduces you to
   some important techniques for defensive programming, techniques that help
   prevent bugs from occurring in the first place.

## Techniques {#debugging-techniques}

> "Finding your bug is a process of confirming the many things
> that you believe are true — until you find one which is not
> true." 
> 
> ---Norm Matloff

Debugging code is challenging. Many bugs are subtle and hard to find. Indeed, if a bug was obvious, you probably would've been able to avoid it in the first place. While it's true that with a good technique, you can productively debug a problem with just `print()`, there are times when additional help would be welcome. In this section, we'll discuss some useful tools, which R and RStudio provide, and outline a general procedure for debugging. \index{debugging} \index{bugs}

While the procedure below is by no means foolproof, it will hopefully help you to organise your thoughts when debugging. There are four steps:

1. __Realise that you have a bug__

    If you're reading this chapter, you've probably already completed this step.
    It is a surprisingly important one: you can't fix a bug until you know it
    exists. This is one reason why automated test suites are important when
    producing high-quality code. Unfortunately, automated testing is outside the
    scope of this book, but you can read more about it at
    <http://r-pkgs.had.co.nz/tests.html>.

2. __Make it repeatable__

    Once you've determined you have a bug, you need to be able to reproduce it
    on command. Without this, it becomes extremely difficult to isolate its
    cause and to confirm that you've successfully fixed it.

    Generally, you will start with a big block of code that you know causes the
    error and then slowly whittle it down to get to the smallest possible
    snippet that still causes the error. Binary search is particularly useful
    for this. To do a binary search, you repeatedly remove half of the code
    until you find the bug. This is fast because, with each step, you reduce the
    amount of code to look through by half.

    If it takes a long time to generate the bug, it's also worthwhile to figure
    out how to generate it faster. The quicker you can do this, the quicker you
    can figure out the cause.

    As you work on creating a minimal example, you'll also discover similar
    inputs that don't trigger the bug. Make note of them: they will be helpful
    when diagnosing the cause of the bug.

    If you're using automated testing, this is also a good time to create an
    automated test case. If your existing test coverage is low, take the
    opportunity to add some nearby tests to ensure that existing good behaviour
    is preserved. This reduces the chances of creating a new bug.

3. __Figure out where it is__

    If you're lucky, one of the tools in the following section will help you to
    quickly identify the line of code that's causing the bug. Usually, however,
    you'll have to think a bit more about the problem. It's a great idea to
    adopt the scientific method. Generate hypotheses, design experiments to test
    them, and record your results. This may seem like a lot of work, but a
    systematic approach will end up saving you time. I often waste a lot of time
    relying on my intuition to solve a bug ("oh, it must be an off-by-one error,
    so I'll just subtract 1 here"), when I would have been better off taking a
    systematic approach.

4. __Fix it and test it__

    Once you've found the bug, you need to figure out how to fix it and to check
    that the fix actually worked. Again, it's very useful to have automated
    tests in place. Not only does this help to ensure that you've actually fixed
    the bug, it also helps to ensure you haven't introduced any new bugs in the
    process. In the absence of automated tests, make sure to carefully record
    the correct output, and check against the inputs that previously failed.

## Tools {#debugging-tools}

To implement a strategy of debugging, you'll need tools. In this section, you'll learn about the tools provided by R and the RStudio IDE. RStudio's integrated debugging support makes life easier by exposing existing R tools in a user friendly way. I'll show you both the R and RStudio ways so that you can work with whatever environment you use. You may also want to refer to the official [RStudio debugging documentation](https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio) which always reflects the tools in the latest version of RStudio.

There are three key debugging tools:

* RStudio's error inspector and `traceback()` which list the sequence of calls
  that lead to the error.

* RStudio's "Rerun with Debug" tool and `options(error = browser)` which open
  an interactive session where the error occurred.

* RStudio's breakpoints and `browser()` which open an interactive session at
  an arbitrary location in the code.

I'll explain each tool in more detail below.

You shouldn't need to use these tools when writing new functions. If you find yourself using them frequently with new code, you may want to reconsider your approach. Instead of trying to write one big function all at once, work interactively on small pieces. If you start small, you can quickly identify why something doesn't work. But if you start large, you may end up struggling to identify the source of the problem.

### Determining the sequence of calls

The first tool is the __call stack__, the sequence of calls that lead up to an error. Here's a simple example: you can see that `f()` calls `g()` calls `h()` calls `i()` which adds together a number and a string creating a error: \index{call stack} \indexc{traceback()}

```{r, eval = FALSE}
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) "a" + d
f(10)
```

When we run this code in RStudio we see:

```{r, echo = FALSE}
knitr::include_graphics("screenshots/traceback-hidden.png", dpi = 220)
```

Two options appear to the right of the error message: "Show Traceback" and "Rerun with Debug".  If you click "Show traceback" you see:

```{r, echo = FALSE}
knitr::include_graphics("screenshots/traceback-shown.png", dpi = 220)
```

If you're not using RStudio, you can use `traceback()` to get the same information:

```{r, eval = FALSE}
traceback()
# 4: i(c) at exceptions-example.R#3
# 3: h(b) at exceptions-example.R#2
# 2: g(a) at exceptions-example.R#1
# 1: f(10)
```

Read the call stack from bottom to top: the initial call is `f()`, which calls `g()`, then `h()`, then `i()`, which triggers the error. If you're calling code that you `source()`d into R, the traceback will also display the location of the function, in the form `filename.r#linenumber`. These are clickable in RStudio, and will take you to the corresponding line of code in the editor.

Sometimes this is enough information to let you track down the error and fix it. However, it's usually not. `traceback()` shows you where the error occurred, but not why. The next useful tool is the interactive debugger, which allows you to pause execution of a function and interactively explore its state.

### Browsing on error

The easiest way to enter the interactive debugger is through RStudio's "Rerun with Debug" tool. This reruns the command that created the error, pausing execution where the error occurred. You're now in an interactive state inside the function, and you can interact with any object defined there. You'll see the corresponding code in the editor (with the statement that will be run next highlighted), objects in the current environment in the "Environment" pane, the call stack in a "Traceback" pane, and you can run arbitrary R code in the console. \index{debugger, interactive}

As well as any regular R function, there are a few special commands you can use in debug mode. You can access them either with the RStudio toolbar (![](screenshots/debug-toolbar.png){height=20}) or with the keyboard:

* Next, `n`: executes the next step in the function. Be careful if you have a
  variable named `n`; to print it you'll need to do `print(n)`.

* Step into, ![](screenshots/step-into.png){height=20} or `s`: 
  works like next, but if the next step is a function, it will step into that
  function so you can work through each line.

* Finish, ![](screenshots/finish-loop.png){height=20} or `f`: 
  finishes execution of the current loop or function.

* Continue, `c`: leaves interactive debugging and continues regular execution
  of the function. This is useful if you've fixed the bad state and want to
  check that the function proceeds correctly.

* Stop, `Q`: stops debugging, terminates the function, and returns to the global
  workspace. Use this once you've figured out where the problem is, and you're
  ready to fix it and reload the code.

There are two other slightly less useful commands that aren't available in the toolbar:

* Enter: repeats the previous command. I find this too easy to activate
  accidentally, so I turn it off using `options(browserNLdisabled = TRUE)`.

* `where`: prints stack trace of active calls (the interactive equivalent of
  `traceback`).

To enter this style of debugging outside of RStudio, you can use the `error` option which specifies a function to run when an error occurs. The function most similar to RStudio's debug is `browser()`: this will start an interactive console in the environment where the error occurred. Use `options(error = browser)` to turn it on, re-run the previous command, then use `options(error = NULL)` to return to the default error behaviour. You could automate this with the `browseOnce()` function as defined below: \indexc{options(error)}

```{r, eval = FALSE}
browseOnce <- function() {
  old <- getOption("error")
  function() {
    options(error = old)
    browser()
  }
}
options(error = browseOnce())

f <- function() stop("!")
# Enters browser
f()
# Runs normally
f()
```

(You'll learn more about functions that return functions in [Functional programming](#functional-programming).)

There are two other useful functions that you can use with the `error` option:

* `recover` is a step up from `browser`, as it allows you to enter the
  environment of any of the calls in the call stack. This is useful because
  often the root cause of the error is a number of calls back. \indexc{recover()}

* `dump.frames` is an equivalent to `recover` for non-interactive code. It
  creates a `last.dump.rda` file in the current working directory. Then,
  in a later interactive R session, you load that file, and use `debugger()`
  to enter an interactive debugger with the same interface as `recover()`.
  This allows interactive debugging of batch code. \indexc{dump.frames()}

    ```{r, eval = FALSE}
    # In batch R process ----
    dump_and_quit <- function() {
      # Save debugging info to file last.dump.rda
      dump.frames(to.file = TRUE)
      # Quit R with error status
      q(status = 1)
    }
    options(error = dump_and_quit)

    # In a later interactive session ----
    load("last.dump.rda")
    debugger()
    ```

To reset error behaviour to the default, use `options(error = NULL)`. Then errors will print a message and abort function execution.

### Browsing arbitrary code

As well as entering an interactive console on error, you can enter it at an arbitrary code location by using either an RStudio breakpoint or `browser()`. You can set a breakpoint in RStudio by clicking to the left of the line number, or pressing `Shift + F9`. Equivalently, add `browser()` where you want execution to pause. Breakpoints behave similarly to `browser()` but they are easier to set (one click instead of nine key presses), and you don't run the risk of accidentally including a `browser()` statement in your source code. There are two small downsides to breakpoints: \indexc{browser()} \index{breakpoints}

* There are a few unusual situations in which breakpoints will not work: 
  read [breakpoint troubleshooting](http://www.rstudio.com/ide/docs/debugging/breakpoint-troubleshooting) for more details.

* RStudio currently does not support conditional breakpoints, whereas you 
  can always put `browser()` inside an `if` statement.

As well as adding `browser()` yourself, there are two other functions that will add it to code:

* `debug()` inserts a browser statement in the first line of the specified
  function. `undebug()` removes it. Alternatively, you can use `debugonce()`
  to browse only on the next run. \indexc{debug()}

* `utils::setBreakpoint()` works similarly, but instead of taking a function
  name, it takes a file name and line number and finds the appropriate function
  for you. \indexc{setBreakpoint()}

These two functions are both special cases of `trace()`, which inserts arbitrary code at any position in an existing function. `trace()` is occasionally useful when you're debugging code that you don't have the source for. To remove tracing from a function, use `untrace()`. You can only perform one trace per function, but that one trace can call multiple functions. \indexc{trace()}

### The call stack: `traceback()`, `where`, and `recover()`

Unfortunately, the call stacks printed by `traceback()`, `browser()` + `where`, and `recover()` are not consistent. The following table shows how the call stacks from a simple nested set of calls are displayed by the three tools. \index{call stack}

`traceback()`       `where`                    `recover()`
----------------    -----------------------    ------------
`4: stop("Error")`  `where 1: stop("Error")`   `1: f()`
`3: h(x)`           `where 2: h(x)`            `2: g(x)`
`2: g(x)`           `where 3: g(x)`            `3: h(x)`
`1: f()`            `where 4: f()`

Note that numbering is different between `traceback()` and `where`, and that `recover()` displays calls in the opposite order, and omits the call to `stop()`. RStudio displays calls in the same order as `traceback()` but omits the numbers.

```{r, eval = FALSE, echo = FALSE}
f <- function(x) g(x)
g <- function(x) h(x)
h <- function(x) stop("Error")
f(); traceback()
options(error = browser); f()
options(error = recover); f()
options(error = NULL)
```

### Other types of failure

There are other ways for a function to fail apart from throwing an error or returning an incorrect result.

* A function may generate an unexpected warning. The easiest way to track down
  warnings is to convert them into errors with `options(warn = 2)` and use the
  regular debugging tools. When you do this you'll see some extra calls
  in the call stack, like `doWithOneRestart()`, `withOneRestart()`,
  `withRestarts()`, and `.signalSimpleWarning()`. Ignore these: they are
  internal functions used to turn warnings into errors.
  \index{debugging!warnings}

* A function may generate an unexpected message. There's no built-in tool to
  help solve this problem, but it's possible to create one:
  \index{debugging!messages}

    ```{r, eval = FALSE}
    message2error <- function(code) {
      withCallingHandlers(code, message = function(e) stop(e))
    }

    f <- function() g()
    g <- function() message("Hi!")
    g()
    # Hi!   
    message2error(g())
    # Error in message("Hi!"): Hi!
    traceback()
    # 10: stop(e) at #2
    # 9: (function (e) stop(e))(list(message = "Hi!\n", 
    #      call = message("Hi!")))
    # 8: signalCondition(cond)
    # 7: doWithOneRestart(return(expr), restart)
    # 6: withOneRestart(expr, restarts[[1L]])
    # 5: withRestarts()
    # 4: message("Hi!") at #1
    # 3: g()
    # 2: withCallingHandlers(code, message = function(e) stop(e)) 
    #      at #2
    # 1: message2error(g())
    ```

    As with warnings, you'll need to ignore some of the calls on the traceback
    (i.e., the first two and the last six).

* A function might never return. This is particularly hard to debug
  automatically, but sometimes terminating the function and looking at the
  call stack is informative. Otherwise, use the basic debugging strategies
  described above.

* The worst scenario is that your code might crash R completely, leaving you
  with no way to interactively debug your code. This indicates a bug in the
  underlying C code. This is hard to debug. Sometimes an interactive debugger,
  like `gdb`, can be useful, but describing how to use it is beyond the
  scope of this book. \index{debugging!C code}

    If the crash is caused by base R code, post a reproducible example to R-help.
    If it's in a package, contact the package maintainer. If it's your own C or
    C++ code, you'll need to use numerous `print()` statements to narrow down
    the location of the bug, and then you'll need to use many more print
    statements to figure out which data structure doesn't have the properties
    that you expect.

## Quiz answers {#debugging-answers}

1. The most useful tool to determine where a error occurred is `traceback()`.
   Or use RStudio, which displays it automatically where an error occurs.
   
1. `browser()` pauses execution at the specified line and allows you to
   enter an interactive environment. In that environment, there are five
   useful commands: `n`, execute the next command; `s`, step into the 
   next function; `f`, finish the current loop or function; `c`, continue
   execution normally; `Q`, stop the function and return to the console.