forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathSpecial-environments.rmd
412 lines (296 loc) · 14.8 KB
/
Special-environments.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
---
title: Special contexts
layout: default
---
# Special contexts
R's lexical scoping rules, lazy argument evaluation and first-class environments make it an excellent language in which to design special environments that allow you to create domain specific languages (DSLs). This chapter shows how you can use these ideas to evaluate code in special contexts that pervert the usual behaviour of R. We'll start with simple modifications then work our way up to evaluations that completely redefine ordinary function semantics in R.
This chapter uses ideas from the breadth of the book including non-standard evaluation, environment manipulation, active bindings, ...
* Evaluate code in a special context: `local`, `capture.output`, `with_*`
* Supply an expression instead of a function: `curve`
* Combine evaluation with extra processing: `test_that`, `assert_that`
* Create a full-blown DSL: `html`, `plotmath`, `deriv`, `parseNamespace`, `sql`
## Evaluate code in a special context
It's often useful to evaluate a chunk of code in a special context
* Temporarily modifying global state, like the working directory, environment variables, plot options or locale.
* Capture side-effects of a function, like the text it prints, or the warnings it emits
* Evaluate code in a new environment
### `with_something`
There are a number of parameters in R that have global state (e.g. `option()`, environmental variables, `par()`, ...) and it's useful to be able to run code temporarily in a different context. For example, it's often useful to be able to run code with the working directory temporarily set to a new location:
```{r}
in_dir <- function(dir, code) {
old <- setwd(dir)
on.exit(setwd(old))
force(code)
}
getwd()
in_dir("~", getwd())
```
The basic pattern is simple:
* We first set the directory to a new location, capturing the current location from the output of `setwd`.
* We then use `on.exit()` to ensure that the working directory is returned to the previous value regardless of how the function exits.
* Finally, we explicitly force evaluation of the code. (We don't actually need `force()` here, but it makes it clear to readers what we're doing)
We could use similar code to temporarily set global options:
```{r}
with_options <- function(opts, code) {
old <- options(opts)
on.exit(options(old))
force(code)
}
x <- 0.123456
print(x)
with_options(list(digits = 3), print(x))
with_options(list(digits = 1), print(x))
```
You might notice that there's a lot of similarity between these two functions. We could extract out that commonality with a function operator that takes the setter function as an input:
```{r}
with_something <- function(set) {
function(new, code) {
old <- set(new)
on.exit(set(old))
force(code)
}
}
```
Then we can easily generate a whole set of with functions:
```{r}
in_dir <- with_something(setwd)
with_options <- with_something(options)
with_par <- with_something(par)
```
However, many of the setter functions that affect global state in R don't return the previous values in a way that can easily be passed back in. In that case, like for `.libPaths()`, which controls where R looks for packages to load, we first create a wrapper that enforces the behaviour we want, and then use `with_something()`:
```{r}
set_libpaths <- function(paths) {
libpath <- normalizePath(paths, mustWork = TRUE)
old <- .libPaths()
.libPaths(paths)
invisible(old)
}
with_libpaths <- with_something(set_libpaths)
```
These functions (and a few others) are provided by the devtools package. See `?with_something` for more details.
### `capture.output`
`capture.output()` is a useful function when the output you really want from a function is printed to the console. It also allows you to work around badly written functions that have no way to suppress output to the console. For example, it's difficult to capture the output of `str()`:
```{r}
y <- 1:10
y_str <- str(y)
y_str
y_str <- capture.output(str(y))
y_str
```
To work its magic, `capture.output()` uses `sink()`, which allows you to redirect the output stream to an arbitrary connection. We'll first write a helper function that allows us to execute code in the context of a `sink()`, automatically un-`sink()`ing when the function finishes:
```{r}
with_sink <- function(connection, code, ...) {
sink(connection, ...)
on.exit(sink())
code
}
with_sink("temp.txt", print("Hello"))
readLines("temp.txt")
file.remove("temp.txt")
```
With this in hand, we just need to add a little extra wrapping to our `capture.output2()` to write to a temporary file, read from it and clean up after ourselves:
```{r}
capture.output2 <- function(code) {
temp <- tempfile()
on.exit(file.remove(temp))
with_sink(temp, force(code))
readLines(temp)
}
capture.output2(cat("a", "b", "c", sep = "\n"))
```
The real `capture.output()` is a bit more complicated: it uses a local `textConnection` to capture the data sent to sink, and it allows you to supply multiple expressions which are evaluated in turn. Using `with_sink()` this looks like `capture.output3()`
```{r}
capture.output3 <- function(..., env = parent.frame()) {
txtcon <- textConnection("rval", "w", local = TRUE)
with_sink(txtcon, {
args <- dots(...)
for(i in seq_along(args)) {
out <- withVisible(eval(args[[i]], env))
if (out$visible) print(out$value)
}
})
rval
}
```
You might want to compare this function to the real `capture.output()` and think about the simplifications I've made. Is the code easier to understand or harder? Have I removed important functionality?
If you want to capture more types of output (like messages and warnings), you may find the `evaluate` package helpful. It powers `knitr`, and does its best to ensure high fidelity between its output and what you'd see if you copied and pasted the code at the console.
### Evaluating code in a new environment
In the process of performing a data analysis, you may create variables that are necessarily because they help break a complicated sequence of steps down in to easily digestible chunks, but are not needed afterwards. For example, in the following example, we might only want to keep the value of x:
```{r}
a <- 10
b <- 30
x <- a + b
```
It's useful to be able to store only the final result, preventing the intermediate results from cluttering your workspace. We already know one way of doing this, using a function:
```{r}
x <- (function() {
a <- 10
b <- 30
a + b
})()
```
(In JavaScript this is called the immediately invoked function expression (IIFE), and is used extensively in modern JavaScript to encapsulate different JavaScript libraries)
R provides another tool that's a little less verbose, the `local()` function:
```{r}
x <- local({
a <- 10
b <- 30
a + b
})
```
The idea of local is to create a new environment (inheriting from the current environment) and run the code in that. The essence of `local()` is captured in this code:
```{r}
local2 <- function(expr) {
envir <- new.env(parent = parent.frame())
eval(substitute(expr), envir)
}
```
The real `local()` code is considerably more complicated because it adds a second environment parameter. I don't think this is necessary because if you have an explicit environment parameter, then you can already evaluate code in that environment with `evalq()`. The original code is also hard to understand because it is very concise and uses some sutble features of evaluation (including non-standard evaluation of both arguments). If you have read [metaprogramming](#metaprogramming), you might be able to puzzle it out, but to make it a bit easier I have rewritten it in a simpler style below.
```{r}
local2 <- function(expr, envir = new.env()) {
env <- parent.frame()
call <- substitute(eval(quote(expr), envir))
eval(call, env)
}
a <- 100
local2({
b <- a + sample(10, 1)
my_get <<- function() b
})
my_get()
```
You might wonder we can't simplify to this:
```{r}
local3 <- function(expr, envir = new.env()) {
eval(substitute(expr), envir)
}
```
But it's because of how the arguments are evaluated - default arguments are evalauted in the scope of the function so that `local(x)` would not be the same as `local(x, new.env())` without special effort.
### Exercises
* Compare `capture.output()` to `capture.output2()` and `local()` to `local2()`. How do the functions differ? What features have I removed to make the key ideas easier to see? How have I rewritten the key ideas to be easier to understand?
* How does the `chdir` parameter of `source()` compare to `in_dir()`? Why might you prefer one approach to the other?
## Anaphoric functions
Another variant along these lines is an "[anaphoric](http://en.wikipedia.org/wiki/Anaphora_(linguistics)) function", or a function that uses a pronoun. This is easiest to understand with an example using an interesting anaphoric function in base R: `curve()`.`curve()` draws a plot of the specified function, but interestingly you don't need to use a function, you just supply an expression that uses `x`:
```{r curve-demo}
curve(x ^ 2)
curve(sin(x), to = 3 * pi)
curve(sin(exp(4 * x)), n = 1000)
```
Here `x` plays a role like a pronoun in an English sentence: it doesn't represent a single concrete value, but instead is a place holder that varies over the range of the plot. Note that it doesn't matter what the value of `x` outside of `curve()` is: the expression is evaluated in a special environment where `x` has a special meaning:
```{r curve}
x <- 1
curve(sin(exp(4 * x)), n = 1000)
```
`curve()` works by evaluating the expression in a special environment in which the appropriate `x` exists. The essence of `curve()`, omitting many useful but incidental details like plot labelling, looks like this:
```{r curve2}
curve2 <- function(expr, xlim = c(0, 1), n = 100, env = parent.frame()) {
env2 <- new.env(parent = env)
env2$x <- seq(xlim[1], xlim[2], length = n)
y <- eval(substitute(expr), env2)
plot(env2$x, y, type = "l", ylab = deparse(substitute(expr)))
}
curve2(sin(exp(4 * x)), n = 1000)
```
Creating a new environment containing the pronoun is the key technique for implementing anaphoric functions.
Another way to solve the problem would be to turn the expression into a function using `make_function()`:
```{r curve3}
curve3 <- function(expr, xlim = c(0, 1), n = 100, env = parent.frame()) {
f <- pryr::make_function(alist(x = ), substitute(expr), env)
x <- seq(xlim[1], xlim[2], length = n)
y <- f(x)
plot(x, y, type = "l", ylab = deparse(substitute(expr)))
}
curve3(sin(exp(4 * x)), n = 1000)
```
The approaches take about as much code, and require knowledge of the same number of fundamental R concepts. I would have a slight preference for the second because it would be easier to reuse the part of the `curve3()` that turns an expression into a function. All anaphoric functions need careful documentation so that the user knows that some variable will have special properties inside the anaphoric function and must otherwise be avoided.
If you're interesting in learning more, there are some good resources for anaphoric functions in [Arc](http://www.arcfn.com/doc/anaphoric.html) (a list like language), [Perl](http://www.perlmonks.org/index.pl?node_id=666047) and [Clojure](http://amalloy.hubpages.com/hub/Unhygenic-anaphoric-Clojure-macros-for-fun-and-profit)
### With connection
We could use this idea to create a function that allows you to use a connection, automatically openning and closing it if needed. This is a common technique in Ruby, which uses its block syntax whenever you deal with resources that need to be closed after you're done with them. We can't use the same implementation as the with functions above because an implementation like that below gives no way for the code to refer to the connection:
```{r}
with_conn <- function(conn, code) {
open(conn)
on.exit(close(conn))
force(code)
}
```
We can work around this problem by using an anaphoric function, referring to the connection as `.it`. We'd need to clearly document that convention so the user knew what to call it.
```{r}
with_conn <- function(conn, code, env = parent.frame()) {
if (!isOpen(conn)) {
open(conn)
on.exit(close(conn))
}
env2 <- new.env(parent = env)
env2$.it <- conn
eval(substitute(code), env2)
}
# Create a new file, and then read it in in pieces of length 6
with_conn(file("test.txt", "w+"), {
writeLines("This is a test", .it)
x <- readChar(.it, 6)
while(length(x) > 0) {
print(x)
x <- readChar(.it, 6)
}
})
file.remove("test.txt")
```
The construction of this function also has the appealing side-effect that the scope of the connection is clearly shown with indenting.
## Special environments for run-time checking
We can take the idea of special evaluation contexts and use the idea to implement run-time checks, by executing code in a special environment that warns or errors when the user does something that we think is a bad idea.
* Checking for logical abbreviations
### Logical abbreviations
One of the checks that `R CMD check` runs ensures that you don't use the logical abbreviations `T` and `F`. It's a useful check, but a bit frustrating to use because it terminates on the first `F` or `T` it finds, so you need to run it many times, but `R CMD check` is slow. Instead, we can take the idea that underlies the check and run it ourselves, which makes it faster to iterate.
The basic idea is to use an active binding so that whenever code tries to access `F` or `T` it throws an error. This is a very similar technique to anaphoric functions:
```{r, error = TRUE}
check_logical_abbr <- function(code, env = parent.frame()) {
new_env <- new.env(parent = env)
delayedAssign("T", stop("Use TRUE not T"), assign.env = new_env)
delayedAssign("F", stop("Use FALSE not F"), assign.env = new_env)
eval(substitute(code), new_env)
}
check_logical_abbr(c(FALSE, T, FALSE))
```
Note that functions look in the environment in which they were defined so to test large bodies of code, you'll need to run `check_logical_abbr()` as far as out as possible:
```{r, error = TRUE}
f <- function(x) {
mean(x, na.rm = T)
}
check_logical_abbr(f(1:10))
check_logical_abbr({
f <- function(x) {
mean(x, na.rm = T)
}
f(1:10)
})
```
This will be typically easiest to do as a wrapper around the code you use to load your code into R:
```{r, eval = FALSE}
check_logical_abbr(source("my-file.r"))
check_logical_abbr(load_all())
```
### Test that
Manage global state accessible from functions. Use special environment.
Create with function that uses `on.exit()` to push and pop from stack.
```{r}
testthat_env <- new.env()
set_reporter <<- function(value) {
old <- testthat_env$reporter
testthat_env$reporter <- value
old
}
get_reporter <<- function() {
testthat_env$reporter
}
```
```{r}
with_reporter <- function(reporter, code) {
reporter <- find_reporter(reporter)
old <- set_reporter(reporter)
on.exit(set_reporter(old))
reporter$start_reporter()
res <- force(code)
reporter$end_reporter()
res
}
```