forked from STAT545-UBC/STAT545-UBC-original-website
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathautomation04_make-activity.html
451 lines (397 loc) · 26.6 KB
/
automation04_make-activity.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />
<meta name="author" content="Shaun Jackman and Jenny Bryan" />
<meta name="date" content="2014-11-03" />
<title>Automating Data-analysis Pipelines</title>
<script src="libs/jquery-1.11.3/jquery.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="libs/bootstrap-3.3.5/css/bootstrap.min.css" rel="stylesheet" />
<script src="libs/bootstrap-3.3.5/js/bootstrap.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/respond.min.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-68219208-1', 'auto');
ga('send', 'pageview');
</script>
<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet"
href="libs/highlight/default.css"
type="text/css" />
<script src="libs/highlight/highlight.js"></script>
<style type="text/css">
pre:not([class]) {
background-color: white;
}
</style>
<script type="text/javascript">
if (window.hljs && document.readyState && document.readyState === "complete") {
window.setTimeout(function() {
hljs.initHighlighting();
}, 0);
}
</script>
<style type="text/css">
h1 {
font-size: 34px;
}
h1.title {
font-size: 38px;
}
h2 {
font-size: 30px;
}
h3 {
font-size: 24px;
}
h4 {
font-size: 18px;
}
h5 {
font-size: 16px;
}
h6 {
font-size: 12px;
}
.table th:not([align]) {
text-align: left;
}
</style>
<link rel="stylesheet" href="libs/local/main.css" type="text/css" />
<link rel="stylesheet" href="libs/local/nav.css" type="text/css" />
<link rel="stylesheet" href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.css" type="text/css" />
</head>
<body>
<style type = "text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
code {
color: inherit;
background-color: rgba(0, 0, 0, 0.04);
}
img {
max-width:100%;
height: auto;
}
.tabbed-pane {
padding-top: 12px;
}
button.code-folding-btn:focus {
outline: none;
}
</style>
<div class="container-fluid main-container">
<!-- tabsets -->
<script src="libs/navigation-1.1/tabsets.js"></script>
<script>
$(document).ready(function () {
window.buildTabsets("TOC");
});
</script>
<!-- code folding -->
<header>
<div class="nav">
<a class="nav-logo" href="index.html">
<img src="static/img/stat545-logo-s.png" width="70px" height="70px"/>
</a>
<ul>
<li class="home"><a href="index.html">Home</a></li>
<li class="faq"><a href="faq.html">FAQ</a></li>
<li class="syllabus"><a href="syllabus.html">Syllabus</a></li>
<li class="topics"><a href="topics.html">Topics</a></li>
<li class="people"><a href="people.html">People</a></li>
</ul>
</div>
</header>
<div class="fluid-row" id="header">
<h1 class="title toc-ignore">Automating Data-analysis Pipelines</h1>
<h4 class="author"><em>Shaun Jackman and Jenny Bryan</em></h4>
<h4 class="date"><em>2014-11-03</em></h4>
</div>
<div id="TOC">
<ul>
<li><a href="#dependency-graph-of-the-pipeline">Dependency graph of the pipeline</a></li>
<li><a href="#set-up-a-new-rstudio-project-and-git-repo">Set up a new RStudio Project (and Git repo)</a></li>
<li><a href="#sample-project-and-git-repository">Sample Project and Git repository</a></li>
<li><a href="#create-the-makefile">Create the Makefile</a></li>
<li><a href="#get-the-dictionary-of-words">Get the dictionary of words</a></li>
<li><a href="#create-rules-for-all-and-clean">Create rules for all and clean</a></li>
<li><a href="#create-a-table-of-word-lengths">Create a table of word lengths</a></li>
<li><a href="#update-rules-for-all-and-clean">Update rules for all and clean</a></li>
<li><a href="#plot-a-histogram-of-word-lengths-update-all-and-clean">Plot a histogram of word lengths, update all and clean</a></li>
<li><a href="#use-make-to-deal-with-an-annoyance">Use <code>make</code> to deal with an annoyance</a></li>
<li><a href="#render-an-html-report">Render an HTML report</a></li>
<li><a href="#the-final-makefile">The Final Makefile</a></li>
<li><a href="#appendix">Appendix</a></li>
</ul>
</div>
<p>The goal of this activity is to create a pipeline that will</p>
<ul>
<li>obtain a large file of English words</li>
<li>calculate a histogram of word lengths</li>
<li>determine the most common word length</li>
<li>generate a figure of this histogram</li>
<li>render a R Markdown report in HTML and PDF</li>
</ul>
<p>You will automate this pipeline using <code>make</code>!</p>
<p>back to <a href="automation00_index.html">All the automation things</a></p>
<div id="dependency-graph-of-the-pipeline" class="section level3">
<h3>Dependency graph of the pipeline</h3>
<!-- TO DO: remake the figure to say words.txt not words.tsv and the new words downloading strategy -->
<p><a href="automation01_slides/images/activity.gv"><img src="automation01_slides/images/activity.png" alt="automation01_slides/images/activity.png" /></a></p>
</div>
<div id="set-up-a-new-rstudio-project-and-git-repo" class="section level3">
<h3>Set up a new RStudio Project (and Git repo)</h3>
<p>In RStudio: <em>File > New Project > New Directory > Empty Project.</em> If you’re a Git user, we strongly encourage you to click on “Create a git repository.”</p>
<p>This project will be useful as a reference in the future, so give it an informative name and location. If you’re a GitHub user, you may want to push it there as well.</p>
<p>Git(Hub) users: from here on out, we assume you will be committing at regular intervals. At key points, we explicitly prompt you to commit.</p>
<p>Git folks: commit now.</p>
</div>
<div id="sample-project-and-git-repository" class="section level3">
<h3>Sample Project and Git repository</h3>
<p>We walked through this activity ourselves and <a href="https://github.com/STAT545-UBC/make-activity">this Git repo</a> reflects how our Project evolved.</p>
<p>The Project is set up for use with <code>make</code> at <a href="https://github.com/STAT545-UBC/make-activity/tree/5d282f87ec3fd46d13b500be51a74c9df146d283">this commit</a>.</p>
</div>
<div id="create-the-makefile" class="section level3">
<h3>Create the Makefile</h3>
<p>In RStudio: <em>File > New File > Text File.</em> Save it with the name <code>Makefile</code>. Keep adding the rules we write below to this file, saving regularly.</p>
<p>Once you’ve saved the file with the name <code>Makefile</code>, RStudio should indent with tabs instead of spaces. I recommend you display whitespace in order to visually confirm this: <em>RStudio > Preferences > Code > Display > Display whitespace characters</em>. A more extreme measure is to set Project or Global preferences to NOT replace tabs with spaces, but this will wreak havoc elsewhere.</p>
<p>You also want RStudio to recognize the presence of the <code>Makefile</code>. Pick one:</p>
<ul>
<li>set Project Build Tools to <code>Makefile</code></li>
<li>quit and relaunch</li>
</ul>
<p>You should see a “Build” tab now in the same pane as “Environment”, “History”, and, if applicable, “Git”.</p>
<p>Git folks: commit now.</p>
</div>
<div id="get-the-dictionary-of-words" class="section level3">
<h3>Get the dictionary of words</h3>
<p>Depending on your OS and mood, you can get the file of English words by copying a local file or downloading from the internet.</p>
<div id="download-the-dictionary" class="section level4">
<h4>Download the dictionary</h4>
<p>Our first <code>Makefile</code> rule will download the dictionary <code>words.txt</code>. The command of this rule is a one-line R script, so instead of putting the R script in a separate file, we’ll include the command directly in the <code>Makefile</code>, since it’s so short. <em>Sure, we could download a file without using R at all but humor us: this is a tutorial about <code>make</code> and R!</em></p>
<pre class="makefile"><code>words.txt:
Rscript -e 'download.file("https://svnweb.freebsd.org/base/head/share/dict/web2?view=co", destfile = "words.txt", quiet = TRUE)'</code></pre>
<p>Suggested workflow:</p>
<ul>
<li>Git folks: commit anything new/modified. Start with a clean working tree.</li>
<li>Submit the above <code>download.file()</code> command in the R Console to make sure it works.</li>
<li>Inspect the downloaded words file any way you know how; make sure it’s not garbage. Size should be about 2.4MB.</li>
<li>Delete <code>words.txt</code>.</li>
<li>Put the above rule into your <code>Makefile</code>. From the <a href="git09_shell.html">shell</a>, enter <code>make words.txt</code> to verify rule works. Reinspect the words file.</li>
<li>Git folks: commit <code>Makefile</code> and <code>words.txt</code>.</li>
</ul>
<p>See the sample Project at this point in <a href="https://github.com/STAT545-UBC/make-activity/tree/c30ecc9c890a2f2261eb94118997f0774012eeb8">this commit</a>.</p>
</div>
<div id="copy-the-dictionary" class="section level4">
<h4>Copy the dictionary</h4>
<p>On Mac or Linux systems, rather than download the dictionary, we can simply copy the file <code>/usr/share/dict/words</code> that comes with the operating system. In this alternative rule, we use the <a href="git09_shell.html">shell</a> command <code>cp</code> to copy the file.</p>
<pre class="makefile"><code>words.txt: /usr/share/dict/words
cp /usr/share/dict/words words.txt</code></pre>
<p>This rule copies the input file <code>/usr/share/dict/words</code> to create the output file <code>words.txt</code>. We then repeat these file names in the command rule, which is redundant and leaves us vulnerable to typos. <code>make</code> offers many automatic variables, so the revised rule below uses <code>$<</code> and <code>$@</code> to represent the input file and output file, respectively.</p>
<pre class="makefile"><code>words.txt: /usr/share/dict/words
cp $< $@</code></pre>
<p>Suggested workflow:</p>
<ul>
<li>Git folks: commit anything new/modified. Start with a clean working tree.</li>
<li>Remove <code>words.txt</code> if you succeeded with the download approach.</li>
<li>Submit the above <code>cp</code> command in the <a href="git09_shell.html">shell</a> to make sure it works.</li>
<li>Inspect the copied words file any way you know how; make sure it’s not garbage. Size should be about 2.4MB.</li>
<li>Delete <code>words.txt</code>.</li>
<li>Put the above rule into your <code>Makefile</code>. From the <a href="git09_shell.html">shell</a>, enter <code>make words.txt</code> to verify rule works. Reinspect the words file.</li>
<li>Git folks: look at the diff. You should see how your <code>words.txt</code> rule has changed and you might also see some differences between the local and remote words files. Interesting! Commit <code>Makefile</code> and <code>words.txt</code>.</li>
</ul>
<p>See the sample Project at this point in <a href="https://github.com/STAT545-UBC/make-activity/tree/1131791548e0c5bbc5104eebb19710ed435146e3">this commit</a>.</p>
</div>
</div>
<div id="create-rules-for-all-and-clean" class="section level3">
<h3>Create rules for all and clean</h3>
<p>It would be nice to execute our <code>make</code> rules from RStudio. So it’s urgent that we create phony targets <code>all</code> and <code>clean</code>, which are the only targets accessible from RStudio. These targets are phony in the sense that they do not specify an actual file to be made, rather they just make it easy to trigger a certain action. <code>all</code> and <code>clean</code> are phony targets that appear in most <code>Makefiles</code>, which is why RStudio makes it easy to access them from the IDE.</p>
<p>Edit your <code>Makefile</code> to look like this (where your <code>words.txt</code> rule can be the copy or download version):</p>
<pre class="makefile"><code>all: words.txt
clean:
rm -f words.txt
words.txt: /usr/share/dict/words
cp /usr/share/dict/words words.txt</code></pre>
<p>Since our only output so far is <code>words.txt</code>, that’s what we associate with the <code>all</code> target. Likewise, the only product we can re-make so far is <code>words.txt</code>, so it’s the only thing we delete via <code>clean</code>.</p>
<p>Suggested workflow:</p>
<ul>
<li>Use <code>make clean</code> from the shell and/or <em>RStudio > Build > More > Clean All</em> to delete <code>words.txt</code>.
<ul>
<li>Does it go away?</li>
<li>Git folks: does the deletion of this file show up in your Git tab?</li>
</ul></li>
<li>Use <code>make all</code> from the shell and/or <em>RStudio > Build > Build All</em> to get <code>words.txt</code> back.
<ul>
<li>Does it come back?</li>
<li>Git folks: does the restoration of <code>words.txt</code> cause it to drop off your radar as a changed/deleted file? See how this stuff all works together?</li>
</ul></li>
<li>Git folks: Commit.</li>
</ul>
<p>See the sample Project at this point in <a href="https://github.com/STAT545-UBC/make-activity/tree/9e1a556adc602ffce91b5c8edccd223237080c54">this commit</a>.</p>
</div>
<div id="create-a-table-of-word-lengths" class="section level3">
<h3>Create a table of word lengths</h3>
<p>This rule will read the list of words and generate a table of word length frequency, stored in a tab-separated-values (TSV) file. This R script is a little longer, so we’ll put it in its own file, named <code>histogram.r</code>. If either the script <code>histogram.r</code> or the data file <code>words.txt</code> were to change, we’d need to rerun this command to get up-to-date results, so both files are dependencies of this rule. The input-file variable <code>$<</code> refers to the <em>first</em> dependency, <code>histogram.r</code>.</p>
<pre class="makefile"><code>histogram.tsv: histogram.r words.txt
Rscript $<</code></pre>
<p>FYI: <code>Rscript</code> allows you to execute R scripts from the <a href="git09_shell.html">shell</a>. It is a more modern replacement for <code>R CMD BATCH</code> (don’t worry if you’ve never heard of that).</p>
<p>Create the R script <code>histogram.r</code> that reads the list of words from <code>words.txt</code> and writes the table of word length frequency to <code>histogram.tsv</code>. It should be a tab-delimited TSV file with a header and two columns, named <code>Length</code> and <code>Freq</code>. Hint: you can accomplish this task using four functions: <code>readLines</code>, <code>nchar</code>, <code>table</code> and <code>write.table</code>. Here’s <a href="https://raw.githubusercontent.com/STAT545-UBC/STAT545-UBC.github.io/master/automation10_holding-area/activity/histogram.r">one solution</a>, but try not to peek until you’ve attempted this task yourself.</p>
<p>Suggested workflow:</p>
<ul>
<li>Develop your <code>histogram.r</code> script interactively. Make sure it works when you step through it line-by-line. Debugging only gets harder once you’re running entire scripts at arm’s length via <code>make</code>!</li>
<li>Remove <code>histogram.tsv</code>. Clean out the workspace and restart R. Run <code>histogram.r</code> via <code>source()</code> or using RStudio’s Source button. Make sure it works!</li>
<li>Add the <code>histogram.tsv</code> rule to your <code>Makefile</code>.</li>
<li>Remove <code>histogram.tsv</code> and regenerate it via <code>make histogram.tsv</code> from the <a href="git09_shell.html">shell</a>.</li>
<li>Git folks: Commit.</li>
</ul>
<p>See the sample Project at this point in <a href="https://github.com/STAT545-UBC/make-activity/tree/889e01a3d610e900c7e58ebd32a0506c61543fd9">this commit</a>.</p>
</div>
<div id="update-rules-for-all-and-clean" class="section level3">
<h3>Update rules for all and clean</h3>
<p>The new output <code>histogram.tsv</code> can replace <code>words.txt</code> as our most definitive output. So it will go in the <code>all</code> rule. Likewise, we should add <code>histogram.tsv</code> to the <code>clean</code> rule. Edit your <code>all</code> and <code>clean</code> rules to look like this:</p>
<pre class="makefile"><code>all: histogram.tsv
clean:
rm -f words.txt histogram.tsv</code></pre>
<p>Suggested workflow:</p>
<ul>
<li>Use <code>make clean</code> from the shell and/or <em>RStudio > Build > More > Clean All</em>.
<ul>
<li>Do <code>words.txt</code> and <code>histogram.tsv</code> go away?</li>
<li>Git folks: does the deletion of these files show up in your Git tab?</li>
</ul></li>
<li>Use <code>make all</code> from the shell and/or <em>RStudio > Build > Build All</em> to get <code>words.txt</code> back.
<ul>
<li>Does it come back?</li>
<li>Git folks: does the restoration of the files cause them to drop off your radar as changed/deleted files?</li>
</ul></li>
<li>Git folks: Commit.</li>
</ul>
<p>See the sample Project at this point in <a href="https://github.com/STAT545-UBC/make-activity/tree/4f392d0e20bb7e4bfcdc00a812190e40e27ae3d4">this commit</a>.</p>
</div>
<div id="plot-a-histogram-of-word-lengths-update-all-and-clean" class="section level3">
<h3>Plot a histogram of word lengths, update all and clean</h3>
<p>This rule will read the table of word lengths and plot a histogram using <code>ggplot2::qplot()</code>. The R snippet is three lines long, but we’ll still include the script in the <code>Makefile</code> directly, and use semicolons <code>;</code> to separate the R commands. The variable <code>$@</code> refers to the output file, <code>histogram.png</code>.</p>
<pre class="makefile"><code>histogram.png: histogram.tsv
Rscript -e 'library(ggplot2); qplot(Length, Freq, data=read.delim("$<")); ggsave("$@")'</code></pre>
<p>Suggested workflow:</p>
<ul>
<li>Test the histogram-drawing code in the R Console to make sure it works.</li>
<li>Inspect the resulting PNG to make sure it’s good.</li>
<li>Clean up after yourself.</li>
<li>Add the above rule to your <code>Makefile</code>.</li>
<li>Test that new rule works.</li>
<li>If you get an unexpected empty plot <code>Rplots.pdf</code>, don’t worry about it yet.</li>
<li>Update the <code>all</code> and <code>clean</code> targets in light of this addition to the pipeline.</li>
<li>Test the new definitions of <code>all</code> and <code>clean</code>.</li>
<li>Git folks: commit.</li>
</ul>
<p><em>NOTE: Why are we writing this PNG to file when, by the end of the activity, we are writing an R Markdown report? We could include this figure-making code in an R chunk there. We’re doing it this way to demonstrate more about R and <code>make</code> workflows. Plus sometimes we do work this way in real life, if a figure has a life outside one specific R Markdown report.</em></p>
<p>See the sample Project at this point in <a href="https://github.com/STAT545-UBC/make-activity/tree/221c2d66fe9fd7a359835492db6557e258178780">this commit</a>.</p>
</div>
<div id="use-make-to-deal-with-an-annoyance" class="section level3">
<h3>Use <code>make</code> to deal with an annoyance</h3>
<p>The code used above to create <code>histogram.png</code> usually leaves an empty <code>Rplots.pdf</code> file behind. You can read <a href="http://stackoverflow.com/questions/17348359/how-to-stop-r-from-creating-empty-rplots-pdf-file-when-using-ggsave-and-rscript">this thread on stackoverflow</a> if you’d like to know more.</p>
<p>We’ll just use this as a teachable moment to demonstrate how handy an automated pipeline is for dealing with such annoyances and to show a multi-line <code>make</code> rule.</p>
<p>Update the <code>histogram.png</code> rule like so:</p>
<pre class="makefile"><code>histogram.png: histogram.tsv
Rscript -e 'library(ggplot2); qplot(Length, Freq, data=read.delim("$<")); ggsave("$@")'
rm Rplots.pdf</code></pre>
<p>Suggested workflow:</p>
<ul>
<li>Remove <code>Rplots.pdf</code> manually</li>
<li>Add the <code>rm Rplots.pdf</code> command to the <code>histogram.png</code> rule.</li>
<li>Test that new rule works.</li>
<li>Test that behavior of <code>all</code> and <code>clean</code> still good.</li>
<li>Git folks: commit.</li>
</ul>
<p>See the sample Project at this point in <a href="https://github.com/STAT545-UBC/make-activity/tree/3b75dac0d0cd8dd7e7cd3c2e66799a65d90b9fff">this commit</a>.</p>
</div>
<div id="render-an-html-report" class="section level3">
<h3>Render an HTML report</h3>
<p>Finally, we’ll use <code>rmarkdown::render()</code> to generate an HTML report. If we think narrowly, we might say that the HTML report depends only on its Markdown predecessor, which would lead to a rule like this:</p>
<pre class="makefile"><code>report.html: report.md
Rscript -e 'rmarkdown::render("$<")'</code></pre>
<p>But we really shouldn’t hard-wire statements about word length in Markdown; we should use inline R code to compute that from the word length table. Similarly, if the plotted histogram were to change, we’d need to remake the HTML report. Here is a better rule that captures all of these dependencies:</p>
<pre class="makefile"><code>report.html: report.rmd histogram.tsv histogram.png
Rscript -e 'rmarkdown::render("$<")'</code></pre>
<p>Create the R Markdown file <code>report.rmd</code> that reads the table of word lengths <code>histogram.tsv</code>, reports the most common word length, and displays the pre-made histogram <code>histogram.png</code>. Here’s <a href="https://raw.githubusercontent.com/STAT545-UBC/STAT545-UBC.github.io/master/automation10_holding-area/activity/report.rmd">one solution</a>, but try not to peek until you’ve attempted this task yourself.</p>
<p>Suggested workflow:</p>
<ul>
<li>Develop <code>report.rmd</code>, running the R chunks often, from a clean workspace and fresh R session. Debugging only gets harder once you’re rendering entire reports at arm’s length via <code>make</code>!</li>
<li>Render the report using <code>rmarkdown::render()</code> in the Console or RStudio’s Preview HTML button.</li>
<li>Clean up after yourself.</li>
<li>Add the above rule for <code>report.html</code> to your <code>Makefile</code>.</li>
<li>Test that new rule works.</li>
<li>Update the <code>all</code> and <code>clean</code> targets in light of this addition to the pipeline.</li>
<li>Test the new definitions of <code>all</code> and <code>clean</code>.</li>
<li>Git folks: commit.</li>
</ul>
<p>See the sample Project at this point in <a href="https://github.com/STAT545-UBC/make-activity/tree/91ebcfc7d25743ebd8d6c9684ed7923ad4758585">this commit</a>.</p>
<!-- TO DO: this PDF bit awaiting attention from Shaun -->
<!--
Render a PDF report
================================================================================
Can you modify the `rmarkdown::render` command to generate a PDF report instead of an HTML report? Hint: look at the optional second argument of `rmarkdown::render`. Alternatively, click the *Knit* dropdown box and select *Knit PDF*, and look at how RStudio modifies the header of your R Markdown script.
-->
</div>
<div id="the-final-makefile" class="section level3">
<h3>The Final Makefile</h3>
<p>At this point, your <code>Makefile</code> should look something like this:</p>
<pre class="makefile"><code>all: report.html
clean:
rm -f words.txt histogram.tsv histogram.png report.md report.html
words.txt: /usr/share/dict/words
cp /usr/share/dict/words words.txt
histogram.tsv: histogram.r words.txt
Rscript $<
histogram.png: histogram.tsv
Rscript -e 'library(ggplot2); qplot(Length, Freq, data=read.delim("$<")); ggsave("$@")'
rm Rplots.pdf
report.html: report.rmd histogram.tsv histogram.png
Rscript -e 'rmarkdown::render("$<")'</code></pre>
<p>Remember, you can review the entire activity via the commit history of the sample Project: <a href="https://github.com/STAT545-UBC/make-activity" class="uri">https://github.com/STAT545-UBC/make-activity</a>.</p>
<p>And that’s how a data analytical pipeline gets built using <code>make</code>, the shell, R, RStudio, and optionally Git.</p>
</div>
<div id="appendix" class="section level3">
<h3>Appendix</h3>
<p>Here are <a href="https://www.gnu.org/software/make/manual/html_node/Special-Targets.html">some additions</a> you might like to include in your <code>Makefile</code>:</p>
<pre class="sh"><code>.PHONY: all clean
.DELETE_ON_ERROR:
.SECONDARY:</code></pre>
<p>The <code>.PHONY</code> line is where you declare which targets are <em>phony</em>, i.e. are not actual files to be made in the literal sense. It’s a good idea to explicitly tell <code>make</code> which targets are phony, instead of letting it try to deduce this. <code>make</code> can get confused if you create a file that has the same name as a phony target. If for example you create a directory named <code>clean</code> to hold your clean data and run <code>make clean</code>, then <code>make</code> will report <code>'clean' is up to date</code>, because a directory with that name already exists.</p>
<p><code>.DELETE_ON_ERROR</code> causes <code>make</code> to “delete the target of a rule if it has changed and its recipe exits with a nonzero exit status”. In English, this means that – if a rule starts to run but then exits due to error – any outputs written in the course of that fiasco will be deleted. This can protect you from having half-baked, erroneous files lying around that will just confuse you later.</p>
<p><code>.SECONDARY</code> tells <code>make</code> not to delete intermediate files of a chain of pattern rules. Consider creating a <code>Makefile</code> with two pattern rules, <code>%.md: %.rmd</code> and <code>%.html: %.md</code>, and then running <code>make report.html</code>. After <code>make</code> has created <code>report.md</code> and <code>report.html</code>, it will delete the intermediate file <code>report.md</code>. Adding <code>.SECONDARY</code> to your <code>Makefile</code> prevents the intermediate file from being deleted.</p>
<p>back to <a href="automation00_index.html">All the automation things</a></p>
</div>
<div class="footer">
This work is licensed under the <a href="http://creativecommons.org/licenses/by-nc/3.0/">CC BY-NC 3.0 Creative Commons License</a>.
</div>
</div>
<script>
// add bootstrap table styles to pandoc tables
$(document).ready(function () {
$('tr.header').parent('thead').parent('table').addClass('table table-condensed');
});
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
script.src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>
</body>
</html>