-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathblog-post-workflow.Rmd
516 lines (343 loc) · 23.2 KB
/
blog-post-workflow.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
---
title: "Blog post workflow"
description: |
How to create or review a PMA Data Hub blog post
author:
- name: Matt Gunther
date: 02-02-2021
output:
distill::distill_article:
self_contained: false
toc: true
toc_depth: 3
toc_float: true
---
# All users: first time setup
## R, RStudio, and required packages
To get a copy of R, visit the [Comprehensive R Archive Network](https://cloud.r-project.org/) (CRAN) and choose the right download link for your operating system.
The PMA Data Hub is organized as an [RStudio Project](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects), so you'll also need to use [RStudio](https://rstudio.com/products/rstudio/download/) (not base R).
*Note: a copy of RStudio running R version 4.0.2 (or higher) lives on the MPC gp1 server* [here](https://rstudio.pop.umn.edu/). Members of the MPC GitHub organization can access an article specifically about using RStudio on gp1 (e.g. how to build a package library) [here](https://pages.github.umn.edu/mpc/ipumsPMA/articles/rstudio_gp1.html).
When you've got RStudio set up, install these packages:
```{r, eval = F, echo = T}
install.packages("rmarkdown")
install.packages("knitr")
install.packages("distill")
install.packages("usethis")
```
**Troubleshooting note:** the package `rmarkdown` comes along with RStudio, but you may receive an older version than we need to build the site. So, when you try `install.packages("rmarkdown")`, you may get a message asking to restart R (avoiding a conflict with the prior version). *We've found that it's best to reply 'No' to the restart prompt, wait for rmarkdown to install, and then re-launch RStudio.* If you do choose to restart, you may experience a recursive loop of restart prompts!
## Initialize your UMN GitHub account
Contributors to the PMA Data Hub will work on an internal copy of the public site - it's visible only to certain people affiliated with the MPC. A smaller team of Data Hub "admin" (currently Matt & postdocs) will take care of migrating content from the private site to our public site: we're always here to help with formatting, editing, and version control!
**So what is UMN GitHub?** GitHub, itself, is a company that hosts projects on proprietary server: when you make a repository "public", anyone in the world can visit your project on a GitHub server. GitHub also makes its underlying software available to institutions that want to provide a similar service *restricted to institutional members*. In practice, UMN operates its own GitHub server where organizations like the MPC can host projects that are more private in scope.
UMN GitHub is an instance of [Enterprise GitHub](https://docs.github.com/en/github/getting-started-with-github/githubs-products#github-enterprise), whereas the public version of our blog lives in a space that folks sometimes call "public GitHub". It's common for people to have one account for "public GitHub" and one account for their job associated with an "enterprise GitHub". **To initialize an account for UMN GitHub, visit [github.umn.edu](https://github.umn.edu) and log in with your University Internet ID and password.**
## Using UMN GitHub from RStudio
First things first: you must install install [Git](https://git-scm.com/) on your computer if it isn't there already. Mac OS comes with git installed,^[You can check its location by running "which git" in Terminal, and "git --version" to check the installed version. If git is somehow not installed, use the "Install git using Homebrew" instructions [here](https://jennybc.github.io/2014-05-12-ubc/ubc-r/session03_git.html)] while other users should [download](http://git-scm.com/downloads) the right Git for their operation system. *If you're using RStudio on the gp1 server, Git is already installed.*
Next, open the `Global Options` menu in RStudio and locate the Git/SVN tab. Ensure that the box shown below is checked, and then enter the location of the executable file^[Mac users: type "which git" in terminal and enter the result; Windows users: look for git.exe (most likely in Program Files)] for your Git installation:
```{r}
knitr::include_graphics("images/bpw/git-menu.png")
```
Lastly, you should provide Git with a username, email, and Personal Access Token (PAT) for your UMN GitHub account. If you've installed [usethis](https://usethis.r-lib.org/articles/articles/git-credentials.html) as shown [above](#r-rstudio-and-required-packages), you'll be able to set these up with R commands (changes will be applied to globally wherever you use Git on your operating system). First, set the username and email address for your UMN GitHub account. For example, mine are:
<aside>
<b>Why a PAT?</b> GitHub plans to deprecate password authentication in the near future. You could use one for now (like the example below), but you'll need a PAT soon!
</aside>
```{r, eval=F, echo=T}
gert::git_config_global_set("user.name", "Matt Gunther")
gert::git_config_global_set("user.email", "[email protected]")
```
Then, create a PAT for your account with:
```{r, eval = F, echo = T}
usethis::create_github_token(host = "https://github.umn.edu")
```
Your browser will open to a webpage. **Check all the boxes you see**, then click the green `Generate Token` button. On the next page, notice the very long string shown in the green box: this is your PAT. *Don't close this page yet!*. Return to R and call:
```{r, eval=F, echo=T}
gitcreds::gitcreds_set("https://github.umn.edu")
```
You'll be asked to enter a new password or token: copy and paste your PAT from your browser and press Enter. From now on, RStudio and Git will be able to access your UMN GitHub account automatically! (If you have a personal GitHub account at github.com, you could repeat this process substituting `https://github.com` for `https://github.umn.edu`, and Git will automatically choose the right credentials based on the repository associated with your project).
## The PMA Data Hub Repository
Open RStudio and navigate to `File > New Project`, then select `Version Control`:
```{r}
knitr::include_graphics("images/bpw/version_control.png")
```
Choose `Git` to clone our project from a GitHub repository:
```{r}
knitr::include_graphics("images/bpw/use_git.png")
```
On the next menu page, enter the address for the enterprise repository exactly as shown (do not clone the public repository):
```
https://github.umn.edu/mpc/pma-data-hub/
```
Also enter the project directory name "pma-data-hub" as shown:
```
pma-data-hub
```
In the third field, choose a location where you would like to save this file on your computer (mine was "~/R" - **insert your own path, instead**). Finally, click `Create Project`.
<aside>
When choosing a place to save this project, <b>do not save to a network drive</b>. This seems to cause RStudio to crash!
</aside>
```{r}
knitr::include_graphics("images/bpw/clone_info.png")
```
If you have not configured Git to automatically use your UMN GitHub credentials with the [steps shown above](#using-umn-github-from-rstudio), you may be prompted to provide them in a pop-up window:
```{r}
knitr::include_graphics("images/bpw/username.png")
knitr::include_graphics("images/bpw/password.png")
```
<aside>
Until you configure Git with these credentials, you'll have to do this every time you interact with GitHub. Additionally, <b>password authentication for GitHub will be deprecated in the near future</b> so you'll need to do it soon!
</aside>
After a short bit, RStudio will relaunch and open the new project. If you adjust the windows to show the tabs `Git` (left) and `Files` (right), you should see something like this:
```{r}
knitr::include_graphics("images/bpw/clone_success.png")
```
You have now downloaded a copy of the Enterprise repository to your computer!
Moreover, because you've connected these files to a GitHub repository, the RStudio Project will now keep track of changes you make to the files in this folder, and it will prompt you to upload your changes back to GitHub: as you add, edit, or delete files, a list of changes will appear in the `Git` tab.
<aside>
To open an RStudio project, click on the file <b>pma-data-hub.Rproj</b>. If you ever forget, RStudio won't know to look for a Git history associated with all of the underlying files.
</aside>
Notice the word `master` shown in the `Git` tab - this shows that any changes we make to files will be recorded in our local copy of the "master" version of the repository. If we made changes here and then *pushed* them to GitHub, they would be reflected on the "master" version we've saved there, too.
In general, Matt and postdocs will be responsible for merging finished blog posts to the master branch and deploying its contents to the "live" blog that's seen by users (for details, see [site admin](#site-admin) instructions below). **All other contributors should create their own branch when writing a new blog post**; Matt or postdocs will merge them to "master" after they've been reviewed and approved by an editor. Read on!
# Authors: Creating a new post
## Create a new branch
Notice that the `Git` tab in RStudio has a purple icon:
```{r}
knitr::include_graphics("images/bpw/new_branch.png")
```
Click this icon to create a new branch. You can name it anything you like, but we recommend using your URL slug if possible (e.g. "blog-post-workflow" is the end of the URL for this webpage). Leave the box next to "Sync branch with remote" **checked**, as this will create your branch both locally and on our GitHub page:
```{r}
knitr::include_graphics("images/bpw/branch_name.png")
```
RStudio now displays the new branch in place of "master" to show that we're working on the new branch, instead!
## Create a new folder in "_posts"
Now that you've created a new branch in the `Git` window, take a look at the `File` window.
```{r}
knitr::include_graphics("images/bpw/file_window1.png")
```
The program we use to build the blog is called [Distill](https://rstudio.github.io/distill/), and it takes care of all the back-end work as long as we put every new blog post inside of a unique folder within the "_posts" directory. Opening "_posts", you can see that every post is contained within a time-stamped sub-folder:
```{r}
knitr::include_graphics("images/bpw/posts_list.png")
```
To create one of these folders for your new post, enter the following command into R:
```{r, eval=F, echo=T}
distill::create_post("Blog post workflow")
```
This does two things: it creates the folder automatically (circled in red), and it opens a new RMarkdown file where you can begin writing your post (circled in green).
```{r}
knitr::include_graphics("images/bpw/post_added.png")
```
<aside>
In red: notice the folder appears in both your <b>File</b> tab and your <b>Git</b> tab. (Don't worry about the date on this folder - it's for internal use and does not need to match the publication date.)
In green: this is the RMarkdown file where you'll write your post.
</aside>
Check out our [Quick-start Guide for Blogging with RMarkdown](../2021-02-02-blogging-with-rmarkdown/index.html)!
As you're writing your post, you can preview it as a fully formatted webpage by hitting the `Knit` button at the top of your RMarkdown file:
```{r}
knitr::include_graphics("images/bpw/knit.png")
```
<aside>
You can switch between previewing the page in the RStudio Viewer tab, or in your computer's default web browser. Click the settings icon next to "Knit" for preview options.
</aside>
## Put your data in a "data" folder!
If you're working with a PMA data extract in your post **put it in a sub-folder called** `data`. Your editors will need a copy of your extract in order to render your post! For example, I would store the data and the ddi for this post like this:
```
./_posts/2021-02-02-blog-post-workflow/data/pma_00008.dat.gz
./_posts/2021-02-02-blog-post-workflow/data/pma_00008.xml
```
It's OK to push your data to UMN Github! As long as you put it in a folder called `data`, admin will automatically remove it before posting the public version of the blog.
## Push your post to GitHub
When you're finished writing, follow these steps to share your post with the team on our Enterprise GitHub page (reminder: it won't go "live" until Matt or postdocs merge your post to `master` and publishes it to the public GitHub page).
**Press the "Knit" button one more time to render a final HTML version of the page.** (At this point, you may see a number of automatically created files related to your RMarkdown file in the `Git` tab.)
Now, enter the following commands directly into the R console, *but please adjust the brief commit "message" as necessary to describe your change!*
```{r, echo=T, eval=F}
gert::git_add(".")
gert::git_commit("Draft published: blog-post-workflow")
gert::git_push()
```
Now, if you visit our Enterprise GitHub page, your post will appear in a new branch! (It will not yet appear on the copy of the blog we have posted there, which is rendered only from `master`.)
```{r}
knitr::include_graphics("images/bpw/see_branches.png")
```
# Editors
## Pull the author's branch to your computer
Any time that you want to review an author's post, you'll always need to get the latest copy of their branch from our Enterprise GitHub page. If this is the first time you've read a draft for the post, the author's branch won't yet be listed in RStudio.
```{r}
knitr::include_graphics("images/bpw/master_only.png")
```
<aside>
This editor's RStudio only knows about the "master" branch so far.
</aside>
The `Pull` button in RStudio's `Git` tab will gather information about all of the new branches on GitHub, and it will download a copy of each one onto your computer:
```{r}
knitr::include_graphics("images/bpw/pull_button.png")
```
Clicking it will bring up a dialogue screen. RStudio reports that it discovered a new branch `blog-post-workflow` living at the remote repository `origin`.
```{r}
knitr::include_graphics("images/bpw/pull_new_branch.png")
```
Returning to RStudio's main window, notice that you can now toggle between the working on the remote `master` branch, or the new remote branch called `blog-post-workflow`. When you're ready to edit the author's post, use this menu to select their branch.
```{r}
knitr::include_graphics("images/bpw/toggle_branches.png")
```
RStudio automatically creates a local version of this branch on your computer, and it reports that your changes will be tracked and pushed to the remote branch of the same name.
```{r}
knitr::include_graphics("images/bpw/new_local_branch.png")
```
## Locate and edit the new post
Now, RStudio shows that you're working on the author's branch in the `Git` tab, and you'll see their post listed in the `_posts` folder on the `Files` tab. Navigate to the `.Rmd` file for their post, then click it to begin making edits.
```{r}
knitr::include_graphics("images/bpw/author_post_appears.png")
```
Check out our [Quick-start Guide for Blogging with RMarkdown](../2021-02-02-blogging-with-rmarkdown/index.html)!
As you're writing your post, you can preview it as a fully formatted webpage by hitting the `Knit` button at the top of your RMarkdown file:
```{r}
knitr::include_graphics("images/bpw/edit_knit.png")
```
<aside>
You can switch between previewing the page in the RStudio Viewer tab, or in your computer's default web browser. Click the settings icon next to "Knit" for preview options.
</aside>
## Push the edited post back to GitHub
When you're finished editing, follow these steps to send the revised file back to GitHub. (Reminder: it won't appear on the website until Matt or postdocs merge the post to `master` and publishes it to the public GitHub page).
**Press the "Knit" button one more time to render a final HTML version of the page.** (At this point, you may see a number of automatically created files related to your RMarkdown file in the `Git` tab.)
Now, enter the following commands directly into the R console, *but please adjust the brief commit "message" as necessary to describe your change!*
```{r, echo=T, eval=F}
gert::git_add(".")
gert::git_commit("Draft edited: blog-post-workflow")
gert::git_push()
```
Now, if you visit our Enterprise GitHub page, you'll see that your edited files appear on the author's branch. (They will not yet appear on the copy of the blog we have posted there, which is rendered only from `master`.)
```{r}
knitr::include_graphics("images/bpw/draft_accepted.png")
```
# All users: making revisions
When you've finished pushing something to GitHub, **please email your collaborators to let them know about next steps!**
To get your collaborator's latest updates from GitHub, **you should launch the pma-data-hub RStudio project file** called `pma-data-hub.Rproj`. If you forget to open the project file, RStudio will not be able to access the contents of your Git folder (it won't know that there's a GitHub repository for the project at all).
When you open the project file, RStudio will again display a `Git` tab. Click on the `Pull` button to get your collaborator's latest changes.
```{r}
knitr::include_graphics("images/bpw/pull_button.png")
```
Switch from the `master` branch to the branch associated with your post:
```{r}
knitr::include_graphics("images/bpw/toggle_branches.png")
```
After you're finished incorporating their feedback into the RMarkdown file (.Rmd), **click Knit** and then run these commands in Terminal again to send your work back to GitHub:
```{r, echo=T, eval=F}
gert::git_add(".")
gert::git_commit("Draft complete: blog-post-workflow")
gert::git_push()
```
**Please let Matt & postdocs know when your revisions are complete and ready to appear on the live blog!**
# Site Admin
*These instructions will introduce an additional remote to your local repository. Only use them if you'll be involved with managing updates to the live blog (e.g. Matt & postdocs).* All of these functions can be used the in RStudio Terminal.
## Setup
Before adding a second remote, it's best to rename the UMN GitHub remote something like `private`. You can check the current name with:
```
git remote
```
If the UMN GitHub is still called `origin` (by default), rename it with:
```
git remote rename origin private
```
Likewise, you should give the `private` branch `master` a different name, as the second remote will have a `master` branch, too. To see all of the remote branches currently in use:
```
git remote show private
```
Checkout `master` and change its name to something like `private-master`:
```
git checkout master
git branch -m "private-master"
```
Now, add the **public remote** and fetch its branches (there should only be one, called `master`):
```
git remote add public https://github.com/ipums/pma-data-hub
git fetch public
```
Create a local branch called `public-master` corresponding with `master` on the `public` remote:
```
git branch public-master public/master
```
At this point, you'll notice that RStudio shows two remotes (with the branches you've fetched) and all of the local branches you've created so far.
```{r}
knitr::include_graphics("images/bpw/dual-remotes.png")
```
<aside>
In this example, we set the local "private-master" branch to track "master" on the private remote. The "public-master" branch tracks "master" on the public remote.
</aside>
## Create a Git Hook to remove data from the public repo
**We should never push PMA data extracts to the public repository.** Yet, we need to have these data available on the private repository in order to build each others' posts when we want to review or merge new content.
You might imagine that the only way to do this would be to manually delete the `data` folder for each post just before we push something from our local version of `public-master` to the remote `public/master`. Luckily, [git hooks](https://git-scm.com/docs/githooks) offers a way to automatically delete any incoming data folder headed to a merge with `public-master`.
Simply create a file at `.git/hooks/post-merge` containing the following code:
```
#! /bin/sh
green='\033[0;32m'
nc='\033[0m'
# Start from the repository root.
cd ./$(git rev-parse --show-cdup)
# Delete data files and empty directories.
if [ `git rev-parse --abbrev-ref HEAD` == "public-master" ]; then
echo "${green}Deleting data files...${nc}"
find . -path '*/data/*' -delete
find . -type d -empty -delete
fi
```
<aside>
This will only delete `data` folders if a branch called `public-master` is currently checked out when we run `git merge` (see below). **If you gave a different name to your local public branch, you'll have to edit this code!**
</aside>
You also must make this file executable. For Mac OS users, you can do this by entering the following command into Terminal (assuming you're already in the pma data hub directory):
```
chmod +x .git/hooks/post-merge
```
Now, when you follow the merging steps below, any file saved in a folder called `data` (including all sub-folders) will be deleted automatically.
## Merging
**Our main goal is to avoid creating divergent commit histories between the internal repository and the public repository.** In practice, that means a typical workflow will involve these steps:
1. Squash and Merge the author's branch to private/master
2. Build, Commit, and Push to private/master
3. Merge private/master to public/master
4. Push to public/master
Our authoring / editing workflow generates a commit each time someone adds a change to the branch for a new post (you can run `git log` on any branch to see its full commit history). We will [squash and merge](https://docs.gitlab.com/ee/user/project/merge_requests/squash_and_merge.html) this commit history into a single commit on `private-master`.
For example, with a new post on the branch `blog-post-workflow`:
```
git checkout private-master
git merge --squash blog-post-workflow
```
Now, in RStudio, hit the `Build Website` button. Look over the site to make sure everything looks good. (We won't build on `public-master` to avoid merge conflicts, so get those edits in now.)
```{r}
knitr::include_graphics("images/bpw/build.png")
```
When you're ready, add all files and commit your changes with a message like:
```
git add .
git commit -m "new post: blog-post-workflow"
```
Although your local branch has a different name, you can push your commit to `master` on the `private` remote with this command:
```
git push private HEAD:master
```
Wait a few minutes, and you'll see that the GitHub Pages site hosted at UMN GitHub should update to reflect your changes.
To merge your `private-master` to `public-master`:
```
git checkout public-master
git merge --squash private-master -X theirs
git add .
git commit -m "new post: blog-post-workflow"
```
(It's good practice to squash commits to private-master, too, since we sometimes make quick-fixes there. The `-X theirs` part tells Git to defer to `private-master` over `public-master` in case there are any conflicts.)
If you've created a Git Hook to automatically delete any `data` folders (see above), you're all set! **If not, manually delete them now.** Finally, push your changes to the live site:
```
git push public HEAD:master
```
## Other git tips
Need to rollback to a previous commit? Look for it in the `git log`, and do a hard reset:
```
git log
git reset --hard 58ba4f0396b985fb5ab82c88f7bbc5c9cc619e71
```
For a checklist of updated files and their commit status:
```
git status
```
To see a list of files with unresolved conflicts
```
git diff --name-only --diff-filter=U
```
If you ever need to force a push to GitHub (e.g. if you REALLY must overwrite a commit), you can do that with the force option:
```
git push -f
```