This document walks through the steps to reproduce the analysis recorded in "demo.Rmd", as a demonstration of reproducibility using:
- docker: to fix the computing environment.
- rmarkdown: to format the code and text into a document to (re)produce the analysis.
- knitr: to weave (or knit) the rmarkdown document into a report for clients.
See TBD for more details.
We use the following core layout for this simple project, in which a custom library hosted on GitHub is used to produce a simple report.
The following are the sub-directories and their content in this layout scheme:
- docker: contains configuration for the docker environment, i.e. the docker file giving the specifications of the image. The container of this image will be used as the computing environment for this project.
- data: contains the data used in the project. Ideally this will be a link to the directory containing the original data on the user's computer.
- scripts: contains the rmarkdown document and other auxiliary scripts that may be necessary for reproducing the analysis.
- results: contains the report and/or results of the analysis intended to be reproduced.
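The layout above could be created from scratch with a few shell commands. This is a minimal sketch and not part of the repository: the function name make_layout is hypothetical, and linking the data directory assumes the original data lives elsewhere on the same machine and is given as an absolute path.

```shell
# Hypothetical helper: create the four sub-directories described above
# under a project root. If a data source directory is given, "data"
# becomes a symbolic link to it; otherwise an empty "data" is created.
make_layout() {
    local root="$1"          # project root, e.g. reproduce-analysis
    local data_src="${2:-}"  # optional: absolute path to the original data

    mkdir -p "$root/docker" "$root/scripts" "$root/results"

    # Leave an existing data directory or link untouched.
    if [ -e "$root/data" ]; then
        return 0
    fi

    if [ -n "$data_src" ] && [ -d "$data_src" ]; then
        # Link rather than copy, so the original data stays in one place.
        ln -s "$data_src" "$root/data"
    else
        mkdir -p "$root/data"
    fi
}
```

For example, make_layout my-project /path/to/original/data would create my-project with the four sub-directories, where the data path shown is only a placeholder.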
The top project directory reproduce-analysis contains two bash scripts:
- compile_rmd: This script allows the analyst (username rstudio) to compile an rmarkdown document into a report in scripting mode.
- run_docker: This script allows the analyst to run R in either a terminal interface or an RStudio session, and thus interactively reproduce the analysis recorded in "demo.Rmd" or any other rmarkdown document.
Here we assume that the underlying platform for docker is a desktop computer running a Unix-like operating system, such as macOS or a distribution of Linux.
Go to this webpage and click on the link to get docker for your operating system. The link takes you to a page that contains links to software download as well as operating system specific instructions for installation.
Similarly, go to this webpage for instructions on how to download and install git for your environment.
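Once both tools are installed, a quick check that they are reachable from the shell can save confusion later. The following is a small sketch; the function name check_tools is hypothetical and is not one of the scripts shipped with this project.

```shell
# Hypothetical helper: report any required command-line tools that are
# not on PATH. Returns 0 when all are found, 1 otherwise.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_tools docker git prints nothing and succeeds when both commands are found, and otherwise names each missing tool on standard error.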
For example, you can clone this repo using the command:
git clone git@github.com:ssinari/reproduce-analysis.git
or by downloading the ZIP file from here. Similarly you may clone other project repos.
cd reproduce-analysis
./compile_rmd -p $(pwd)
This will compile the analysis in rmarkdown to a new report named date_demo.pdf, where date is in the format YYYYMMDD. This report is identical to the original one given by "demo-original-output.pdf", except that the date in the header will be the date of compilation. Run ./compile_rmd -h for more information on how to use it to compile another Rmd located under this project into a report.
The report can also be generated more interactively using either terminal based R or an RStudio session. The steps below show you how to do this.
- To initiate the docker container with terminal interface to R, run the command:
run_docker -s "bash"
from inside the cloned directory. The command will compile the right docker environment and provide a bash terminal. To compile "demo.Rmd" and place the resulting report in the "results" folder underneath the project do:
R -e "rmarkdown::render('/home/rstudio/project/scripts/demo.Rmd' \
, output_file = 'YYYYMMDD_demo.pdf' \
, output_dir = '/home/rstudio/project/results')"
For more flexibility in using this command, type:
run_docker -h
You can exit this session as you would any terminal session, using the exit command.
- An RStudio interface is also available. Just invoke the following command in the terminal:
run_docker
then point your browser to the URL http://localhost:8787. Type the username rstudio and password 123456. This will land you in an RStudio session with the project directory visible under the Files menu. Navigate to project > scripts and click to open the document "demo.Rmd". Click knit to knit the document, then move the resulting report "demo.pdf" to the "results" folder and rename it to "date_demo.pdf". When finished, press CTRL-C in the terminal to terminate the docker container.
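In the terminal-based route above, the YYYYMMDD placeholder has to be filled in by hand. As a sketch, the render call could instead be assembled with the current date. The function name render_expr is hypothetical; the paths in the usage comment are the ones from the command above. The code only builds and prints the R expression, so it can be inspected before being passed to R -e inside the container.

```shell
# Hypothetical helper: print an rmarkdown::render() call whose output
# file is stamped with a date in YYYYMMDD form (today by default).
render_expr() {
    local rmd="$1"                        # path to the .Rmd file
    local out_dir="$2"                    # directory for the report
    local stamp="${3:-$(date +%Y%m%d)}"   # optional explicit date
    local base
    base=$(basename "$rmd" .Rmd)          # demo.Rmd -> demo

    printf "rmarkdown::render('%s', output_file = '%s_%s.pdf', output_dir = '%s')\n" \
        "$rmd" "$stamp" "$base" "$out_dir"
}

# Inside the container one could then run:
#   R -e "$(render_expr /home/rstudio/project/scripts/demo.Rmd /home/rstudio/project/results)"
```

Keeping the date as an optional third argument makes it possible to regenerate a report under an earlier date stamp when that is what is wanted.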
Name the resulting report in the format "Date_ReportName.pdf". Here Date is in the format YYYYMMDD and ReportName is any alphanumeric identifier. The advantages of naming reports this way are that they sort easily and can be quickly identified by the date they were generated.
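The naming rule can also be applied mechanically when filing a knitted report. The sketch below is one possible way to do it; the function name file_report is hypothetical, and it assumes the results directory already exists.

```shell
# Hypothetical helper: move a knitted report into a results directory
# under the name Date_ReportName.pdf, with Date as YYYYMMDD (today).
file_report() {
    local src="$1"       # e.g. scripts/demo.pdf
    local name="$2"      # ReportName: letters and digits only
    local results="$3"   # e.g. results

    # Reject empty names and names with non-alphanumeric characters.
    case "$name" in
        ''|*[!A-Za-z0-9]*)
            echo "ReportName must be alphanumeric: '$name'" >&2
            return 1 ;;
    esac

    local dest
    dest="$results/$(date +%Y%m%d)_$name.pdf"
    mv "$src" "$dest" && echo "$dest"
}
```

For example, file_report scripts/demo.pdf demo results moves the report and prints its new path. Because the date comes first and is zero-padded, a plain alphabetical listing of the results folder is also a chronological one.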