<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: R for reproducible scientific analysis</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">R for reproducible scientific analysis</h1></a>
<h2 class="subtitle">Project management with RStudio</h2>
<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2 id="learning-objectives"><span class="glyphicon glyphicon-certificate"></span>Learning objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>Understand motivation for code and data management</li>
<li>Know how to organize code, data, and results</li>
<li>Be able to create and use an RStudio project</li>
</ul>
</div>
</section>
<h2 id="code-data-organization">Code &amp; Data Organization</h2>
<p>The scientific process is naturally incremental. Many projects start life as random notes, then some code, then a manuscript, and eventually everything ends up mixed together.</p>
<blockquote class="twitter-tweet">
<p>
Managing your projects in a reproducible fashion doesn’t just make your science reproducible, it makes your life easier.
</p>
— Vince Buffalo (<span class="citation">@vsbuffalo</span>) <a href="https://twitter.com/vsbuffalo/status/323638476153167872">April 15, 2013</a>
</blockquote>
<script async src="http://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>A good project layout will ultimately make your life easier:</p>
<ul>
<li>It makes it easier to understand the pipeline from source data to final product</li>
<li>It helps ensure the integrity of your data</li>
<li>It makes it simpler to share your code with someone else</li>
<li>It allows you to easily upload your code with your manuscript submission</li>
<li>It makes it easier to pick the project back up after a break</li>
</ul>
<h3 id="best-practices-for-project-organization">Best practices for project organization</h3>
<p>Although there is no “best” way to lay out a project, there are some general principles to adhere to that will make project management easier:</p>
<h4 id="treat-raw-data-as-read-only">Treat raw data as read only</h4>
<p>This is probably the most important goal of setting up a project. Raw data should never be edited: you can never be sure that you will want to keep an edit, and you want a record of every change made to the data. Therefore, treat your raw data as “read only”, perhaps even making a <code>raw_data</code> directory that is never modified. If you do some data cleaning or modification, save the modified file separately from the raw data, and ideally keep all the modifying actions in a script so that you can review and revise them as needed in the future.</p>
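<p>As a minimal sketch of this workflow, the R snippet below reads a raw file, cleans it in memory, and writes the result to a separate directory, leaving the raw file untouched. The file names, the <code>height_cm</code> column, the <code>cleaned</code> directory, and the toy data are all hypothetical; the snippet builds a throwaway project under <code>tempdir()</code> only so that it runs on its own.</p>
<pre class="r"><code># Sketch: clean raw data in a script without ever modifying the raw file.
# All directory, file, and column names are hypothetical stand-ins.
project &lt;- file.path(tempdir(), "demo_project")
raw_dir &lt;- file.path(project, "data", "raw_data")
clean_dir &lt;- file.path(project, "data", "cleaned")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(clean_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in for a raw file; in a real project this file already exists.
raw_file &lt;- file.path(raw_dir, "measurements.csv")
writeLines(c("id,height_cm", "1,172", "2,NA", "3,165"), raw_file)

# Read the raw data, drop rows with a missing height, and save the
# cleaned copy somewhere other than raw_data.
raw &lt;- read.csv(raw_file)
cleaned &lt;- raw[!is.na(raw$height_cm), ]
clean_file &lt;- file.path(clean_dir, "measurements_cleaned.csv")
write.csv(cleaned, clean_file, row.names = FALSE)</code></pre>
<p>Because the cleaning step lives in a script, it can be rerun or revised at any time, and the raw file remains the single unmodified starting point.</p>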
<h4 id="treat-generated-output-as-disposable">Treat generated output as disposable</h4>
<p>Anything generated by your scripts should be treated as disposable: you should be able to regenerate all of it from your scripts. There are lots of different ways to manage this output, and what’s best may depend on the particular kind of project. At a minimum, it’s useful to have separate directories for each of the following:</p>
<ul>
<li>data: Ideally .csv files as these are flat, transparent, and universal. You may have other specialized formats as well. .rda and .rds are R-specific data files but you never <em>need</em> to use these.</li>
<li>code: .R files, perhaps .do files if Stata is your thing, .py files for Python, etc.</li>
<li>results: .png or .pdf files for plots; .tex or .txt files for tables</li>
<li>papers: .tex if you write in LaTeX, .doc for Word, .Rmd for RMarkdown (which we recommend and will cover tomorrow afternoon), and .pdf or .html rendered documents.</li>
</ul>
<h3 id="rstudio-projects">RStudio Projects</h3>
<p>RStudio has a feature to help keep everything organized in a self-contained, reproducible package, called a “project”.</p>
<p>Technically, a project is just a small file with a <code>.Rproj</code> extension, but you can think of all the files and sub-directories stored alongside it as belonging to that project. We recommend creating a directory and a project file for each project you work on. It should look something like this:</p>
<div class="figure">
<img src="img/good_project_organization.png" alt="" />
</div>
<p>When you want to work on this project using R, double-click the .Rproj file. RStudio will open the project and set the working directory to the project directory, so relative paths such as <code>data/raw_data</code> resolve inside the project. You can also open an existing project from RStudio by clicking “File -&gt; Open project…”</p>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="challenge-create-a-project"><span class="glyphicon glyphicon-pencil"></span>Challenge – Create a Project</h4>
</div>
<div class="panel-body">
<p>Let’s create a new project in RStudio that will contain all of our work for this workshop.</p>
<ol style="list-style-type: decimal">
<li>Click the “File” menu button, then “New Project”.</li>
<li>Click “New Directory”.</li>
<li>Click “Empty Project”.</li>
<li>Type a descriptive directory name: This is the title of your project, so for this one you might use “DataCarpentry” or something similar.</li>
<li>Store the new directory in a sensible place in your computer’s organizational scheme. If you have a “workshops” or “classes” directory, that would make sense. This will create a new directory called “DataCarpentry” in that directory.</li>
<li>Click “Create Project”.</li>
</ol>
</div>
</section>
<p>If everything went right, RStudio should’ve flickered and you should be looking at a pretty bare RStudio instance. That’s okay. Click on the “Files” tab in the lower right pane. Your .Rproj file should be there with nothing else. You’ve got the bare bones of a new project. Let’s now create the directory structure described above, with one folder each for data, code, results, and papers. You can do this in RStudio by clicking on the “New Folder” button in the Files pane, or in your OS by navigating to the directory you just created. Then we’ll download the data that we’re going to use in this workshop and put it in a “raw_data” directory inside your “data” directory.</p>
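<p>The same folders can also be created from the R console. The following is a minimal sketch rather than a required step: <code>create_project_dirs</code> is a hypothetical helper name, and the default <code>root = "."</code> assumes that the working directory is the project directory, which is the case when the project was opened through its .Rproj file.</p>
<pre class="r"><code># Sketch: create the four project folders under `root`.
# `create_project_dirs` is a hypothetical helper, not part of R or RStudio.
create_project_dirs &lt;- function(root = ".") {
  project_dirs &lt;- c("data", "code", "results", "papers")
  for (d in project_dirs) {
    # showWarnings = FALSE keeps the call quiet if the folder already exists
    dir.create(file.path(root, d), showWarnings = FALSE)
  }
  # Return the folder names invisibly so the call prints nothing
  invisible(project_dirs)
}

# Try it on a throwaway directory; call create_project_dirs() with no
# arguments to act on the current project instead.
demo_root &lt;- file.path(tempdir(), "layout_demo")
dir.create(demo_root, showWarnings = FALSE)
create_project_dirs(demo_root)
list.files(demo_root)</code></pre>
<p>The “raw_data” folder is deliberately not created here, since it arrives with the downloaded data.</p>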
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="challenge-project-organization"><span class="glyphicon glyphicon-pencil"></span>Challenge – Project Organization</h4>
</div>
<div class="panel-body">
<ul>
<li>In your project directory, using either the Files tab of RStudio or your OS’s file browser, create the following directories:
<ul>
<li>data</li>
<li>code</li>
<li>results</li>
<li>papers</li>
</ul></li>
<li>Clicking on <a href="https://minhaskamal.github.io/DownGit/#/home?url=https://github.com/data-lessons/gapminder-R/tree/gh-pages/data/raw_data">this link</a> will take you to a page that downloads a .zip file that contains all the data needed for this workshop. Download it, unzip it, and move the new “raw_data” folder into the data folder of your project.</li>
</ul>
</div>
</section>
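<p>One way to confirm that the layout from this challenge is in place is to check it from the R console. This is only a sketch: <code>missing_project_dirs</code> is a hypothetical helper name, and the default <code>root = "."</code> again assumes that the working directory is the project directory.</p>
<pre class="r"><code># Sketch: report which of the expected folders are missing under `root`.
# `missing_project_dirs` is a hypothetical helper, not part of R or RStudio.
missing_project_dirs &lt;- function(root = ".") {
  expected &lt;- c("data", "code", "results", "papers",
                file.path("data", "raw_data"))
  # Keep only the expected folders that do not exist yet
  expected[!dir.exists(file.path(root, expected))]
}

# An empty result, character(0), means every expected folder exists.
missing_project_dirs()</code></pre>
<p>If “data/raw_data” is reported as missing, the unzipped folder has probably not yet been moved into the data folder.</p>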
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/lesson-template">Source</a>
<a class="label swc-blue-bg" href="mailto:[email protected]">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
</body>
</html>