# Part 1: Don’t Repeat Yourself
The first idea we are going to focus on is Don’t Repeat Yourself. Simply by
avoiding repetition, you will naturally adopt best practices that make your
pipelines reproducible.
## Introduction
Part 1 will focus on teaching you the fundamental ingredients of
reproducibility. By fundamental ingredients I mean those tools that you
absolutely need to have in your toolbox before even attempting to make a project
reproducible. These tools are so important that a good chunk of this book is
dedicated to them:
- Version control;
- Functional programming;
- Literate programming.
You might already be familiar with these topics, and maybe already use them in
your day-to-day work. If that’s the case, you still might want to at least skim
Part 1 before tackling Part 2 of the book, which will focus on another set of
tools to actually build reproducible analytical pipelines (RAPs).
This means that Part 1 will not yet teach you how to build reproducible
pipelines. I cannot start teaching you how to build reproducible analytical
pipelines without first making sure that you understand the core concepts
listed above. To help you understand these concepts, we will start by
analysing some data together. We are going to download, clean and plot some
data, and we will achieve this by writing two scripts. These scripts will be
written in a very typical non-"software engineery" way, so as to mimic how
analysts, data scientists or researchers without any formal training in computer
science would perform such an analysis. This does not mean that the quality of
the analysis will be low. But it does mean that these programmers typically
make delivering results fast, by any means necessary, their top priority. My
goal with this book is to show you, and hopefully convince you, that by adopting
certain simple ideas from software engineering it is possible to deliver just as
fast as before, but in a more consistent and robust way.