-
Notifications
You must be signed in to change notification settings - Fork 0
/
02_intro.qmd
187 lines (119 loc) · 6.53 KB
/
02_intro.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
---
title: "Optimising R Workflows: Welcome!"
author: "Dr Anna Krystallli"
subtitle: "Introduction"
institute: R-RSE
materials_url: https://optimising-r.netlify.app
format:
revealjs:
theme: [default, styles.scss, reveal.scss]
logo: assets/logo/r-rse-logo2.png
footer: "[{{< fa home >}}](https://optimising-r.netlify.app/)"
template-partials:
- title-slide.html
editor: visual
preload-iframes: true
---
## 👋 Hello
### me: **Dr Anna Krystalli**
- **Research Software Engineering Consultant**, [**`r-rse`**](https://www.r-rse.eu/)
- mastodon [annakrystalli\@fosstodon.org](https://fosstodon.org/@annakrystalli)
- twitter @annakrystalli
- github @annakrystalli
- email **r.rse.eu\[at\]gmail.com**
- **Editor [rOpenSci](http://onboarding.ropensci.org/)**
- **Founder & Core Team member** [**ReproHack**](https://www.reprohack.org/)
## Objectives
In this course we'll explore:
- Benchmarking and profiling code
- Best practice for writing performant code in R
- Best practice in working efficiently with data
- Parallelising workflows
# Background {.inverse}
# {background-image="https://images-wixmp-ed30a86b8c4ca887773594c2.wixmp.com/f/1cc5e3ff-37e5-4b9c-abf4-92304fafa4c9/deekqx1-20d6363f-185e-4f8d-a748-f5b3f3b8fdde.gif?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1cm46YXBwOjdlMGQxODg5ODIyNjQzNzNhNWYwZDQxNWVhMGQyNmUwIiwiaXNzIjoidXJuOmFwcDo3ZTBkMTg4OTgyMjY0MzczYTVmMGQ0MTVlYTBkMjZlMCIsIm9iaiI6W1t7InBhdGgiOiJcL2ZcLzFjYzVlM2ZmLTM3ZTUtNGI5Yy1hYmY0LTkyMzA0ZmFmYTRjOVwvZGVla3F4MS0yMGQ2MzYzZi0xODVlLTRmOGQtYTc0OC1mNWIzZjNiOGZkZGUuZ2lmIn1dXSwiYXVkIjpbInVybjpzZXJ2aWNlOmZpbGUuZG93bmxvYWQiXX0.bQwR0OGahVNiMtiHhvn95SFiuAZKxapsWSr_AbMK_Oc"}
::: {style="background-color: #ffffffbb; border-radius: 10px; padding: 5px;"}
::: r-fit-text
Computation
:::
:::
::: fragment
[![Transistor icons created by surang - Flaticon](assets/images/transistor.png){fig-alt="transistor icon" fig-align="center"}](https://www.flaticon.com/free-icons/transistor)
:::
::: notes
Computers represent info using binary code in the form of digital 1s and 0s inside the central processing unit ([CPU](https://www.techtarget.com/whatis/definition/processor)) and RAM. These digital numbers are electrical signals that are either on or off inside the CPU or [RAM](https://www.techtarget.com/searchstorage/definition/RAM-random-access-memory).
Each transistor is a switch, that is, **0** when turned off and **1** when turned on. The more transistors, the more switches.
Transistors are the basic building blocks that regulate the operation of computers, mobile phones, and all other modern electronic circuits and is the basic unit of the CPU
:::
## Computer Hardware
::: columns
::: {.column width="33%"}
#### CPU (Processing)
![](assets/images/cpu.png){fig-align="center"}
:::
::: {.column width="33%"}
#### RAM (memory)
![](assets/images/ram-memory.png){fig-align="center"}
:::
::: {.column width="33%"}
#### I/O
![](assets/images/hdd.png){fig-align="center" width="200"} ![](assets/images/networking.png){fig-align="center" width="200"}
:::
:::
::: notes
CPU
- The central processing unit (CPU), or the processor, is the brains of a computer. The CPU is responsible for performing numerical calculations.
- The faster the processor, the faster R will run.
- The clock speed (or clock rate, measured in hertz) is the frequency with which the CPU executes instructions. The faster the clock speed, the more instructions a CPU can execute in a section.
RAM
- Random access memory (RAM) is a type of computer memory that can be accessed randomly: any byte of memory can be accessed without touching the preceding bytes.
- The amount of RAM R has access to is incredibly important. Since R loads objects into RAM, the amount of RAM you have available can limit the size of data set you can analyse. MEMORY BOUND
Even if the original data set is relatively small, your analysis can generate large objects
:::
## Moore's law
<iframe src="https://ourworldindata.org/grapher/transistors-per-microprocessor" loading="lazy" style="width: 100%; height: 600px; border: 0px none;">
</iframe>
::: notes
When the price is unchanged, the number of components that can be accommodated on the integrated circuit will **double every 18-24 months**, and the performance will double. In other words, the performance of a computer that can be bought for every dollar will more than double every 18-24 months
:::
## Yet...
### we've hit clock speed stagnation
[![50 Years of Processor Trends. Distributed by Karl Rupp under a CC-BY 4.0 License](assets/images/50-years-processor-trend.png){fig-align="center"}](https://github.com/karlrupp/microprocessor-trend-data)
# About R
## R is an interpreted language
::: columns
::: {.column width="50%"}
**Compiled language**
Converted directly into machine code that the processor can execute.
- Tend to be faster and more efficient to execute.
- Need a "build" step which builds for system they are run on
- **Examples:** C, C++, Erlang, Haskell, Rust, and Go
:::
::: {.column width="50%"}
#### **Interpreted Languages**
Code interpreted line by line during run time.
- significantly slower although [just-in-time compilation](https://guide.freecodecamp.org/computer-science/just-in-time-compilation) is closing that gap.
- much more compact and easy to write
- **Examples:** R, Ruby, Python, and JavaScript.
:::
:::
## R performance
::: incremental
- R offers some excellent features: dynamic typing, lazy functional evaluation and object-orientation
- Side effect: operations are undertaken in single-threaded mode, i.e. sequentially
- Many routines in R are written in compiled languages like C & Fortran.
- R performance can be enhanced by linking to optimised Linear Algebra Libraries.
- R offers many ways to parallelise computations.
- Many packages wrap more performant C, Fortran, C++ code.
:::
::: notes
R as a language and environment is reasonably well established and understood. A combination of dynamic typing, lazy functional evaluation and object-orientation (in several flavors) makes for a somewhat unique combination.
One side-effect of this design is that core operations are undertaken in single-threaded mode, or, in other words, sequentially.
:::
# About this course
::: incremental
- I normally like to live code...BUT!
- There's a lot of materials to get through so I will be copying & pasting from the materials alot
- Have the materials handy to follow along
- Please stop me for questions or to share your own experiences
:::
# Let's go! [{{< fa home >}}](https://optimising-r.netlify.app/)