Commit

new files and up to date
chrbknudsen committed Jan 24, 2025
1 parent 572a573 commit 9871375
Showing 15 changed files with 326 additions and 3 deletions.
11 changes: 10 additions & 1 deletion config.yaml
@@ -8,8 +8,14 @@
# lc: Library Carpentry
# cp: Carpentries (to use for instructor training for instance)
# incubator: The Carpentries Incubator
# Note that you can also use a custom carpentry type. For more info,
# see the documentation: https://carpentries.github.io/sandpaper-docs/editing.html
carpentry: 'lc'

# Custom carpentry description
# This will be used as the alt text for the logo
# carpentry_description: "Custom Carpentry"

# Overall title for pages.
title: 'R-toolbox'

@@ -24,7 +30,7 @@ keywords: 'software, data, lesson'
life_cycle: 'pre-alpha'

# License of the lesson materials (recommended CC-BY 4.0)
license: 'CC0'
license: 'CC BY-NC-SA 4.0'

# Link to the source repository for this lesson
source: 'https://github.com/KUBDatalab/R-toolbox'
@@ -88,6 +94,9 @@ episodes:
- missing-data.Rmd
- adv-dataviz.Rmd
- digitizing-graphs.Rmd
- ucloud.Rmd
- model-control.Rmd
- pipes.Rmd

# Information for Learners
learners:
3 changes: 3 additions & 0 deletions episodes/fence-test.Rmd
@@ -80,6 +80,9 @@ We also do challenge and solution.

:::: solution
## It usually appears with this solution fence



::::


Binary file added episodes/fig/new_project_1.png
Binary file added episodes/fig/new_project_2.png
Binary file added episodes/fig/new_project_3.png
Binary file added episodes/fig/ucloud_front.png
Binary file added episodes/fig/ucloud_picks.png
Binary file added episodes/fig/ucloud_rstudio.png
Binary file added episodes/fig/ucloud_store.png
33 changes: 33 additions & 0 deletions episodes/files/ucloud_tensor_keras.R
@@ -0,0 +1,33 @@
# Run shell commands from R
system2("sudo", args = c("add-apt-repository", "-y", "ppa:deadsnakes/ppa"))

system2("sudo", args = c("apt-get", "update"))
system2("sudo", args = c("apt-get", "install", "-y", "python3.9", "python3.9-venv", "python3.9-dev"))

# Python setup
system2("python3.9", args = c("-m", "ensurepip", "--upgrade"))
system2("python3.9", args = c("-m", "pip", "install", "--upgrade", "pip"))
system2("python3.9", args = c("-m", "pip", "install", "numpy"))

# Create the virtual environment (it is activated in the next step)
system2("python3.9", args = c("-m", "venv", "~/r-tensorflow"))

# Activate the virtual environment and install packages
system2("bash", args = c("-c", "source ~/r-tensorflow/bin/activate && pip install numpy tensorflow keras spacy && python -m spacy download en_core_web_sm && deactivate"))

# R packages and setup
Sys.unsetenv('RETICULATE_PYTHON')
library(reticulate)
use_virtualenv('~/r-tensorflow', required = TRUE)
py_config()

install.packages('remotes')
remotes::install_github('rstudio/tensorflow', upgrade = 'always')
library(tensorflow)
install_tensorflow(envname = '~/r-tensorflow', restart_session = FALSE)

remotes::install_github('rstudio/keras')
library(keras3)
install_keras(envname = '~/r-tensorflow')

print("READY TO GO!")
38 changes: 38 additions & 0 deletions episodes/model-control.Rmd
@@ -0,0 +1,38 @@
---
title: 'model-control'
teaching: 10
exercises: 2
---

:::::::::::::::::::::::::::::::::::::: questions

- How do you write a lesson using R Markdown and `{sandpaper}`?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Explain how to use markdown with the new lesson template
- Demonstrate how to include pieces of code, figures, and nested challenge blocks

::::::::::::::::::::::::::::::::::::::::::::::::

You learned this in class, so perhaps you should have paid attention there.

But, in broad terms: model control means checking whether the model meets the
conditions that must hold for us to be allowed to use it. These could be:

- The distribution of the residuals: are they normally distributed? Are there
  clear patterns in the residual plot?
- The linearity assumption
- Multicollinearity
- Outliers
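A minimal sketch of these four checks in base R follows. It uses the built-in `mtcars` data and a model of `mpg` on `wt` and `hp`. Both the data set and the predictors are assumptions chosen purely for illustration, not taken from the course material. The variance inflation factor (VIF) is computed by hand rather than with a package such as `car`, so the block has no dependencies.

```r
# Example model: fuel economy explained by weight and horsepower
# (data set and predictors are illustrative assumptions)
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)

# 1. Distribution of the residuals: Q-Q plot and Shapiro-Wilk test
qqnorm(res)
qqline(res)
normality <- shapiro.test(res)

# 2. Linearity / patterns: residuals against fitted values.
#    A curve or a funnel shape suggests a violated assumption.
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# 3. Multicollinearity: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
#    regressing predictor j on the other predictors
predictors <- c("wt", "hp")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})

# 4. Outliers / influential points: Cook's distance above 4/n is a
#    common rule of thumb, not a hard limit
cooks <- cooks.distance(fit)
influential <- names(which(cooks > 4 / nrow(mtcars)))

normality$p.value
vif
influential
```

A VIF close to 1 means a predictor is nearly uncorrelated with the others. Whether a given p-value, VIF or Cook's distance is "too large" remains a judgement call.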

::::::::::::::::::::::::::::::::::::: keypoints

- Use `.md` files for episodes when you want static content
- Use `.Rmd` files for episodes when you need to generate output
- Run `sandpaper::check_lesson()` to identify any issues with your lesson
- Run `sandpaper::build_lesson()` to preview your lesson locally

::::::::::::::::::::::::::::::::::::::::::::::::

48 changes: 48 additions & 0 deletions episodes/pipes.Rmd
@@ -0,0 +1,48 @@
---
title: 'Different pipes'
teaching: 10
exercises: 2
---

::: questions
- What are the differences between the two pipes?
:::

::: objectives
- Explain how to use markdown with the new lesson template
:::

<https://stackoverflow.com/questions/67633022/what-are-the-differences-between-rs-native-pipe-and-the-magrittr-pipe>

There is also an image.

The magrittr pipe was introduced in November 2014.

We use it heavily in our teaching, because it is extremely useful.

R version 4.1 was released on 18 May 2021, and it included a built-in (native) pipe, `|>`, which does almost, but not quite, the same as the magrittr pipe.

For now, KUB datalab continues to use the magrittr pipe. The plan is to keep doing so until the native pipe becomes the default operator inserted by RStudio's pipe keyboard shortcut (Ctrl+Shift+M, or Cmd+Shift+M on a Mac).

You can change which pipe operator RStudio inserts. The keyboard shortcuts are listed under Tools → Keyboard Shortcuts Help (Alt+Shift+K, or Option+Shift+K on a Mac), and they can be changed under Tools → Modify Keyboard Shortcuts.

To make the shortcut insert the native pipe instead, go to Tools → Global Options (or Project Options) → Code and tick "Use native pipe operator, \|\>".

| Topic          | Magrittr 2.0.3                     | Base 4.3.0                      |
|----------------|------------------------------------|---------------------------------|
| Operator       | `%>%` `%<>%` `%$%` `%!>%` `%T>%`   | `|>`                            |
| Function calls | `1:2 %>% sum()`                    | `1:2 |> sum()`                  |
|                | `1:2 %>% sum`                      | Needs the parentheses           |
|                | `` 1:3 %>% `+`(4) ``               | Not every function is supported |
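The native-pipe column of the table can be checked directly in base R, as in the sketch below. It needs R 4.2 or later for the `_` placeholder line, but not magrittr. Parse errors are caught with `tryCatch()` so that the rejected forms can be shown without stopping the script. The variable names are illustrative only.

```r
# The native pipe is rewritten when the code is parsed:
# 1:2 |> sum() becomes sum(1:2)
x <- 1:2 |> sum()
same_call <- identical(quote(1:2 |> sum()), quote(sum(1:2)))

# The native pipe needs the parentheses: "1:2 |> sum" does not even parse
needs_parens <- tryCatch(
  { parse(text = "1:2 |> sum"); FALSE },
  error = function(e) TRUE
)

# Syntactically special functions such as `+` are rejected on the right-hand side
plus_rejected <- tryCatch(
  { parse(text = "1:3 |> `+`(4)"); FALSE },
  error = function(e) TRUE
)

# Workaround: wrap the operation in an anonymous function
# (the \(v) shorthand for function(v) also arrived in R 4.1)
y <- 1:3 |> (\(v) v + 4)()

# The placeholder `_` must be passed to a named argument (R >= 4.2)
fit <- mtcars |> lm(mpg ~ disp, data = _)
```

With magrittr loaded, `1:2 %>% sum` and ``1:3 %>% `+`(4)`` both evaluate without complaint. That is exactly the difference the last two table rows describe.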


::: keypoints
- Use `.md` files for episodes when you want static content
- Use `.Rmd` files for episodes when you need to generate output
- Run `sandpaper::check_lesson()` to identify any issues with your lesson
- Run `sandpaper::build_lesson()` to preview your lesson locally
:::
182 changes: 182 additions & 0 deletions episodes/ucloud.Rmd
@@ -0,0 +1,182 @@
---
title: 'R on UCloud'
teaching: 10
exercises: 2
---

:::::::::::::::::::::::::::::::::::::: questions

- How do you write a lesson using R Markdown and `{sandpaper}`?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Explain how to use markdown with the new lesson template
- Demonstrate how to include pieces of code, figures, and nested challenge blocks

::::::::::::::::::::::::::::::::::::::::::::::::


All students at UCPH have access to a High Performance Computing (HPC) facility:
UCloud. It can be accessed at cloud.sdu.dk using the normal UCPH login and
password.


Depending on your user allowances, the front page will look something like this:

![](fig/ucloud_front.png)

UCloud provides access to a multitude of different applications, organized in
the application store:

![](fig/ucloud_store.png)

Amongst the more popular picks from the store is RStudio:


![](fig/ucloud_picks.png)

This allows us to start an RStudio session that is accessible in the browser.


![](fig/ucloud_rstudio.png)


Note that we can choose between different machine types. Parallel processing is
not the solution to every problem, but sometimes it is, and here we get access
to a _lot_ of cores.
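Below is a minimal sketch of putting those cores to work with the `parallel` package, which ships with R. The toy function `slow_square()` is a hypothetical stand-in for real work. `mclapply()` relies on forking, which assumes a Linux (or macOS) machine; the `apt-get` calls in the setup script further down suggest that UCloud's RStudio app runs on Linux. Whether a real task speeds up depends on how well it splits into independent pieces.

```r
library(parallel)

# Number of cores on the machine we started (detectCores() can return NA)
n_cores <- max(1L, detectCores(), na.rm = TRUE)

# A deliberately slow stand-in for a real computation (hypothetical)
slow_square <- function(i) {
  Sys.sleep(0.2)
  i^2
}

# Sequential: roughly 8 x 0.2 seconds
t_seq <- system.time(seq_result <- lapply(1:8, slow_square))["elapsed"]

# Parallel: mclapply() forks the R process; this works on Linux and macOS,
# but not on Windows (parLapply() is the portable alternative)
t_par <- system.time(
  par_result <- mclapply(1:8, slow_square, mc.cores = n_cores)
)["elapsed"]

identical(seq_result, par_result)
c(sequential = t_seq, parallel = t_par)
```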


## This is not a computer as such

What we start is not a computer as such, but a virtual machine. If we need to
keep our results, we must save them to files on our user drive.

If we do not save the results, or for that matter the scripts we write,
everything is lost when the virtual machine shuts down, either because we close
it or because the allotted time runs out.
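Here is a minimal sketch of saving results so that they outlive the virtual machine. `results_dir` is a hypothetical placeholder. `tempdir()` is used only so that the example runs anywhere, and it is just as temporary as the machine itself. On UCloud, replace it with the path to a folder on your user drive.

```r
# Hypothetical location: replace tempdir() with a folder on your UCloud drive
results_dir <- tempdir()

# A result worth keeping (illustrative model)
model <- lm(mpg ~ wt, data = mtcars)

# Save the R object itself, plus a plain-text table other tools can read
saveRDS(model, file.path(results_dir, "model.rds"))
write.csv(
  data.frame(term = names(coef(model)), estimate = coef(model)),
  file.path(results_dir, "coefficients.csv"),
  row.names = FALSE
)

# In a later session: read the saved object back in
model_again <- readRDS(file.path(results_dir, "model.rds"))
```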

A related issue is that whenever we install a package, it is installed on the
virtual machine. Those installed packages also disappear when the machine shuts
down.

One way around that is to keep the virtual machine running indefinitely, but
that is expensive. Another is to have a prepared script that we can run every
time we start a new session.

Below is one such, rather complex, script. It sets up the machine to run Keras
(a library for advanced machine learning) using TensorFlow. It can be
downloaded here: INSERT LINK.

```{r eval = FALSE}
# Run shell commands from R
system2("sudo", args = c("add-apt-repository", "-y", "ppa:deadsnakes/ppa"))
system2("sudo", args = c("apt-get", "update"))
system2("sudo", args = c("apt-get", "install", "-y", "python3.9", "python3.9-venv", "python3.9-dev"))
# Python setup
system2("python3.9", args = c("-m", "ensurepip", "--upgrade"))
system2("python3.9", args = c("-m", "pip", "install", "--upgrade", "pip"))
system2("python3.9", args = c("-m", "pip", "install", "numpy"))
# Create the virtual environment (it is activated in the next step)
system2("python3.9", args = c("-m", "venv", "~/r-tensorflow"))
# Activate virtual environment and install packages
system2("bash", args = c("-c", "source ~/r-tensorflow/bin/activate && pip install numpy tensorflow keras spacy && python -m spacy download en_core_web_sm && deactivate"))
# R packages and setup
Sys.unsetenv('RETICULATE_PYTHON')
library(reticulate)
use_virtualenv('~/r-tensorflow', required = TRUE)
install.packages('remotes')
remotes::install_github('rstudio/tensorflow', upgrade = 'always')
library(tensorflow)
install_tensorflow(envname = '~/r-tensorflow', restart_session = FALSE)
remotes::install_github('rstudio/keras')
library(keras3)
install_keras(envname = '~/r-tensorflow')
```

One downside is that this takes quite some time, and has to be repeated
_every_ single time we start the virtual machine.

## Another approach

Data analysis is not worth much if we are not able to reproduce our results.
A significant amount of work has therefore gone into providing infrastructure
for exactly that. One part of the problem is keeping track of which packages,
and which versions of them, are used in the analysis.

Enter `renv`. `renv` is a package that sets up the scaffolding for installing
packages in a specific location inside an R project, making the project
self-contained and easy to distribute. Normally we would distribute a "lock file"
that records exactly which versions of which packages are used in the project.

And "project" is the important word here: this works best when we are working
in an R project. So begin by creating a project.

Give it a name, make sure it is saved somewhere easy to find in your files
(on UCloud, that means on your user drive), and tick the box about renv!

You will now see several new items in the "Files" tab: a folder called "renv",
a file "renv.lock", and probably a file ".Rprofile".

Opening .Rprofile, we will find the line of code `source("renv/activate.R")`.

Whenever we start the project, whatever we have written in .Rprofile is run.
In this case, that is the script "activate.R", which does a lot of interesting
things. The important part is that renv is activated, so whenever we now install
a package, it is installed in the project's renv folder. Do not delve too deeply
into this; leave it to the machine.
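Here is a quick sketch for checking where new packages will go. `.libPaths()` is base R, and its first entry is where `install.packages()` installs to by default. The `"/renv/"` string match is a heuristic assumed here, not an official renv check; renv itself provides `renv::status()` for inspecting the project's state.

```r
# Where will install.packages() put new packages by default?
lib_paths <- .libPaths()
default_lib <- lib_paths[1]

# Heuristic (an assumption, not an renv API): with renv active, the default
# library should lie inside the project's renv folder
renv_library_active <- grepl(
  "/renv/", normalizePath(default_lib, winslash = "/"), fixed = TRUE
)

# Does the project's .Rprofile contain the activation line?
activates_renv <- file.exists(".Rprofile") &&
  any(grepl("renv/activate.R", readLines(".Rprofile"), fixed = TRUE))

default_lib
renv_library_active
activates_renv
```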

One issue with this is that some package files still end up in odd places on
the machine: renv stores a cache of the packages outside our project. The idea
is that other projects can reuse these cached packages and cut down on install
time.

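In our case, that is not helpful. The cache disappears when the virtual machine
is stopped, and since renv by default links the packages in the project library
to this cache, the project's packages are effectively lost along with it.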

To handle this, we can specify where the cache should be stored. We can do that
manually, or (the preferred solution) create a file .Renviron in which we tell
renv where to place the cache. Having done that, we need to restart R. After
that we can install packages to our heart's content, and renv will place both
the project library and the cache inside our local project.
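The sketch below shows what the `.Renviron` file actually does. It writes the same line as the setup script, but into a throwaway temporary folder so that no real project is touched. It then loads the file with base R's `readRenviron()` instead of restarting R. In a real project, follow the text and restart R: environment files are read before `.Rprofile` at startup, so everything that `activate.R` starts sees the new setting from the beginning.

```r
# Throwaway folder standing in for a project (assumption for the demo)
project_dir <- tempfile("demo-project-")
dir.create(project_dir)
renviron <- file.path(project_dir, ".Renviron")

# Same line as in the setup script: put renv's cache inside the project
writeLines('RENV_PATHS_CACHE = "renv/cache"', renviron)

# R normally reads .Renviron only at startup; readRenviron() loads it
# into the running session instead
readRenviron(renviron)
Sys.getenv("RENV_PATHS_CACHE")
```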

Below is an example of a script that writes the environment file and installs
a selection of useful packages. Note that this takes a very long time. The
alternative to it taking a very long time once is for it to take a very long
time every time we open our project.

```{r eval = FALSE}
# write environment variable to file
writeLines('RENV_PATHS_CACHE = "renv/cache"', ".Renviron")
# restart R in order to set the environment variable
.rs.restart()
library(renv)
# install useful packages
# install.packages() works just as well as install() which comes from
# the renv package. But it is slightly shorter to type.
install("tidyverse")
install("reticulate")
install("devtools")
```

Note that this needs to be done for every project you initialize.
Also note that it takes a looong time...


::::::::::::::::::::::::::::::::::::: keypoints

- Use `.md` files for episodes when you want static content
- Use `.Rmd` files for episodes when you need to generate output
- Run `sandpaper::check_lesson()` to identify any issues with your lesson
- Run `sandpaper::build_lesson()` to preview your lesson locally

::::::::::::::::::::::::::::::::::::::::::::::::

14 changes: 12 additions & 2 deletions learners/CLT-en.md
@@ -1,8 +1,18 @@
---
title: CLT - in Danish
title: CLT - in English
---

# A proof of the Central Limit Theorem.
# A proof of the Central Limit Theorem

In general: the CLT explains why the distribution of sample averages
approximates a normal distribution (bell curve) as the sample size increases,
regardless of the shape of the underlying distribution.

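The theorem states that, for independent observations from the same distribution
with finite variance, the distribution of the standardized sample average
approaches the standard normal distribution as the sample size grows.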
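In symbols, let $X_1, \dots, X_n$ be independent, identically distributed observations with mean $\mu$ and finite variance $\sigma^2$. (This notation is introduced here, not taken from the original text.) The theorem then reads:

$$
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{\;d\;} N(0, 1)
\quad \text{as } n \to \infty.
$$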

Or, less precisely: for large samples, the sample average is approximately
normally distributed around the true mean of the population.
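A small simulation can make the theorem concrete. This sketch assumes an exponential population with rate 1, which is clearly skewed and has $\mu = \sigma = 1$. It also assumes a sample size of 50 and 10,000 repeated samples. All three are arbitrary choices made for illustration.

```r
set.seed(1)  # for reproducibility

n <- 50            # sample size (illustrative)
n_samples <- 10000 # number of repeated samples (illustrative)

# Non-normal population: exponential with rate 1, so mu = 1 and sigma = 1
sample_means <- replicate(n_samples, mean(rexp(n, rate = 1)))

# Standardize with the true mu and sigma, as in the theorem
z <- (sample_means - 1) / (1 / sqrt(n))

# If the CLT holds, z should look approximately standard normal
hist(z, breaks = 50, freq = FALSE, main = "Standardized sample means")
curve(dnorm(x), add = TRUE)

c(mean = mean(z), sd = sd(z), within_1.96 = mean(abs(z) < 1.96))
```

For a standard normal variable, about 95% of the values lie within ±1.96, so the last number should be close to 0.95.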

Overall: the CLT explains why many distributions of data tend towards a
normal distribution (the bell curve) as the sample size becomes large, regardless of the