New files and up to date

KUBDatalab · Jan 24, 2025 · commit 9871375
1 parent 572a573
Showing 15 changed files with 326 additions and 3 deletions.
diff --git a/config.yaml b/config.yaml
@@ -8,8 +8,14 @@
 # lc: Library Carpentry
 # cp: Carpentries (to use for instructor training for instance)
 # incubator: The Carpentries Incubator
+# Note that you can also use a custom carpentry type. For more info,
+# see the documentation: https://carpentries.github.io/sandpaper-docs/editing.html
 carpentry: 'lc'
 
+# Custom carpentry description
+# This will be used as the alt text for the logo
+# carpentry_description: "Custom Carpentry"
+
 # Overall title for pages.
 title: 'R-toolbox'
 
@@ -24,7 +30,7 @@ keywords: 'software, data, lesson'
 life_cycle: 'pre-alpha'
 
 # License of the lesson materials (recommended CC-BY 4.0)
-license: 'CC0'
+license: 'CC BY-NC-SA 4.0'
 
 # Link to the source repository for this lesson
 source: 'https://github.com/KUBDatalab/R-toolbox'
@@ -88,6 +94,9 @@ episodes:
 - missing-data.Rmd
 - adv-dataviz.Rmd
 - digitizing-graphs.Rmd
+- ucloud.Rmd
+- model-control.Rmd
+- pipes.Rmd
 
 # Information for Learners
 learners: 

diff --git a/episodes/fence-test.Rmd b/episodes/fence-test.Rmd
@@ -80,6 +80,9 @@ we also do challenge and solution.
 
 :::: solution
 ## It usually appears with this solution fence
+
+
+
 ::::
 
 

diff --git a/episodes/fig/bafkreicy4q7y2zb3qrvsybnfhxgzizbxwolqxsadykugycjmtgjwqczapa.jpg b/episodes/fig/bafkreicy4q7y2zb3qrvsybnfhxgzizbxwolqxsadykugycjmtgjwqczapa.jpg
diff --git a/episodes/fig/new_project_1.png b/episodes/fig/new_project_1.png
diff --git a/episodes/fig/new_project_2.png b/episodes/fig/new_project_2.png
diff --git a/episodes/fig/new_project_3.png b/episodes/fig/new_project_3.png
diff --git a/episodes/fig/ucloud_front.png b/episodes/fig/ucloud_front.png
diff --git a/episodes/fig/ucloud_picks.png b/episodes/fig/ucloud_picks.png
diff --git a/episodes/fig/ucloud_rstudio.png b/episodes/fig/ucloud_rstudio.png
diff --git a/episodes/fig/ucloud_store.png b/episodes/fig/ucloud_store.png
diff --git a/episodes/files/ucloud_tensor_keras.R b/episodes/files/ucloud_tensor_keras.R
@@ -0,0 +1,33 @@
+# Run shell commands from R
+system2("sudo", args = c("add-apt-repository", "-y", "ppa:deadsnakes/ppa"))
+
+system2("sudo", args = c("apt-get", "update"))
+system2("sudo", args = c("apt-get", "install", "-y", "python3.9", "python3.9-venv", "python3.9-dev"))
+
+# Python setup
+system2("python3.9", args = c("-m", "ensurepip", "--upgrade"))
+system2("python3.9", args = c("-m", "pip", "install", "--upgrade", "pip"))
+system2("python3.9", args = c("-m", "pip", "install", "numpy"))
+
+# Create the virtual environment
+system2("python3.9", args = c("-m", "venv", "~/r-tensorflow"))
+
+# Activate the virtual environment and install packages
+system2("bash", args = c("-c", "source ~/r-tensorflow/bin/activate && pip install numpy tensorflow keras spacy && python -m spacy download en_core_web_sm && deactivate"))
+
+# R packages and setup
+Sys.unsetenv('RETICULATE_PYTHON')
+library(reticulate)
+use_virtualenv('~/r-tensorflow', required = TRUE)
+py_config()
+
+install.packages('remotes')
+remotes::install_github('rstudio/tensorflow', upgrade = 'always')
+library(tensorflow)
+install_tensorflow(envname = '~/r-tensorflow', restart_session = FALSE)
+
+remotes::install_github('rstudio/keras')
+library(keras3)
+install_keras(envname = '~/r-tensorflow')
+
+print("READY TO GO!")
diff --git a/episodes/model-control.Rmd b/episodes/model-control.Rmd
@@ -0,0 +1,38 @@
+---
+title: 'model-control'
+teaching: 10
+exercises: 2
+---
+
+:::::::::::::::::::::::::::::::::::::: questions 
+
+- How do you write a lesson using R Markdown and `{sandpaper}`?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Explain how to use markdown with the new lesson template
+- Demonstrate how to include pieces of code, figures, and nested challenge blocks
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+You learned this in class, so maybe you should have paid attention there.
+
+But in general: model control means checking whether the model meets the
+conditions that must hold before we may use it. That could be:
+
+- Distribution of residuals: are they normally distributed, and are there clear patterns in the residual plot?
+- The linearity assumption
+- Multicollinearity
+- Outliers
+
+::::::::::::::::::::::::::::::::::::: keypoints 
+
+- Use `.md` files for episodes when you want static content
+- Use `.Rmd` files for episodes when you need to generate output
+- Run `sandpaper::check_lesson()` to identify any issues with your lesson
+- Run `sandpaper::build_lesson()` to preview your lesson locally
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
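The checks listed in model-control.Rmd can be sketched in base R. This is a minimal illustration, not the lesson's own code: the `mtcars` data, the chosen predictors, and the `4 / n` rule of thumb for Cook's distance are all assumptions made for the example.

```r
# Fit a small example model on the built-in mtcars data (illustrative choice)
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Residuals: Shapiro-Wilk test of the normality assumption
res <- residuals(fit)
normality <- shapiro.test(res)

# Multicollinearity: variance inflation factors, computed as the diagonal
# of the inverse of the predictors' correlation matrix
predictors <- mtcars[, c("wt", "hp", "disp")]
vif <- diag(solve(cor(predictors)))

# Outliers / influential points: Cook's distance with a common
# rule-of-thumb threshold of 4 / n (an assumption, not a fixed rule)
cooks <- cooks.distance(fit)
influential <- names(cooks)[cooks > 4 / nrow(mtcars)]

print(normality$p.value)
print(vif)
print(influential)
```

For the linearity assumption and for patterns in the residuals, `plot(fit, which = 1:2)` draws the residuals-versus-fitted plot and the normal Q-Q plot for a visual check.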
diff --git a/episodes/pipes.Rmd b/episodes/pipes.Rmd
@@ -0,0 +1,48 @@
+---
+title: 'Different pipes'
+teaching: 10
+exercises: 2
+---
+
+::: questions
+-   What are the differences between the two pipes?
+:::
+
+::: objectives
+-   Explain how to use markdown with the new lesson template
+:::
+
+<https://stackoverflow.com/questions/67633022/what-are-the-differences-between-rs-native-pipe-and-the-magrittr-pipe>
+
+There is also a picture.
+
+The magrittr pipe was introduced in November 2014.
+
+We use it heavily in our teaching, because it is very useful.
+
+R version 4.1 was released on 18 May 2021. It included a built-in (native) pipe that does almost, but not exactly, the same as the magrittr pipe.
+
+KUB Datalab continues to use the magrittr pipe for now. The plan is to continue until the native pipe becomes the default for the RStudio keyboard shortcut that inserts the pipe operator (Ctrl+Shift+M, or Cmd+Shift+M on Mac).
+
+You can change the pipe operator that RStudio inserts to the native pipe.
+
+The shortcuts can be found in Tools → Keyboard Shortcuts Help, or with Alt+Shift+K (Option+Shift+K on Mac).
+
+Shortcuts can be changed in Tools → Modify Keyboard Shortcuts.
+
+To change the pipe, go to Tools → Global Options (or Project Options) → Code and tick "Use native pipe operator, \|\>".
+
+| Topic          | Magrittr 2.0.3                   | Base 4.3.0                      |
+|----------------|----------------------------------|---------------------------------|
+| Operator       | `%>%` `%<>%` `%$%` `%!>%` `%T>%` | `|>`                            |
+| Function calls | `1:2 %>% sum()`                  | `1:2 |> sum()`                  |
+|                | `1:2 %>% sum`                    | Needs the parentheses           |
+|                | `` 1:3 %>% `+`(4) ``             | Not every function is supported |
+
+
+::: keypoints
+-   Use `.md` files for episodes when you want static content
+-   Use `.Rmd` files for episodes when you need to generate output
+-   Run `sandpaper::check_lesson()` to identify any issues with your lesson
+-   Run `sandpaper::build_lesson()` to preview your lesson locally
+:::
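The differences described in pipes.Rmd can be tried out directly. The sketch below is an illustration under assumptions: it needs R 4.1 or later for `|>` and the `\(x)` shorthand, the variable names are made up for the example, and the magrittr part only runs if that package happens to be installed. `parse()` is used so that the two failing forms can be shown without stopping the script.

```r
# Native pipe: the right-hand side must be a call with parentheses
native_sum <- 1:2 |> sum()

# Without parentheses the native pipe is rejected when the code is parsed
native_bare <- tryCatch(parse(text = "1:2 |> sum"),
                        error = function(e) "error")

# Special functions such as `+` are not accepted on the right-hand side
native_plus <- tryCatch(parse(text = "1:3 |> `+`(4)"),
                        error = function(e) "error")

# A common workaround: wrap the operation in an anonymous function
native_lambda <- 1:3 |> (\(x) x + 4)()

# The magrittr pipe accepts both forms (only run if magrittr is installed)
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  magrittr_bare <- 1:2 %>% sum
  magrittr_plus <- 1:3 %>% `+`(4)
}

print(native_sum)
print(native_lambda)
```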
diff --git a/episodes/ucloud.Rmd b/episodes/ucloud.Rmd
@@ -0,0 +1,182 @@
+---
+title: 'R on UCloud'
+teaching: 10
+exercises: 2
+---
+
+:::::::::::::::::::::::::::::::::::::: questions 
+
+- How do you write a lesson using R Markdown and `{sandpaper}`?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Explain how to use markdown with the new lesson template
+- Demonstrate how to include pieces of code, figures, and nested challenge blocks
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+
+All students at UCPH have access to a High Performance Computing (HPC) facility:
+UCloud. It can be accessed at cloud.sdu.dk using the normal UCPH login and 
+password.
+
+
+Depending on the user allowances, it will look something like this.
+
+![](fig/ucloud_front.png)
+
+UCloud provides access to a multitude of different applications organized in the
+application store:
+
+![](fig/ucloud_store.png)
+
+Amongst the more popular picks from the store is RStudio:
+
+
+![](fig/ucloud_picks.png)
+
+This allows us to start an RStudio session, accessible in the browser. 
+
+
+![](fig/ucloud_rstudio.png)
+
+
+Note that we can choose different machine types. Parallel processing is not 
+the solution to every problem, but sometimes it is, and here we get access to a
+_lot_ of cores.
+
+
+## This is not a computer as such
+
+What we can start is not a computer. It is a virtual machine. If we need to
+save our results, we need to save them to files, on our user drive.
+
+If we do not save the results, or for that matter the scripts we write,
+all is lost when the virtual machine closes. Either because we close it,
+or because the time ran out.
+
+Another issue this raises is that whenever we install a library,
+we install it in the virtual machine. Those installed libraries also
+disappear when the machine closes.
+
+One way to get around that is to run the virtual machine indefinitely. That is
+expensive. Another is to have a prepared script we can run every time we
+start a new session.
+
+Below you will find one such, rather complex, script that sets up the 
+machine for running Keras (a module for advanced machine learning) using
+TensorFlow. It can be downloaded here INSERT LINK.
+
+```{r eval = F}
+# Run shell commands from R
+system2("sudo", args = c("add-apt-repository", "-y", "ppa:deadsnakes/ppa"))
+system2("sudo", args = c("apt-get", "update"))
+system2("sudo", args = c("apt-get", "install", "-y", "python3.9", "python3.9-venv", "python3.9-dev"))
+
+# Python setup
+system2("python3.9", args = c("-m", "ensurepip", "--upgrade"))
+system2("python3.9", args = c("-m", "pip", "install", "--upgrade", "pip"))
+system2("python3.9", args = c("-m", "pip", "install", "numpy"))
+
+# Create the virtual environment
+system2("python3.9", args = c("-m", "venv", "~/r-tensorflow"))
+
+# Activate virtual environment and install packages
+system2("bash", args = c("-c", "source ~/r-tensorflow/bin/activate && pip install numpy tensorflow keras spacy && python -m spacy download en_core_web_sm && deactivate"))
+
+# R packages and setup
+Sys.unsetenv('RETICULATE_PYTHON')
+library(reticulate)
+use_virtualenv('~/r-tensorflow', required = TRUE)
+
+install.packages('remotes')
+remotes::install_github('rstudio/tensorflow', upgrade = 'always')
+library(tensorflow)
+install_tensorflow(envname = '~/r-tensorflow', restart_session = FALSE)
+
+remotes::install_github('rstudio/keras')
+library(keras3)
+install_keras(envname = '~/r-tensorflow')
+```
+
+One downside is that this takes quite some time and has to
+be repeated _every_ single time we start the virtual machine.
+
+## Another approach
+
+Data analysis is not worth much if we are not able to reproduce our results.
+A significant amount of work has therefore gone into providing infrastructure
+for exactly that. One issue is the question of which packages are used for
+the analysis.
+
+Enter `renv`. `renv` is a package that establishes scaffolding for installing
+packages in a specific location in an R project, making it self-contained and
+easy to distribute. Normally we would distribute a "lock file" that describes 
+exactly which versions of which packages are used in a project. 
+
+And project is an important word here. This works best if we are working in an
+R-project. So begin by making a project.
+
+Give it a name and make sure it is saved somewhere easy to find in your files.
+And tick the box about renv!
+
+You will see a lot of stuff in the "files" tab. A folder called "renv", a file
+"renv.lock", and probably a file ".Rprofile".
+
+Looking into `.Rprofile`, we find the line `source("renv/activate.R")`.
+
+Whenever we start the project, whatever we have written to 
+`.Rprofile` is run. In this case that is the script `activate.R`,
+which does a lot of interesting stuff. The important thing is that 
+renv is activated, and whenever we now install a package, it 
+will be installed in the renv folder. Do not delve too deep into that; leave it
+to the machine.
+
+One issue with this is that packages are still stored in other places 
+on the machine. Caches of the packages are stored outside our project. The idea
+is that other projects can reuse these cached packages and cut down on install
+time. 
+
+In our case, that is not helpful. This cache will disappear when the virtual
+machine is stopped.
+
+In order to handle this, we can specify where the cache should be stored.
+We can do that manually. Or, and this is the preferred solution, make a file
+`.Renviron` where we specify where renv should place the cache. Having done that
+we need to restart R, and now we can install packages to our heart's delight,
+and renv will place both the libraries and the cache in our local project.
+
+An example of a script that writes the environment file and installs a selection
+of useful packages can be found below. Note that this takes a very long time.
+The alternative to this taking a very long time once is for it to take a very
+long time every time we open our project. 
+
+```{r eval =F}
+# write environment variable to file
+writeLines('RENV_PATHS_CACHE = "renv/cache"', ".Renviron")
+# restart R in order to set the environment variable
+.rs.restart()
+library(renv)
+# install useful packages
+# install.packages() works just as well as install(), which comes from
+# the renv package, but install() is slightly shorter to type.
+install("tidyverse")
+install("reticulate")
+install("devtools")
+```
+
+Note that this will need to be done for every project you initialize. 
+Also note - this takes a looong time...
+
+
+::::::::::::::::::::::::::::::::::::: keypoints 
+
+- Use `.md` files for episodes when you want static content
+- Use `.Rmd` files for episodes when you need to generate output
+- Run `sandpaper::check_lesson()` to identify any issues with your lesson
+- Run `sandpaper::build_lesson()` to preview your lesson locally
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
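After the restart described in ucloud.Rmd, it can be worth confirming that R has picked up the cache setting. The sketch below is a minimal illustration, not part of the lesson: it writes a `.Renviron` into a temporary folder (the name `demo-project` is made up) and loads it with `readRenviron()`, so it does not touch a real project or need a restart. The line is written in the plain `name=value` form described in R's startup documentation.

```r
# Work in a temporary folder so the sketch does not touch a real project
project_dir <- file.path(tempdir(), "demo-project")
dir.create(project_dir, showWarnings = FALSE)
renviron <- file.path(project_dir, ".Renviron")

# Keep the renv cache inside the project, as in the lesson
writeLines('RENV_PATHS_CACHE="renv/cache"', renviron)

# readRenviron() loads the file into the current session, which lets us
# inspect the value without restarting R
loaded <- readRenviron(renviron)
cache_path <- Sys.getenv("RENV_PATHS_CACHE")

print(cache_path)
```

In a real project, running `Sys.getenv("RENV_PATHS_CACHE")` in the console after the restart gives the same kind of check: an empty string means the `.Renviron` file was not read.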
diff --git a/learners/CLT-en.md b/learners/CLT-en.md
@@ -1,8 +1,18 @@
 ---
-title: CLT - på dansk
+title: CLT - in English
 ---
 
-# Et bevis for Central Limit Teoremet.
+# A proof of the Central Limit Theorem
+
+In general: the CLT explains why the distribution of sample averages approximates
+a normal distribution (bell curve) as the sample size increases, independent of
+the shape of the underlying distribution.
+
+The theorem states that the distribution of the standardized average of a 
+sample approaches the standard normal distribution.
+
+Or, less precisely, that for large samples their average is more or less normally
+distributed around the true average of the population.
 
 Overordnet: CLT forklarer hvorfor mange fordelinger af data, tenderer en 
 normalfordeling (klokkekurven), når stikprøvestørrelsen bliver stor, uanset den