From 23665db9da9c9f39a855b317c432e07082b2f5ef Mon Sep 17 00:00:00 2001
From: Christian Knudsen
Date: Wed, 6 Dec 2023 11:48:53 +0100
Subject: [PATCH 1/2] questions, keypoints, objectives fixed

---
 episodes/03-dplyr-tidyr.Rmd     | 51 ++++++++++++++++++++-------------
 episodes/04-functions-plots.Rmd | 44 ++++++++++++++++------------
 episodes/05-whats-next.Rmd      | 37 +++++++++++++-----------
 3 files changed, 77 insertions(+), 55 deletions(-)

diff --git a/episodes/03-dplyr-tidyr.Rmd b/episodes/03-dplyr-tidyr.Rmd
index db94aef..e1d76ce 100644
--- a/episodes/03-dplyr-tidyr.Rmd
+++ b/episodes/03-dplyr-tidyr.Rmd
@@ -1,15 +1,23 @@
 ---
 title: "Data Wrangling with dplyr and tidyr"
-keypoints:
-- Use the `dplyr` package to manipulate dataframes.
-- Use `select()` to choose variables from a dataframe.
-- Use `filter()` to choose data based on values.
-- Use `group_by()` and `summarize()` to work with subsets of data.
-- Use `mutate()` to create new variables.
-- Use the `tidyr` package to change the layout of dataframes.
-- Use `pivot_wider()` to go from long to wide format.
-- Use `pivot_longer()` to go from wide to long format.
-objectives:
+
+teaching: 20
+exercises: 10
+
+---
+
+:::: questions
+
+- How can I select specific rows and/or columns from a dataframe?
+- How can I combine multiple commands into a single command?
+- How can I create new columns or remove existing columns from a dataframe?
+- How can I reformat a dataframe to meet my needs?
+
+::::
+
+
+:::: objectives
+
 - Describe the purpose of an R package and the **`dplyr`** and **`tidyr`** packages.
 - Select certain columns in a dataframe with the **`dplyr`** function `select`.
 - Select certain rows in a dataframe according to filtering conditions with the **`dplyr`**
@@ -27,16 +35,8 @@ objectives:
 - Reshape a dataframe from long to wide format and back with the `pivot_wider` and
   `pivot_longer` commands from the **`tidyr`** package.
 - Export a dataframe to a csv file.
-questions:
-- How can I select specific rows and/or columns from a dataframe?
-- How can I combine multiple commands into a single command?
-- How can I create new columns or remove existing columns from a dataframe?
-- How can I reformat a dataframe to meet my needs?
-teaching: 20
-exercises: 10
-source: Rmd
----
 
+::::
 
 ```{r, include = FALSE}
 library(dplyr)
@@ -476,4 +476,15 @@ if (!dir.exists("../data_output")) dir.create("../data_output")
 write_csv(interviews, "../data_output/interviews_plotting.csv")
 ```
 
-{% include links.md %}
+:::: keypoints
+
+- Use the `dplyr` package to manipulate dataframes.
+- Use `select()` to choose variables from a dataframe.
+- Use `filter()` to choose data based on values.
+- Use `group_by()` and `summarize()` to work with subsets of data.
+- Use `mutate()` to create new variables.
+- Use the `tidyr` package to change the layout of dataframes.
+- Use `pivot_wider()` to go from long to wide format.
+- Use `pivot_longer()` to go from wide to long format.
+
+::::
diff --git a/episodes/04-functions-plots.Rmd b/episodes/04-functions-plots.Rmd
index 8be5b27..fd42da6 100644
--- a/episodes/04-functions-plots.Rmd
+++ b/episodes/04-functions-plots.Rmd
@@ -2,27 +2,24 @@
 title: "A couple of plots. And making our own functions"
 teaching: 80
 exercises: 35
-questions:
-  - "How do I create scatterplots, boxplots, and barplots?"
-  - "How can I define my own functions?"
-
-objectives:
-  - "Produce scatter plots and boxplots using Base R."
-  - "Write your own function"
-  - "Write loops to repeat calculations"
-  - "Use logical tests in loops"
-
-keypoints:
-  - "Boxplots are useful for visualizing the distribution of a continuous variable."
-  - "Barplots are useful for visualizing categorical data."
-  - "Functions allows you to repeat the same set of operations again and again."
-  - "Loops allows you to apply the same function to lots of data."
-  - "Logical tests allow you to apply different calculations on different sets of data."
-
-source: Rmd
 ---
 
 
+:::: questions
+- How do I create scatterplots, boxplots, and barplots?
+- How can I define my own functions?
+
+::::
+
+
+:::: objectives
+
+- Produce scatter plots and boxplots using Base R.
+- Write your own functions.
+- Write loops to repeat calculations.
+- Use logical tests in loops.
+
+::::
 
 
 We start by loading the **`tidyverse`** package.
@@ -317,4 +314,13 @@ interviews_plotting %>%
 It looks different, and we get a warning about `binwidth`.
 geom_histogram automatically chooses 30 bins for us, and that is normally not the right number.
 
-{% include links.md %}
+:::: keypoints
+
+- Boxplots are useful for visualizing the distribution of a continuous variable.
+- Barplots are useful for visualizing categorical data.
+- Functions allow you to repeat the same set of operations again and again.
+- Loops allow you to apply the same function to lots of data.
+- Logical tests allow you to apply different calculations on different sets of data.
+
+::::
+
diff --git a/episodes/05-whats-next.Rmd b/episodes/05-whats-next.Rmd
index ea82d3d..d6a404f 100644
--- a/episodes/05-whats-next.Rmd
+++ b/episodes/05-whats-next.Rmd
@@ -2,24 +2,22 @@
 title: "What is the next step?"
 teaching: 10
 exercises: 0
-questions:
-  - "What do I do now?"
-  - "What is the next step?"
-
-objectives:
-  - "Present suggestions for further reading,"
-  - "Tips on problems to work on to practice,"
-
-keypoints:
-  - "Practice is important!"
-  - "Working on data that YOU find interesting is a really good idea,"
-  - "The amount of ressources online is immense."
-  - "KUB Datalab is there for your."
-
-source: Rmd
 ---
 
 
+:::: questions
+
+- What do I do now?
+- What is the next step?
+
+::::
+
+:::: objectives
+
+- Present suggestions for further reading.
+- Give tips on problems to work on for practice.
+
+::::
 
 
 ## Great sites
@@ -57,4 +55,11 @@ Our mail:
 kubdatalab@kb.dk
 
 
-{% include links.md %}
+:::: keypoints
+- Practice is important!
+- Working on data that YOU find interesting is a really good idea.
+- The amount of resources online is immense.
+- KUB Datalab is there for you.
+
+::::
+
From c5622404c955d66b8bc603f31470632e3ff379a9 Mon Sep 17 00:00:00 2001
From: Christian Knudsen
Date: Wed, 6 Dec 2023 11:55:35 +0100
Subject: [PATCH 2/2] challenge experiment

---
 episodes/03-dplyr-tidyr.Rmd | 89 +++++++++++++++++++++----------------
 1 file changed, 51 insertions(+), 38 deletions(-)

diff --git a/episodes/03-dplyr-tidyr.Rmd b/episodes/03-dplyr-tidyr.Rmd
index e1d76ce..bb93a8b 100644
--- a/episodes/03-dplyr-tidyr.Rmd
+++ b/episodes/03-dplyr-tidyr.Rmd
@@ -236,22 +236,28 @@ interviews_ch
 Note that the final dataframe (`interviews_ch`) is the leftmost part of this
 expression.
 
-> ## Exercise
->
-> Using pipes, subset the `interviews` data to include interviews
-> where respondents were members of an irrigation association
-> (`memb_assoc`) and retain only the columns `affect_conflicts`,
-> `liv_count`, and `no_meals`.
->
-> > ## Solution
-> >
-> > ```{r}
-> > interviews %>%
-> >   filter(memb_assoc == "yes") %>%
-> >   select(affect_conflicts, liv_count, no_meals)
-> > ```
-> {: .solution}
-{: .challenge}
+
+:::: challenge
+
+
+## Exercise
+  Using pipes, subset the `interviews` data to include interviews
+  where respondents were members of an irrigation association
+  (`memb_assoc`) and retain only the columns `affect_conflicts`,
+  `liv_count`, and `no_meals`.
+
+
+:::: solution
+  ## Solution
+
+  ```{r}
+  interviews %>%
+    filter(memb_assoc == "yes") %>%
+    select(affect_conflicts, liv_count, no_meals)
+  ```
+
+::::
+
 
 ### Mutate
 
@@ -270,29 +276,32 @@ interviews %>%
 
 
 
+:::: challenge
 
-> ## Exercise
->
-> Create a new dataframe from the `interviews` data that meets the following
-> criteria: contains only the `village` column and a new column called
-> `total_meals` containing a value that is equal to the total number of meals
-> served in the household per day on average (`no_membrs` times `no_meals`).
-> Only the rows where `total_meals` is greater than 20 should be shown in the
-> final dataframe.
->
-> **Hint**: think about how the commands should be ordered to produce this data
-> frame!
->
-> > ## Solution
-> >
-> > ``` {r}
-> > interviews_total_meals <- interviews %>%
-> >   mutate(total_meals = no_membrs * no_meals) %>%
-> >   filter(total_meals > 20) %>%
-> >   select(village, total_meals)
-> > ```
-> {: .solution}
-{: .challenge}
+## Exercise
+
+Create a new dataframe from the `interviews` data that meets the following
+criteria: contains only the `village` column and a new column called
+`total_meals` containing a value that is equal to the total number of meals
+served in the household per day on average (`no_membrs` times `no_meals`).
+Only the rows where `total_meals` is greater than 20 should be shown in the
+final dataframe.
+
+**Hint**: think about how the commands should be ordered to produce this data
+frame!
+
+:::: solution
+
+## Solution
+
+  ``` {r}
+  interviews_total_meals <- interviews %>%
+    mutate(total_meals = no_membrs * no_meals) %>%
+    filter(total_meals > 20) %>%
+    select(village, total_meals)
+  ```
+
+::::
 
 ### Split-apply-combine data analysis and the summarize() function
 
@@ -395,6 +404,10 @@ interviews %>%
   count(village, sort = TRUE)
 ```
 
+
+
+
+
 > ## Exercise
 >
 > How many households in the survey have an average of