Build of revision abf9705

calculquebec · Jan 24, 2025 · 2228844
1 parent abf9705
commit 2228844

Showing 20 changed files with 308 additions and 150 deletions.
diff --git a/en/01-dataframe.ipynb b/en/01-dataframe.ipynb
@@ -171,7 +171,8 @@
    "source": [
     "### Exercise - DataFrame and Series\n",
     "\n",
-    "1. Load the table of species from the `species.csv` file and assign the result to `species_df`."
+    "1. Load the table of species from the `species.csv`\n",
+    "   file and assign the result to `species_df`. (3 min.)"
    ]
   },
   {
@@ -194,7 +195,7 @@
    },
    "source": [
     "2. Use the `.unique()` function to get\n",
-    "   the list of all different `taxa`."
+    "   the list of all different `taxa`. (2 min.)"
    ]
   },
   {
@@ -209,7 +210,7 @@
    },
    "outputs": [],
    "source": [
-    "species_df"
+    "species_df###"
    ]
   },
   {
@@ -220,8 +221,7 @@
    },
    "source": [
     "3. Use the `.nunique()` function to\n",
-    "   get the number of different taxa.\n",
-    "   Note: `nan` will not be accounted."
+    "   get the number of different taxa. (1 min.)"
    ]
   },
   {
@@ -236,7 +236,7 @@
    },
    "outputs": [],
    "source": [
-    "species_df"
+    "species_df###"
    ]
   },
   {
@@ -312,7 +312,9 @@
    },
    "source": [
     "### Exercise - Grouping\n",
-    "`1`. How many recorded individuals are female `F`, and how many male `M`?"
+    "`1`. How many recorded individuals are female `F`,\n",
+    "and how many male `M`? HINT: it is possible to select\n",
+    "a column once the data has been grouped. (2 min.)"
    ]
   },
   {
@@ -327,7 +329,7 @@
    },
    "outputs": [],
    "source": [
-    "by_sex"
+    "by_sex###"
    ]
   },
   {
@@ -337,7 +339,8 @@
     "lang": "en"
    },
    "source": [
-    "`2`. What happens when you group by two columns using the following syntax and then grab mean values:"
+    "`2`. What happens when you group by two columns using\n",
+    "the following syntax and then grab mean values? (2 min.)"
    ]
   },
   {
@@ -352,7 +355,7 @@
    },
    "outputs": [],
    "source": [
-    "surveys_df.groupby(['plot_id', 'sex'])"
+    "surveys_df.groupby(['plot_id', 'sex'])###"
    ]
   },
   {
@@ -362,7 +365,8 @@
     "lang": "en"
    },
    "source": [
-    "`3`. Summarize `weight` values for each site (`plot_id`) in your data. HINT: it is possible to select a column once the data has been grouped."
+    "`3`. Summarize `weight` values for each\n",
+    "site (`plot_id`) in your data. (3 min.)"
    ]
   },
   {
@@ -377,7 +381,7 @@
    },
    "outputs": [],
    "source": [
-    "surveys_df"
+    "surveys_df###"
    ]
   },
   {
@@ -410,7 +414,9 @@
    },
    "source": [
     "### Exercise - Plotting Challenge\n",
-    "Create a `line` plot of the median `weight` per month."
+    "Create a `line` plot of\n",
+    "[the median](https://pandas.pydata.org/docs/reference/api/pandas.Series.median.html)\n",
+    "`weight` per month. (3 min.)"
    ]
   },
   {
@@ -425,7 +431,7 @@
    },
    "outputs": [],
    "source": [
-    "surveys_df"
+    "surveys_df###"
    ]
   },
   {

diff --git a/en/02-selection.ipynb b/en/02-selection.ipynb
@@ -271,7 +271,7 @@
     "```\n",
     "Use the `isin()` method to find all different\n",
     "sites (`plot_id`) that contain particular species\n",
-    "(`AS`, `CQ`, `OX` and `UL`) in the surveys DataFrame."
+    "(`AS`, `CQ`, `OX` and `UL`) in the surveys DataFrame. (4 min.)"
    ]
   },
   {
@@ -287,10 +287,10 @@
    "outputs": [],
    "source": [
     "# Boolean mask of valid species IDs\n",
-    "species_mask = surveys_df['species_id']\n",
+    "species_mask = ###(['AS', 'CQ', 'OX', 'UL'])\n",
     "\n",
     "# List all different sites\n",
-    "surveys_df[species_mask]['plot_id']"
+    "surveys_df[###][###].unique()"
    ]
   },
   {
@@ -305,7 +305,9 @@
     "* Create a new DataFrame that contains only observations that are\n",
     "  of sex female or male and where weight values are greater than 0\n",
     "* For the final plot, only select the\n",
-    "  weight, the site and the sex columns"
+    "  weight, the site and the sex columns\n",
+    "\n",
+    "(5 min.)"
    ]
   },
   {
@@ -321,11 +323,11 @@
    "outputs": [],
    "source": [
     "# Selection of the data with isin()\n",
-    "sex_mask = surveys_df['sex']\n",
-    "weight_mask = surveys_df['weight']\n",
+    "sex_mask = surveys_df['sex']###\n",
+    "weight_mask = surveys_df['weight'] ###\n",
     "columns = ['weight', 'plot_id', 'sex']\n",
     "\n",
-    "selection = surveys_df\n",
+    "selection = surveys_df###\n",
     "selection.tail()"
    ]
   },
@@ -342,7 +344,7 @@
    "outputs": [],
    "source": [
     "# Calculate the mean weight for each plot_id and sex combination: \n",
-    "avg_by_site_sex = selection\n",
+    "avg_by_site_sex = selection###\n",
     "avg_by_site_sex.head()"
    ]
   },

diff --git a/en/03-format.ipynb b/en/03-format.ipynb
@@ -377,7 +377,9 @@
     "In the `sex` column of `copy_surveys_df`:\n",
     "* Replace undefined values by `'F|M'`\n",
     "* Any value not equal to `'F'`, `'M'` or `'F|M'` is\n",
-    "  considered invalid and must be replaced by `'F|M'`"
+    "  considered invalid and must be replaced by `'F|M'`\n",
+    "\n",
+    "(5 min.)"
    ]
   },
   {

diff --git a/en/04-combine.ipynb b/en/04-combine.ipynb
@@ -161,9 +161,8 @@
     "* In `surveys_df`, select rows where the year is 2001.\n",
     "  Do the same for year 2002.\n",
     "* Concatenate both dataframes.\n",
-    "* Compute the average weight by sex for each year.\n",
-    "* Export your results as a CSV and make\n",
-    "  sure it reads back into python properly."
+    "\n",
+    "(3 min.)"
    ]
   },
   {
@@ -186,6 +185,16 @@
     "survey_all = ###"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "4eacaca5-19e9-48c4-bbe7-74e28323d13a",
+   "metadata": {
+    "lang": "en"
+   },
+   "source": [
+    "* Compute the average weight by sex for each year. (1 min.)"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -204,6 +213,17 @@
     "weight_year"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "96877605-e612-45fb-8203-438b2b72a9c1",
+   "metadata": {
+    "lang": "en"
+   },
+   "source": [
+    "* Export your results as a CSV and make sure\n",
+    "  it reads back into python properly. (2 min.)"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -399,7 +419,8 @@
    "source": [
     "## Exercise - Joining all data\n",
     "`1`. Create a new DataFrame by joining the contents of the\n",
-    "`surveys.csv` and `species.csv` tables. Keep all survey records."
+    "`surveys.csv` and `species.csv` tables. Keep all survey records.\n",
+    "(3 min.)"
    ]
   },
   {
@@ -429,8 +450,8 @@
     "lang": "en"
    },
    "source": [
-    "`2`. Calculate and plot the distribution of surveys\n",
-    "(i.e. the number of `record_id`) by `taxa` for each `plot_id`."
+    "`2`. Calculate and plot the distribution of surveys (i.e. the\n",
+    "number of `record_id`) by `taxa` for each `plot_id`. (3 min.)"
    ]
   },
   {
@@ -470,7 +491,7 @@
    },
    "source": [
     "`3`. Calculate and plot the distribution\n",
-    "of `taxa` by `sex` for each `plot_id`."
+    "of `taxa` by `sex` for each `plot_id`. (2 min.)"
    ]
   },
   {

diff --git a/en/05-altair.ipynb b/en/05-altair.ipynb
@@ -297,7 +297,9 @@
     "* For the X axis, specify the `'plot_id'` field and the\n",
     "  [`'ordinal'` type](https://altair-viz.github.io/user_guide/encodings/#encoding-data-types)\n",
     "* For the Y axis, specify `'count()'` as a temporary field computed\n",
-    "  automatically by Altair, which saves us from using `groupby()`"
+    "  automatically by Altair, which saves us from using `groupby()`\n",
+    "\n",
+    "(4 min.)"
    ]
   },
   {
@@ -490,7 +492,9 @@
     "  and `'M'` to colors `'orange'` and `'green'`, respectively.\n",
     "  See [an example here](https://altair-viz.github.io/user_guide/customization.html#color-domain-and-range)\n",
     "* Activate the `tooltip` channel with\n",
-    "  `'count()'` in order to get the count by sex"
+    "  `'count()'` in order to get the count by sex\n",
+    "\n",
+    "(4 min.)"
    ]
   },
   {
@@ -579,7 +583,7 @@
    "source": [
     "### Exercise - Plotting time series data\n",
     "`1`. Use the `pd.to_datetime()` function to generate a new\n",
-    "`date` column from the columns `year`, `month` and `day`."
+    "`date` column from the columns `year`, `month` and `day`. (3 min.)"
    ]
   },
   {
@@ -609,7 +613,8 @@
     "lang": "en"
    },
    "source": [
-    "`2`. Visualize the median weight of each species by the `date`."
+    "`2`. Visualize the median weight of each species by the `date`.\n",
+    "(3 min.)"
    ]
   },
   {
@@ -755,7 +760,9 @@
     "* Each facet will have:\n",
     "  * Years on the X axis\n",
     "  * The average weight on the Y axis\n",
-    "  * One colored line per species"
+    "  * One colored line per species\n",
+    "\n",
+    "(5 min.)"
    ]
   },
   {
@@ -856,7 +863,7 @@
     "full species names on the X axis of a boxplot.\n",
     "\n",
     "`1`. Compute the left-join of `surveys_complete`\n",
-    "and all the species details in `species.csv`."
+    "and all the species details in `species.csv`. (3 min.)"
    ]
   },
   {
@@ -892,7 +899,9 @@
     "* The noisy weights on the Y axis, with a logarithmic\n",
     "  scale in base 2 and with the label \"Weight (g)\"\n",
     "* One color for each species identifier\n",
-    "* A title for the graphic"
+    "* A title for the graphic\n",
+    "\n",
+    "(6 min.)"
    ]
   },
   {