Commit
Compilation of revision abf9705
plstonge committed Jan 24, 2025
1 parent abf9705 commit 2228844
Showing 20 changed files with 308 additions and 150 deletions.
34 changes: 20 additions & 14 deletions en/01-dataframe.ipynb
@@ -171,7 +171,8 @@
"source": [
"### Exercise - DataFrame and Series\n",
"\n",
"1. Load the table of species from the `species.csv` file and assign the result to `species_df`."
"1. Load the table of species from the `species.csv`\n",
" file and assign the result to `species_df`. (3 min.)"
]
},
{
@@ -194,7 +195,7 @@
},
"source": [
"2. Use the `.unique()` function to get\n",
" the list of all different `taxa`."
" the list of all different `taxa`. (2 min.)"
]
},
{
@@ -209,7 +210,7 @@
},
"outputs": [],
"source": [
"species_df"
"species_df###"
]
},
{
@@ -220,8 +221,7 @@
},
"source": [
"3. Use the `.nunique()` function to\n",
" get the number of different taxa.\n",
" Note: `nan` will not be accounted."
" get the number of different taxa. (1 min.)"
]
},
{
@@ -236,7 +236,7 @@
},
"outputs": [],
"source": [
"species_df"
"species_df###"
]
},
{
@@ -312,7 +312,9 @@
},
"source": [
"### Exercise - Grouping\n",
"`1`. How many recorded individuals are female `F`, and how many male `M`?"
"`1`. How many recorded individuals are female `F`,\n",
"and how many male `M`? HINT: it is possible to select\n",
"a column once the data has been grouped. (2 min.)"
]
},
{
@@ -327,7 +329,7 @@
},
"outputs": [],
"source": [
"by_sex"
"by_sex###"
]
},
{
@@ -337,7 +339,8 @@
"lang": "en"
},
"source": [
"`2`. What happens when you group by two columns using the following syntax and then grab mean values:"
"`2`. What happens when you group by two columns using\n",
"the following syntax and then grab mean values? (2 min.)"
]
},
{
@@ -352,7 +355,7 @@
},
"outputs": [],
"source": [
"surveys_df.groupby(['plot_id', 'sex'])"
"surveys_df.groupby(['plot_id', 'sex'])###"
]
},
{
@@ -362,7 +365,8 @@
"lang": "en"
},
"source": [
"`3`. Summarize `weight` values for each site (`plot_id`) in your data. HINT: it is possible to select a column once the data has been grouped."
"`3`. Summarize `weight` values for each\n",
"site (`plot_id`) in your data. (3 min.)"
]
},
{
@@ -377,7 +381,7 @@
},
"outputs": [],
"source": [
"surveys_df"
"surveys_df###"
]
},
{
@@ -410,7 +414,9 @@
},
"source": [
"### Exercise - Plotting Challenge\n",
"Create a `line` plot of the median `weight` per month."
"Create a `line` plot of\n",
"[the median](https://pandas.pydata.org/docs/reference/api/pandas.Series.median.html)\n",
"`weight` per month. (3 min.)"
]
},
{
@@ -425,7 +431,7 @@
},
"outputs": [],
"source": [
"surveys_df"
"surveys_df###"
]
},
{
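Note on the `en/01-dataframe.ipynb` exercises: the following is a minimal sketch of one way the `###` blanks could be completed, run on a small made-up table rather than the real `surveys.csv`. The toy `surveys_df` and the result names (`distinct_sexes`, `count_by_sex`, and so on) are assumptions for illustration and are not taken from the notebook's solutions. It also illustrates the behaviour described by the note removed in this commit: `.nunique()` skips missing values by default, while `.unique()` returns them.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df; the real data comes from surveys.csv.
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "plot_id": [1, 1, 2, 2, 3],
    "sex": ["F", "M", "F", None, "M"],
    "weight": [40.0, 48.0, 35.0, 52.0, None],
})

# .unique() keeps the missing value; .nunique() ignores it by default.
distinct_sexes = surveys_df["sex"].unique()
n_sexes = surveys_df["sex"].nunique()

# Selecting a column after grouping restricts the aggregation to that column.
count_by_sex = surveys_df.groupby("sex")["record_id"].count()

# Grouping by two columns yields a result indexed by (plot_id, sex) pairs.
mean_by_plot_sex = surveys_df.groupby(["plot_id", "sex"])["weight"].mean()

# Summary statistics of weight for each site.
weight_summary = surveys_df.groupby("plot_id")["weight"].describe()

print(count_by_sex)
print(mean_by_plot_sex)
print(weight_summary)
```

Rows whose grouping key is missing are dropped by `groupby()` by default, which is why the record without a `sex` value does not appear in the grouped results.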
18 changes: 10 additions & 8 deletions en/02-selection.ipynb
@@ -271,7 +271,7 @@
"```\n",
"Use the `isin()` method to find all different\n",
"sites (`plot_id`) that contain particular species\n",
"(`AS`, `CQ`, `OX` and `UL`) in the surveys DataFrame."
"(`AS`, `CQ`, `OX` and `UL`) in the surveys DataFrame. (4 min.)"
]
},
{
@@ -287,10 +287,10 @@
"outputs": [],
"source": [
"# Boolean mask of valid species IDs\n",
"species_mask = surveys_df['species_id']\n",
"species_mask = ###(['AS', 'CQ', 'OX', 'UL'])\n",
"\n",
"# List all different sites\n",
"surveys_df[species_mask]['plot_id']"
"surveys_df[###][###].unique()"
]
},
{
@@ -305,7 +305,9 @@
"* Create a new DataFrame that contains only observations that are\n",
" of sex female or male and where weight values are greater than 0\n",
"* For the final plot, only select the\n",
" weight, the site and the sex columns"
" weight, the site and the sex columns\n",
"\n",
"(5 min.)"
]
},
{
@@ -321,11 +323,11 @@
"outputs": [],
"source": [
"# Selection of the data with isin()\n",
"sex_mask = surveys_df['sex']\n",
"weight_mask = surveys_df['weight']\n",
"sex_mask = surveys_df['sex']###\n",
"weight_mask = surveys_df['weight'] ###\n",
"columns = ['weight', 'plot_id', 'sex']\n",
"\n",
"selection = surveys_df\n",
"selection = surveys_df###\n",
"selection.tail()"
]
},
@@ -342,7 +344,7 @@
"outputs": [],
"source": [
"# Calculate the mean weight for each plot_id and sex combination: \n",
"avg_by_site_sex = selection\n",
"avg_by_site_sex = selection###\n",
"avg_by_site_sex.head()"
]
},
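Note on the `en/02-selection.ipynb` exercises: a minimal sketch of how boolean masks built with `isin()` and a comparison can be combined, using the variable names from the diff (`species_mask`, `sex_mask`, `weight_mask`, `columns`, `selection`, `avg_by_site_sex`). The toy data is made up, and the expressions filling the blanks are one possible completion, not necessarily the authors' solution.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df; the real data comes from surveys.csv.
surveys_df = pd.DataFrame({
    "species_id": ["AS", "DM", "OX", "CQ", "DM", "UL"],
    "plot_id": [3, 3, 7, 3, 12, 7],
    "sex": ["F", "M", None, "F", "M", "M"],
    "weight": [10.0, 42.0, 21.0, 0.0, 44.0, None],
})

# Boolean mask of the species of interest, then the distinct sites.
species_mask = surveys_df["species_id"].isin(["AS", "CQ", "OX", "UL"])
sites = surveys_df[species_mask]["plot_id"].unique()

# Keep rows with a known sex and a strictly positive weight.
sex_mask = surveys_df["sex"].isin(["F", "M"])
weight_mask = surveys_df["weight"] > 0
columns = ["weight", "plot_id", "sex"]
selection = surveys_df[sex_mask & weight_mask][columns]

# Mean weight for each plot_id and sex combination.
avg_by_site_sex = selection.groupby(["plot_id", "sex"])["weight"].mean()

print(sites)
print(avg_by_site_sex)
```

Both a missing `sex` and a missing `weight` evaluate to `False` in their respective masks, so those rows are excluded without a separate `dropna()` step.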
4 changes: 3 additions & 1 deletion en/03-format.ipynb
@@ -377,7 +377,9 @@
"In the `sex` column of `copy_surveys_df`:\n",
"* Replace undefined values by `'F|M'`\n",
"* Any value not equal to `'F'`, `'M'` or `'F|M'` is\n",
" considered invalid and must be replaced by `'F|M'`"
" considered invalid and must be replaced by `'F|M'`\n",
"\n",
"(5 min.)"
]
},
{
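Note on the `en/03-format.ipynb` exercise: a minimal sketch of the two replacement steps it describes, assuming a toy `copy_surveys_df` with a handful of made-up `sex` values (including the hypothetical invalid codes `'R'` and `'Z'`). The name `invalid_mask` is introduced here for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for copy_surveys_df; values are made up.
copy_surveys_df = pd.DataFrame({"sex": ["F", "M", None, "R", "Z", "F|M"]})

# Step 1: replace undefined values by 'F|M'.
copy_surveys_df["sex"] = copy_surveys_df["sex"].fillna("F|M")

# Step 2: anything other than 'F', 'M' or 'F|M' is invalid and becomes 'F|M'.
invalid_mask = ~copy_surveys_df["sex"].isin(["F", "M", "F|M"])
copy_surveys_df.loc[invalid_mask, "sex"] = "F|M"

print(copy_surveys_df["sex"].tolist())
```

Assigning through `.loc[mask, column]` modifies the DataFrame in place and avoids chained indexing.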
35 changes: 28 additions & 7 deletions en/04-combine.ipynb
@@ -161,9 +161,8 @@
"* In `surveys_df`, select rows where the year is 2001.\n",
" Do the same for year 2002.\n",
"* Concatenate both dataframes.\n",
"* Compute the average weight by sex for each year.\n",
"* Export your results as a CSV and make\n",
" sure it reads back into python properly."
"\n",
"(3 min.)"
]
},
{
@@ -186,6 +185,16 @@
"survey_all = ###"
]
},
{
"cell_type": "markdown",
"id": "4eacaca5-19e9-48c4-bbe7-74e28323d13a",
"metadata": {
"lang": "en"
},
"source": [
"* Compute the average weight by sex for each year. (1 min.)"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -204,6 +213,17 @@
"weight_year"
]
},
{
"cell_type": "markdown",
"id": "96877605-e612-45fb-8203-438b2b72a9c1",
"metadata": {
"lang": "en"
},
"source": [
"* Export your results as a CSV and make sure\n",
" it reads back into python properly. (2 min.)"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -399,7 +419,8 @@
"source": [
"## Exercise - Joining all data\n",
"`1`. Create a new DataFrame by joining the contents of the\n",
"`surveys.csv` and `species.csv` tables. Keep all survey records."
"`surveys.csv` and `species.csv` tables. Keep all survey records.\n",
"(3 min.)"
]
},
{
@@ -429,8 +450,8 @@
"lang": "en"
},
"source": [
"`2`. Calculate and plot the distribution of surveys\n",
"(i.e. the number of `record_id`) by `taxa` for each `plot_id`."
"`2`. Calculate and plot the distribution of surveys (i.e. the\n",
"number of `record_id`) by `taxa` for each `plot_id`. (3 min.)"
]
},
{
@@ -470,7 +491,7 @@
},
"source": [
"`3`. Calculate and plot the distribution\n",
"of `taxa` by `sex` for each `plot_id`."
"of `taxa` by `sex` for each `plot_id`. (2 min.)"
]
},
{
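Note on the `en/04-combine.ipynb` exercises: a minimal sketch of the three steps that this commit splits into separate cells (concatenate, average, export and read back), followed by a left join that keeps all survey records. The data is made up, `survey_all` and `weight_year` follow the names visible in the diff, and the other names (`survey_2001`, `read_back`, `merged_left`) are assumptions. An in-memory buffer stands in for the CSV file so the sketch does not touch the disk.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the surveys and species tables.
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "year": [2001, 2001, 2002, 2002, 2000],
    "sex": ["F", "M", "F", "F", "M"],
    "species_id": ["DM", "XX", "DM", "OX", "DM"],
    "weight": [40.0, 50.0, 30.0, 20.0, 45.0],
})
species_df = pd.DataFrame({
    "species_id": ["DM", "OX"],
    "taxa": ["Rodent", "Rodent"],
})

# Select each year, then stack the two selections vertically.
survey_2001 = surveys_df[surveys_df["year"] == 2001]
survey_2002 = surveys_df[surveys_df["year"] == 2002]
survey_all = pd.concat([survey_2001, survey_2002], axis=0)

# Average weight by sex for each year: one row per year, one column per sex.
weight_year = survey_all.groupby(["year", "sex"])["weight"].mean().unstack()

# Export as CSV and read it back, using the year as the index again.
buffer = io.StringIO()
weight_year.to_csv(buffer)
buffer.seek(0)
read_back = pd.read_csv(buffer, index_col="year")

# Left join: every survey record is kept, even without a matching species.
merged_left = pd.merge(surveys_df, species_df, how="left", on="species_id")

print(read_back)
print(merged_left)
```

With `how="left"`, records whose `species_id` has no match in the species table are kept and receive missing values in the joined columns.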
23 changes: 16 additions & 7 deletions en/05-altair.ipynb
@@ -297,7 +297,9 @@
"* For the X axis, specify the `'plot_id'` field and the\n",
" [`'ordinal'` type](https://altair-viz.github.io/user_guide/encodings/#encoding-data-types)\n",
"* For the Y axis, specify `'count()'` as a temporary field computed\n",
" automatically by Altair, which saves us from using `groupby()`"
" automatically by Altair, which saves us from using `groupby()`\n",
"\n",
"(4 min.)"
]
},
{
@@ -490,7 +492,9 @@
" and `'M'` to colors `'orange'` and `'green'`, respectively.\n",
" See [an example here](https://altair-viz.github.io/user_guide/customization.html#color-domain-and-range)\n",
"* Activate the `tooltip` channel with\n",
" `'count()'` in order to get the count by sex"
" `'count()'` in order to get the count by sex\n",
"\n",
"(4 min.)"
]
},
{
@@ -579,7 +583,7 @@
"source": [
"### Exercise - Plotting time series data\n",
"`1`. Use the `pd.to_datetime()` function to generate a new\n",
"`date` column from the columns `year`, `month` and `day`."
"`date` column from the columns `year`, `month` and `day`. (3 min.)"
]
},
{
@@ -609,7 +613,8 @@
"lang": "en"
},
"source": [
"`2`. Visualize the median weight of each species by the `date`."
"`2`. Visualize the median weight of each species by the `date`.\n",
"(3 min.)"
]
},
{
@@ -755,7 +760,9 @@
"* Each facet will have:\n",
" * Years on the X axis\n",
" * The average weight on the Y axis\n",
" * One colored line per species"
" * One colored line per species\n",
"\n",
"(5 min.)"
]
},
{
@@ -856,7 +863,7 @@
"full species names on the X axis of a boxplot.\n",
"\n",
"`1`. Compute the left-join of `surveys_complete`\n",
"and all the species details in `species.csv`."
"and all the species details in `species.csv`. (3 min.)"
]
},
{
@@ -892,7 +899,9 @@
"* The noisy weights on the Y axis, with a logarithmic\n",
" scale in base 2 and with the label \"Weight (g)\"\n",
"* One color for each species identifier\n",
"* A title for the graphic"
"* A title for the graphic\n",
"\n",
"(6 min.)"
]
},
{
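Note on the `en/05-altair.ipynb` time series exercise: a minimal sketch of the data preparation only, assuming a toy `surveys_df` with made-up rows. It builds the `date` column with `pd.to_datetime()` from the `year`, `month` and `day` columns and computes the median weight per date and species. The name `median_by_date` is hypothetical, and the Altair chart itself is left out of this sketch.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df; the real data comes from surveys.csv.
surveys_df = pd.DataFrame({
    "year": [1977, 1977, 1978],
    "month": [7, 8, 1],
    "day": [16, 19, 8],
    "species_id": ["NL", "DM", "DM"],
    "weight": [None, 40.0, 48.0],
})

# pd.to_datetime() can assemble dates from columns named year, month and day.
surveys_df["date"] = pd.to_datetime(surveys_df[["year", "month", "day"]])

# Median weight of each species for each date, as a flat table for plotting.
median_by_date = (
    surveys_df.groupby(["date", "species_id"])["weight"]
    .median()
    .reset_index()
)

print(median_by_date)
```

A flat table with one row per date and species is a convenient input for a plotting library, since each column can then be mapped to one encoding channel.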
0 comments on commit 2228844