From 286748ca959113e5f56e308bde08c947e40790a4 Mon Sep 17 00:00:00 2001 From: Allen Downey Date: Wed, 13 Mar 2024 15:01:39 -0400 Subject: [PATCH] Updating notebooks --- 01_variables.ipynb | 150 ++++++++++++++---------- jupyter_intro.ipynb | 278 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 365 insertions(+), 63 deletions(-) create mode 100644 jupyter_intro.ipynb diff --git a/01_variables.ipynb b/01_variables.ipynb index 014229f..14d1fc4 100644 --- a/01_variables.ipynb +++ b/01_variables.ipynb @@ -1,5 +1,22 @@ { "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Welcome\n", + "\n", + "This is the Jupyter notebook for Chapter 1 of [*Elements of Data Science*](https://greenteapress.com/wp/elements-of-data-science), by Allen B. Downey.\n", + "\n", + "\n", + "% TODO: Update these links\n", + "\n", + "If you are not familiar with Jupyter notebooks,\n", + "[click here for a short introduction](https://colab.research.google.com/github/AllenDowney/ElementsOfDataScience/blob/master/jupyter_intro.ipynb).\n", + "\n", + "Then, if you are not already running this notebook on Colab, [click here to run this notebook on Colab](https://colab.research.google.com/github/AllenDowney/ElementsOfDataScience/blob/master/01_variables.ipynb)." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -14,7 +31,6 @@ }, "source": [ "[Run this notebook on Colab](https://colab.research.google.com/github/AllenDowney/ElementsOfDataScience/blob/master/01_variables.ipynb) or \n", - "[Run this notebook on Sagemaker Studio Lab](https://studiolab.sagemaker.aws/import/github/AllenDowney/ElementsOfDataScience/blob/master/01_variables.ipynb) or \n", "[Download this notebook](https://github.com/AllenDowney/ElementsOfDataScience/raw/master/01_variables.ipynb)." 
] }, @@ -22,7 +38,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Data science is the use of data to answers questions and guide decision making.\n", + "Data science is the use of data to answer questions and guide decision making.\n", "For example, a topic of current debate is whether we should raise the minimum wage in the United States.\n", "Some economists think that raising the minimum wage would raise families out of poverty; others think it would cause more unemployment.\n", "But economic theory can only take us so far.\n", @@ -49,13 +65,11 @@ "source": [ "The goal of this book is to give you the tools you need to execute a data science project from beginning to end, including these steps:\n", "\n", - "* Choosing questions, data, and methods that go together.\n", - "\n", - "* Finding data or collecting it yourself.\n", + "* Finding questions, data, and methods that go together.\n", "\n", "* Cleaning and validating data.\n", "\n", - "* Exploring datasets, visualizing distributions and relationships between variables.\n", + "* Exploring datasets by visualizing distributions and relationships between variables.\n", "\n", "* Modeling data and generating predictions.\n", "\n", @@ -83,13 +97,11 @@ "source": [ "The topics in this chapter are:\n", "\n", - "* Using Jupyter to write and run Python code.\n", - "\n", "* Basic programming features in Python: variables and values.\n", "\n", "* Translating formulas from math notation to Python.\n", "\n", - "Along the way, we'll review a couple of math topics I assume you have seen before, logarithms and algebra." + "You don't need a lot of math to do data science, but at the end of this chapter I'll review one topic that comes up a lot: logarithms." ] }, { @@ -111,17 +123,17 @@ "tags": [] }, "source": [ - "If you are running this notebook on Colab, you should see buttons in the top left that say \"+ Code\" and \"+ Text\". 
The first one adds code cell and the second adds a text cell.\n", + "If you are running this notebook on Colab, you should see buttons in the top left that say \"+ Code\" and \"+ Text\". The first one adds a code cell and the second adds a text cell.\n", "\n", "If you want to try them out, select this cell by clicking on it, then press the \"+ Text\" button. A new cell should appear below this one.\n", "\n", - "Add some text to the cell. You can use the buttons to format it, or you can mark up the text using [Markdown](https://www.markdownguide.org/basic-syntax/). When you are done, hold down Shift and press Enter, which will format the text you just typed and then move to the next cell.\n", + "Add some text to the cell. You can use the buttons to format it, or you can mark up the text using [Markdown](https://www.markdownguide.org/basic-syntax/). When you are done, hold down \"Shift\" and press \"Enter\", which will format the text you just typed and then move to the next cell.\n", "\n", "If you select a Code cell, you should see a button on the left with a triangle inside a circle, which is the icon for \"Play\". If you press this button, Jupyter runs the code in the cell and displays the results.\n", "\n", - "When you run code in a notebook for the first time, you might get a message warning you about the things a notebook can do. If you are running a notebook from a source you trust, which I hope includes me, you can press \"Run Anyway\".\n", + "When you run code in a notebook for the first time, you might get a message warning you about the things a notebook can do. If you are running a notebook from a source you trust -- which I hope includes me -- you can press \"Run Anyway\".\n", "\n", - "Instead of clicking the \"Play\" button, you can also run the code in a cell by holding down Shift and pressing Enter." + "Instead of clicking the \"Play\" button, you can also run the code in a cell by holding down \"Shift\" and pressing \"Enter\"." 
] }, { @@ -139,7 +151,7 @@ "* `float`, which represents numbers that have a fraction part, like `3.14159`.\n", "\n", "Most often, we use `int` to represent counts and `float` to represent measurements.\n", - "Here's an example of an `int` and a `float`:" + "Here's an example of an `int`:" ] }, { @@ -151,6 +163,13 @@ "3" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you run a cell that contains a value like this, Jupyter displays the value. Here's an example of a `float`:" + ] + }, { "cell_type": "code", "execution_count": 2, @@ -167,17 +186,36 @@ " `float` is short for \"floating-point\", which is the name for the way these numbers are stored." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Floating-point numbers can also be written in scientific notation, like this:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "1.2345e3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `e` in `1.2345e3` stands for \"exponent\". This way of writing a number is equivalent to $1.2345 \\times 10^{3}$." + ] + }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ - "**Exercise:** Create a code cell below this one and type in the following number: `1.2345e3`\n", - "\n", - "Then run the cell. The output should be `1234.5`\n", - "\n", - "The `e` in `1.2345e3` stands for \"exponent\". This way of writing numbers is a version of scientific notation that means $1.2345 \\times 10^{3}$. If you are not familiar with scientific notation, you can read about it at . " + "If you are not familiar with scientific notation, you can read about it at . 
" ] }, { @@ -232,11 +270,11 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 3, "metadata": {}, "outputs": [], "source": [ - "2**3" + "2 ** 3" ] }, { @@ -244,8 +282,7 @@ "metadata": {}, "source": [ "Unlike math notation, Python does not allow \"implicit multiplication\". For example, in math notation, if you write $3 (2 + 1)$, that's understood to be the same as $3 \\times (2+ 1)$.\n", - "Python does not allow that notation.\n", - "If you want to multiply, you have to use the `*` operator." + "Python does not allow that notation." ] }, { @@ -254,13 +291,28 @@ "tags": [] }, "source": [ - "Try running this code to see what error you get.\n", + "NOTE: The following cell uses `%%expect`, which is a Jupyter \"magic command\" that means we expect the code in this cell to produce an error. For more on this topic, see the\n", + "[Jupyter notebook introduction](https://colab.research.google.com/github/AllenDowney/ThinkPython/blob/v3/chapters/jupyter_intro.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "%%expect TypeError\n", "\n", - "```\n", - "3 (2 + 1)\n", - "```\n", + "3 (2 + 1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, the error message is not very helpful, which is why I am warning you now. \n", "\n", - "In this example, the error message is not very helpful, which is why I am warning you now. " + "If you want to multiply, you have to use the `*` operator." 
] }, { @@ -375,16 +427,14 @@ ] }, { - "cell_type": "markdown", - "metadata": { - "tags": [] - }, + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], "source": [ - "If you run this code in the following cell, you should get an error:\n", + "%%expect ModuleNotFoundError\n", "\n", - "```\n", - "import NumPy as np\n", - "```" + "import NumPy as np" ] }, { @@ -947,32 +997,6 @@ "# Solution goes here" ] }, - { - "cell_type": "markdown", - "metadata": { - "tags": [] - }, - "source": [ - "## The Colab mental model\n", - "\n", - "Congratulations on completing the first notebook!\n", - "\n", - "Now that you have worked with Colab, you might find it helpful to watch this video, where I explain a little more about how it works:" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "from IPython.display import YouTubeVideo \n", - "\n", - "YouTubeVideo(\"eIY-PsYBrPs\")" - ] - }, { "cell_type": "markdown", "metadata": { @@ -1011,7 +1035,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.16" + "version": "3.10.13" } }, "nbformat": 4, diff --git a/jupyter_intro.ipynb b/jupyter_intro.ipynb new file mode 100644 index 0000000..60f215d --- /dev/null +++ b/jupyter_intro.ipynb @@ -0,0 +1,278 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "WzCwnbY17x0O", + "tags": [] + }, + "source": [ + "# *Elements of Data Science* on Jupyter\n", + "\n", + "This is an introduction to Jupyter notebooks for people reading [*Elements of Data Science*](https://greenteapress.com/wp/elements-of-data-science) by Allen B. 
Downey.\n", + "\n", + "A Jupyter notebook is a document that contains text, code, and results from running the code.\n", + "You can read a notebook like a book, but you can also run the code, modify it, and develop new programs.\n", + "\n", + "Jupyter notebooks run in a web browser, so you can run them without installing any new software.\n", + "But they have to connect to a Jupyter server.\n", + "\n", + "You can install and run a server yourself, but to get started it is easier to use a service like [Colab](https://colab.research.google.com/), which is operated by Google.\n", + "\n", + "[On the starting page for the book](https://allendowney.github.io/ElementsOfDataScience) you will find a link for each chapter.\n", + "If you click on one of these links, it opens a notebook on Colab." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WzCwnbY17x0O", + "tags": [] + }, + "source": [ + "If you are reading this notebook on Colab, you should see an orange logo in the upper left that looks like the letters `CO`.\n", + "\n", + "If you are not running this notebook on Colab, [you can click here to open it on Colab](https://colab.research.google.com/github/AllenDowney/ElementsOfDataScience/blob/master/jupyter_intro.ipynb)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WzCwnbY17x0O", + "tags": [] + }, + "source": [ + "## What is a notebook?\n", + "\n", + "A Jupyter notebook is made up of cells, where each cell contains either text or code.\n", + "This cell contains text. \n", + "\n", + "The following cell contains code." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Hello\n" + ] + } + ], + "source": [ + "print('Hello')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WzCwnbY17x0O", + "tags": [] + }, + "source": [ + "Click on the previous cell to select it.\n", + "You should see a button on the left with a triangle inside a circle, which is the icon for \"Play\".\n", + "If you press this button, Jupyter runs the code in the cell and displays the result.\n", + "\n", + "When you run code in a notebook for the first time, it might take a few seconds to start.\n", + "And if it's a notebook you didn't write, you might get a warning message.\n", + "If you are running a notebook from a source you trust, which I hope includes me, you can press \"Run Anyway\"." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WzCwnbY17x0O", + "tags": [] + }, + "source": [ + "Instead of clicking the \"Play\" button, you can also run the code in a cell by holding down `Shift` and pressing `Enter`.\n", + "\n", + "If you are running this notebook on Colab, you should see buttons in the top left that say \"+ Code\" and \"+ Text\". The first one adds a code cell and the second adds a text cell.\n", + "If you want to try them out, select this cell by clicking on it, then press the \"+ Text\" button.\n", + "A new cell should appear below this one." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WzCwnbY17x0O", + "tags": [] + }, + "source": [ + "Add some text to the cell.\n", + "You can use the buttons to format it, or you can mark up the text using [Markdown](https://www.markdownguide.org/basic-syntax/).\n", + "When you are done, hold down `Shift` and press `Enter`, which will format the text you just typed and then move to the next cell." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "At any time Jupyter is in one of two modes:\n", + "\n", + "* In **command mode**, you can perform operations that affect cells, like adding and removing entire cells.\n", + "\n", + "* In **edit mode**, you can edit the contents of a cell.\n", + "\n", + "With text cells, it is obvious which mode you are in.\n", + "In edit mode, the cell is split vertically, with the text you are editing on the left and the formatted text on the right.\n", + "And you'll see text editing tools across the top.\n", + "In command mode, you see only the formatted text.\n", + "\n", + "With code cells, the difference is more subtle, but if there's a cursor in the cell, you are in edit mode.\n", + "\n", + "To go from edit mode to command mode, press `ESC`.\n", + "To go from command mode to edit mode, press `Enter`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you are done working on a notebook, you can close the window, but any changes you have made will disappear.\n", + "If you make any changes you want to keep, open the File menu in the upper left.\n", + "You'll see several ways you can save the notebook.\n", + "\n", + "* If you have a Google account, you can save the notebook in your Drive.\n", + "\n", + "* If you have a GitHub account, you can save it on GitHub.\n", + "\n", + "* Or if you want to save the notebook on your computer, select \"Download\" and then \"Download .ipynb\" The suffix \".ipynb\" indicates that it is a notebook file, as opposed to \".py\", which indicates a file that contains Python code only." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Code for *Elements of Data Science*\n", + "\n", + "At the beginning of each notebook, you'll see a cell with code like this:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "from os.path import basename, exists\n", + "\n", + "def download(url):\n", + " filename = basename(url)\n", + " if not exists(filename):\n", + " from urllib.request import urlretrieve\n", + "\n", + " local, _ = urlretrieve(url, filename)\n", + " print(\"Downloaded \" + str(local))\n", + " return filename\n", + "\n", + "# TODO: Update this link\n", + "\n", + "download('https://raw.githubusercontent.com/AllenDowney/ThinkPython/v3/utils.py')\n", + "\n", + "import utils" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You don't need to know how this code works, but when you get to the end of the book, most of it will make sense.\n", + "As you might guess, it downloads a file -- specifically, it downloads `utils.py`, which contains Python code provided specifically for this book.\n", + "The last line \"imports\" this code, which means we can use the code in the notebook.\n", + "\n", + "In some places you will see a cell like this that begins with `%%expect`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "ename": "SyntaxError", + "evalue": "invalid syntax (3827346253.py, line 1)", + "output_type": "error", + "traceback": [ + "\u001b[0;36m Cell \u001b[0;32mIn[3], line 1\u001b[0;36m\u001b[0m\n\u001b[0;31m abs 42\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" + ] + } + ], + "source": [ + "%%expect SyntaxError\n", + "\n", + "abs 42" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`%%expect` is not part of Python -- it is a Jupyter \"magic command\" that indicates that we expect the cell to product an error.\n", + "When you see this command, it means that the error is deliberate, usually intended to warn you about a common pitfall." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For more about running Jupyter notebooks on Colab, [click here](https://colab.research.google.com/notebooks/basic_features_overview.ipynb).\n", + "\n", + "Or, if you are ready to get started, [click here to read Chapter 1](https://colab.research.google.com/github/AllenDowney/ElementsOfDataScience/blob/master/01_variables.ipynb)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "M9yF11G47x0l", + "tags": [] + }, + "source": [ + "*Elements of Data Science*.\n", + "\n", + "Copyright 2024 [Allen B. 
Downey](https://allendowney.com)\n", + "\n", + "License: [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-nc-sa/4.0/)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Cq6EYo057x0l" + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "celltoolbar": "Tags", + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.13" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +}