diff --git a/clean-modular-code/activity-3/clean-code-activity-3.md b/clean-modular-code/activity-3/clean-code-activity-3.md index ff3f213..386c23d 100644 --- a/clean-modular-code/activity-3/clean-code-activity-3.md +++ b/clean-modular-code/activity-3/clean-code-activity-3.md @@ -13,42 +13,45 @@ kernelspec: +++ {"editable": true, "slideshow": {"slide_type": ""}} +(clean-code-activity-3)= # Activity 3: Tests & Checks for your code -* In [activity 1](../activity-1/clean-code-activity-1), you took some code and made it cleaner using expressive variable names and docstrings to document the module. -* In [activity 2](../activity-2/clean-code-activity-2), you made your code more DRY ("Don't Repeat Yourself") using documented functions and conditionals. +* In [activity 1](../activity-1/clean-code-activity-1), you made your code cleaner and more usable using [expressive variable names](python-expressive) and docstrings to document the module. +* In [activity 2](../activity-2/clean-code-activity-2), you made your code more DRY ("Don't Repeat Yourself") using documented [functions](write-functions) and [conditionals](python-conditionals). -In this activity, you will build checks into your workflow to handle data processing "features". +In this activity, you will build checks into your workflow using [try/except](try-except) blocks added to functions to handle data processing "features". ++++ {"editable": true, "slideshow": {"slide_type": ""}} ### Real world data processing & workflows and edge cases -Real-world data rarely can be imported without "work arounds". You will often find unusual data entries and values you don't expect. Sometimes, these values are documented - for example, a 9999 may represent a missing value in a dataset. Other times, there are typos and other errors in the data that you need to handle. These unusual values or instances in a dataset or workflow are sometimes called "edge cases". 
-Writing robust code that handles unexpected values will make your code run smoothly and fail gracefully. This type of code, which combines functions (or classes) and checks within the functions that handle messy data, will make your code easier to maintain over time. +Real-world data rarely can be imported without "work-arounds". You will often find unusual data entries and values you don't expect. Sometimes, these values are documented - for example, a `9999` may represent a missing value in a dataset. Other times, there are typos and other errors in the data that you need to handle. These unusual values or instances in a dataset or workflow are sometimes called "edge cases". + +Writing robust code that handles unexpected values will make your code run smoothly and fail gracefully. This type of code, which combines functions (or classes) and checks within the functions that handle messy data, will make your code easier to maintain. +Things like helpful error messages and fast failing will also improve the experience for someone else using your code--OR your future self. + :::{tip} Using functions, classes, and methods (functions within a class) is a great first step in handling messy data. A function or method provides a modular unit you can test outside of the workflow for the edge cases you may encounter. Also, because a function is a modular unit, you can add elements to handle unexpected processing features as you build your workflow. -::: -something about debuggers? -* https://jupyterlab.readthedocs.io/en/stable/user/debugger.html +Once you have these functions and methods, you can add checks using conditional statements and [try/except](try-except) blocks that anticipate edge cases and errors that you may encounter when processing your data. 
+::: ## Manage the unexpected -In this activity, you will apply the following strategies: - -* [conditional statements](../checks-conditionals/python-conditionals) -* try/except blocks - -to process the JOSS citation data. +In this activity, you will apply the following strategies to make your code more robust, maintainable & usable: -:::{todo} -What branch is the lesson with try/except // ask for forgiveness, checks elements in?? -IN THIS PR: -https://github.com/pyOpenSci/lessons/pull/14/files#diff-7f4ff1b75e85d38f3955cca051e68e8746773c279b34c9a0a400b9c2dc1240ff -::: +* **[Fail fast with useful error messages](fail-fast)**: Failing fast is a software engineering term that means allowing your + code to stop when something goes wrong, ensuring that errors are caught + and communicated promptly. This helps the user quickly understand the error, what went + wrong, and where. +* Use **[conditional statements](../checks-conditionals/python-conditionals)** + to logically check for specific conditions before executing code. This allows you to create different pathways for code to execute based on specific conditions. +* **[Try/except blocks](../checks-conditionals/python-function-checks)** allow + you to handle potential errors by attempting an operation and catching any + exceptions if they occur, providing useful feedback. In some cases you may want the program to end on an error. In other cases, you may want to handle it in a specific way. -When you can, try to use the Pythonic approach of asking for forgiveness later (ie use try/except blocks) rather than conditional statements. +Try to use a [Pythonic approach](pythonic-checks) to catch errors early when completing this activity. This means asking for forgiveness later vs. using conditional statements to check an object's state or type. 
```{code-cell} ipython3 --- @@ -56,7 +59,7 @@ editable: true slideshow: slide_type: '' --- -# This works but is less pythonic +# This works but is less Pythonic as it's a "look before you leap" approach since it does an extra check before returning the title def clean_title(title): """Notice that this function checks explicitly to see if it's a list and then processes the data. """ @@ -65,34 +68,84 @@ def clean_title(title): return title ``` -## More "pythonic" - ask for forgiveness - -easier to ask for forgiveness +The function below raises an error with a custom error message. ```{code-cell} ipython3 --- editable: true slideshow: slide_type: '' +tags: [raises-exception] --- # This is the preferred way to catch an error def clean_title(title): """ - It's more Pythonic to try first and then ask for forgiveness later. - If you are writing tests this also makes your code easier to test. + Attempts to return the first character of the title. + Raises the same error with a friendly, custom error message if the input is invalid. + """ + try: + return title[0] + except IndexError as e: + raise IndexError(f"Oops! You provided a title in an unexpected format. " + f"I expected the title to be provided in a list and you provided " + f"a {type(title)}.") from e + +# Example usage: +title = "" +print(clean_title(title)) # This will raise an IndexError with the friendly message +``` + +```{code-cell} ipython3 +# This is the preferred way to catch an error +def clean_title(title): + """ + Attempts to return the first character of the title. + + Raises the same error with a friendly message if the input is invalid. + """ + try: + return title[0] + except IndexError as e: + raise IndexError(f"Oops! You provided a title in an unexpected format. 
I expected the title to be provided in a list and you provided a {type(title)}.") + +# Example usage: +title = "" +print(clean_title(title)) # This will raise an IndexError with the friendly message +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +If you wish, you can shorten the amount of information returned in the error by adding `from None` when you raise the error. This will look nicer to a user, but you lose some detail in the error traceback. + +```{code-cell} ipython3 +# This is the preferred way to catch an error +def clean_title(title): + """ + Attempts to return the first character of the title. + Raises the same error with a friendly message if the input is invalid. """ try: return title[0] - except (TypeError, IndexError): - return title + except IndexError as e: + raise IndexError(f"Oops! You provided a title in an unexpected format. " + f"I expected the title to be provided in a list and you provided " + f"a {type(title)}.") from None + +# Example usage: +title = "" +print(clean_title(title)) # This will raise an IndexError with the friendly message ``` +++ {"editable": true, "slideshow": {"slide_type": ""}, "tags": ["hide-output", "hide-cell"]} +In this activity, you will be working with Pandas dataframes. You may find the `.apply` function to be particularly useful. + :::{tip} -### Applying functions to DataFrame values--`.apply` +### Applying functions to DataFrame values: `.apply` + +The `.apply()` function in pandas allows you to apply any function to rows or columns in a `pandas.DataFrame`. For example, you can use it to perform operations on specific column or row values. When you use `.apply()`, you can specify whether you want to apply the function across columns `(axis=0)` (the default) or across rows `(axis=1)`. -The `.apply()` function in pandas allows you to apply any function to rows or columns in a `pandas.DataFrame`. For example, You can use it to perform operations on specific column or row values. 
When you use `.apply()`, you can specify whether you want to apply the function across columns `(axis=0)` (the default) or across rows `(axis=1)`. For example, if you want to apply a function to each row of a DataFrame, you would use `df.apply(your_function, axis=1)`. This function is especially useful for applying logic that can’t be easily achieved with built-in pandas functions, allowing for more flexibility in data processing. +For example, if you want to apply a function to each row of a DataFrame, you would use `df.apply(your_function, axis=1)`. This function is especially useful for applying logic that can’t be easily achieved with built-in pandas functions, allowing for more flexibility in data processing. You can use `.apply` in pandas to efficiently replace `for loops` to process row and column values in a `pandas.DataFrame`. @@ -103,20 +156,34 @@ You can use `.apply` in pandas to efficiently replace `for loops` to process row ### What's changed in your workflow? :::{warning} -You have a new data file to open in your list of `.json` files in this activity. This file has some unexpected "features" that your code needs to handle gracefully so it can process all of the data. +You have a new data file to open in your list of `.json` files in this activity. This file has some unexpected "features" that your code needs to handle gracefully to process all of the data. ::: -Your goal is to make the code below run on the data provided in the activity-3 `data/` directory. +The code below is an example of what your code might look like after completing [activity 2](../activity-2/clean-code-activity-2). You can choose to work with this code, or you can use the code that you completed in activity 2. + +Your goal is to make the code below run on the data provided in the activity-3 `data/` directory. + +The code below will fail. You will need to do the following: + +1. Use a debugger to determine why it's failing. +2. 
Add try/except blocks and/or conditional statements to your functions that handle various exceptions. + +Your end goal is to make the code below run. :::{tip} -The code below will fail. You will likely want to use a debugger to determine why it's failing and get the code running. +* If you are using Jupyter, then you might [find this page helpful when setting up debugging.](https://jupyterlab.readthedocs.io/en/stable/user/debugger.html) +* VS Code has a nice visual debugger that you can use. + ::: -The code below is an example of what your code might look like after completing activity 2. You can choose to work with this code, or you can use the code that you completed in activity 2. +Important: It is ok if you can't get the code to run fully by the end of this workshop. If you can: -+++ {"editable": true, "slideshow": {"slide_type": ""}, "tags": ["raises-exception"]} +1. identify at least one of the data processing "bugs" (even if you can't fix it) and/or +2. fix at least one bug -```python +You can consider your effort today as a success! We will work on the first element together as a group. + +```{code-cell} ipython3 import json from pathlib import Path @@ -159,37 +226,110 @@ def format_date(date_parts: list) -> str: Returns ------- - str - Formatted date string. + pd.datetime + A date formatted as a pd.datetime object. """ - return f"{date_parts[0]}-{date_parts[1]:02d}-{date_parts[2]:02d}" + date_str = ( + f"{date_parts[0][0]}-{date_parts[0][1]:02d}-{date_parts[0][2]:02d}" + ) + return pd.to_datetime(date_str, format="%Y-%m-%d") def clean_title(value): """A function that removes a value contained in a list.""" + print("hi", value) return value[0] -def process_published_date(date_parts): - """Parse a date provided as a list of values into a proper date format. 
+columns_to_keep = [ + "publisher", + "DOI", + "type", + "author", + "is-referenced-by-count", + "title", + "published.date-parts", +] + +data_dir = Path("data") + +all_papers_list = [] +for json_file in data_dir.glob("*.json"): + papers_df = load_clean_json(json_file, columns_to_keep) + + papers_df["title"] = papers_df["title"].apply(clean_title) + papers_df["published_date"] = papers_df["published.date-parts"].apply( + format_date + ) + + all_papers_list.append(papers_df) + +all_papers_df = pd.concat(all_papers_list, axis=0, ignore_index=True) + +print("Final shape of combined DataFrame:", all_papers_df.shape) +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}, "tags": ["raises-exception"]} + +```python +# Note that this code has no checks or tests in the functions provided. You will need to add them to make the code run. + +import json +from pathlib import Path + +import pandas as pd + + +def load_clean_json(file_path, columns_to_keep): + """ + Load JSON data from a file. Drop unnecessary columns and normalize + to DataFrame. + + Parameters + ---------- + file_path : Path + Path to the JSON file. + columns_to_keep : list + List of columns to keep in the DataFrame. + + Returns + ------- + dict + Loaded JSON data. + """ + + with file_path.open("r") as json_file: + json_data = json.load(json_file) + normalized_data = pd.json_normalize(json_data) + + return normalized_data.filter(items=columns_to_keep) + + +def format_date(date_parts: list) -> str: + """ + Format date parts into a string. Parameters ---------- - date_parts : str or int - The elements of a date provided as a list from CrossRef + date_parts : list + List containing year, month, and day. Returns ------- pd.datetime A date formatted as a pd.datetime object. 
""" - date_str = ( f"{date_parts[0][0]}-{date_parts[0][1]:02d}-{date_parts[0][2]:02d}" ) return pd.to_datetime(date_str, format="%Y-%m-%d") +def clean_title(value): + """A function that removes a value contained in a list.""" + return value[0] + + columns_to_keep = [ "publisher", "DOI", @@ -208,7 +348,7 @@ for json_file in data_dir.glob("*.json"): papers_df["title"] = papers_df["title"].apply(clean_title) papers_df["published_date"] = papers_df["published.date-parts"].apply( - process_published_date + format_date ) all_papers_list.append(papers_df) @@ -220,20 +360,111 @@ print("Final shape of combined DataFrame:", all_papers_df.shape) +++ {"editable": true, "slideshow": {"slide_type": ""}} -:::{admonition} On your own 1 +:::{admonition} Part 1 - What happens when your code can't find the data? :class: attention -Ideas for on your own welcome! +Let's break the code below and see how our code performs. + +* Note that the code below has a modified `/data` directory path (that doesn't exist!). + +Questions: +* What type of error do you expect Python to throw? Use Google, LLMs or our [tests and checks](common-exceptions) lesson to help figure this out. +* Does your code handle this error gracefully? +* How can we make the code handle it better? + +```python +import json +from pathlib import Path + +import pandas as pd + +def load_clean_json(file_path, columns_to_keep): + """ + Load JSON data from a file. Drop unnecessary columns and normalize + to DataFrame. + + Parameters + ---------- + file_path : Path + Path to the JSON file. + columns_to_keep : list + List of columns to keep in the DataFrame. + + Returns + ------- + dict + Loaded JSON data. 
+ """ + + with file_path.open("r") as json_file: + json_data = json.load(json_file) + normalized_data = pd.json_normalize(json_data) + + # Return the pandas DataFrame, filtering out some of the columns that we don't need + return normalized_data.filter(items=columns_to_keep) + + +columns_to_keep = [ + "publisher", + "DOI", + "type", + "author", + "is-referenced-by-count", + "title", + "published.date-parts", +] + +# Break this data path by giving it a dir name that doesn't exist - what happens when your code runs? +data_dir = Path("bad-bad-data") + +all_papers_list = [] +for json_file in data_dir.glob("*.json"): + papers_df = load_clean_json(json_file, columns_to_keep) + papers_df["title"] = papers_df["title"].apply(clean_title) + + all_papers_list.append(papers_df) + +all_papers_df = pd.concat(all_papers_list, axis=0, ignore_index=True) + +print("Final shape of combined DataFrame:", all_papers_df.shape) +``` + +Your goal is to troubleshoot any issues associated with cleaning up the title so you can work with it later in a `pandas.DataFrame`. + ::: ++++ {"editable": true, "slideshow": {"slide_type": ""}, "tags": ["hide-cell"]} + +Note: we can have two groups - one that wants to work on their own and another that wants to work with the instructor together. + +++ {"editable": true, "slideshow": {"slide_type": ""}} -:::{admonition} On your own 2 +:::{admonition} Activity 3: part 1 :class: attention -Ideas welcome? + +In this activity, we will work together in groups or as a whole class to add a try/except block that handles messiness when parsing titles in the cross-ref data. + ::: -I want to have them move their code into a module if possible during this workshop but we could also kick that off in the day 2 workshop. ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +:::{admonition} On your own 1 +:class: attention + +If you get through the activity above and want a second challenge, try to parse the date values for each JOSS publication. 
Use the published key to extract the date for each publication. You may run into some data "features" when completing this activity. + +```json +"published": { + "date-parts": [ + [ + 2020, + 7, + 4 + ] + ] +``` + +::: ```{code-cell} ipython3 --- diff --git a/clean-modular-code/checks-conditionals/python-functions.md b/clean-modular-code/checks-conditionals/about-python-functions.md similarity index 94% rename from clean-modular-code/checks-conditionals/python-functions.md rename to clean-modular-code/checks-conditionals/about-python-functions.md index 1b5d646..d800c21 100644 --- a/clean-modular-code/checks-conditionals/python-functions.md +++ b/clean-modular-code/checks-conditionals/about-python-functions.md @@ -45,7 +45,7 @@ print(x) Functions can help you to both eliminate repetition and improve efficiency by making it more modular. -Modulare code is code that is separated into independent units that can be reused and even combined to complete a longer chain of tasks. +Modular code is separated into independent units that can be reused and even combined to complete a longer chain of tasks. :::{figure} ../images/clean-code/functions-for-all-things.png :alt: You can implement strategies such as loops and functions in your code to replace tasks that you are performing over and over. Source: Francois Michonneau. @@ -93,7 +93,7 @@ When you write modular functions, you can reuse them for other workflows and pro When coding tasks step by step, you are likely creating many intermediate variables that are not needed again but are stored in your computer's memory. -By using functions, these intermediate variables are confined to the function’s local scope. Once the function finishes executing, the variables created within the function are discarded making your code cleaner and more efficient +These intermediate variables are confined to the function’s local scope by using functions. 
Once the function finishes executing, the variables created within the function are discarded, making your code cleaner and more efficient ## Reasons why functions improve code readability @@ -122,7 +122,7 @@ Organizing your code using functions from the beginning allows you to explicitly ### Functions and tests -While you will not learn about testing in this lesson, functions are also useful for testing. +Functions are also useful for testing. As your code gets longer and more complex, it is more prone to mistakes. For example, if your analysis relies on data that gets updated often, you may want to make sure that all data are up-to-date before performing an analysis. Or that the new data are not formatted in a different way. diff --git a/clean-modular-code/checks-conditionals/python-common-exceptions.md b/clean-modular-code/checks-conditionals/python-common-exceptions.md new file mode 100644 index 0000000..dde1bf4 --- /dev/null +++ b/clean-modular-code/checks-conditionals/python-common-exceptions.md @@ -0,0 +1,169 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.4 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +(common-exceptions)= +### Common Python exceptions + +Python has dozens of specific errors that can be raised when code fails to run. Below are a few common ones that you may encounter in [Activity 3](clean-code-activity-3). + +### TypeError + +Occurs when an operation is applied to an object of an inappropriate type. 
+ +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +# Example: Trying to add a number and a string +1 + 'string' # This will raise a TypeError +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +#### ValueError + +- **Raised when** a function receives an argument of the right type but an invalid value. +- **Example:** `int('abc')` (trying to convert an invalid string to an integer). + +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +int("abc") +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +#### KeyError + +- **Raised when** a dictionary key is not found. +- **Example:** `my_dict['nonexistent_key']` (trying to access a key that doesn’t exist in the dictionary). + +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +# Example: Accessing a nonexistent key in a dictionary +my_dict = {"a": 1, "b": 2} +my_dict['nonexistent_key'] +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +#### IndexError: + +- **Raised when** an invalid index is used to access a list or tuple. +- **Example:** `my_list[10]` (trying to access the 11th element of a list with fewer elements). + +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +my_list = [1, 2, 3] +my_list[10] +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +#### AttributeError: + +Raised when an object does not have a specific attribute or method. + +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +my_string = "Hello" +my_string.nonexistent_method() +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +#### FileNotFoundError + +A `FileNotFoundError` occurs in Python when the code attempts to open or access a file that does not exist at the specified path. 
+ +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +with open("data/nonexistent_file.json", "r") as file: + data = file.read() +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +By catching this exception, you can: + +1. Raise a kinder and more informative message. +2. Direct the user toward the next steps. +3. FUTURE: write tests for this step of the workflow (if you are creating a package!) that make sure that it handles a bad file path properly. + +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +from pathlib import Path + +file_path = Path("data") / "nonexistent_file.json" +try: + with open(file_path, "r") as file: + data = file.read() +except FileNotFoundError as fe: + raise FileNotFoundError(f"Oops! It looks like you provided a path to a file that doesn't exist. You provided: {file_path}. Make sure the file path exists. ") +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +If you don't raise the error but instead provide a print statement, you can provide a simple, clean output without the full "stack" or set of Python messages that provides the full "tracking" or traceback of where the error originated. + +The challenge with not raising a `FileNotFoundError` is that it will be a bit trickier to test the output. + +* You could use `sys.exit` too, but I've run into issues with that in the past (I wish I could remember what they were). + +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +--- +file_path = Path("data") / "nonexistent_file.json" +try: + with open(file_path, "r") as file: + data = file.read() +except FileNotFoundError as fe: + print(f"Oops! It looks like you provided a path to a file that doesn't exist. You provided: {file_path}. Make sure the file path exists. 
") +``` diff --git a/clean-modular-code/checks-conditionals/python-conditionals.md b/clean-modular-code/checks-conditionals/python-conditionals.md index f4f6cbe..b5f001a 100644 --- a/clean-modular-code/checks-conditionals/python-conditionals.md +++ b/clean-modular-code/checks-conditionals/python-conditionals.md @@ -18,6 +18,7 @@ jupyter: name: python3 --- +(python-conditionals)= # Conditional statements in Python While there are many strategies for improving efficiency and removing repetition in code, three commonly used DRY strategies are conditional statements, loops, and functions. diff --git a/clean-modular-code/checks-conditionals/python-function-checks.md b/clean-modular-code/checks-conditionals/python-function-checks.md index a653c72..99bfac8 100644 --- a/clean-modular-code/checks-conditionals/python-function-checks.md +++ b/clean-modular-code/checks-conditionals/python-function-checks.md @@ -11,32 +11,38 @@ kernelspec: name: python3 --- -+++ {"editable": true, "slideshow": {"slide_type": ""}} +# Write Flexible Functions to Handle Messy Data -# Write Flexible Functions for Messy Data +When dealing with messy or unpredictable data, +it is important to handle errors early and gracefully. [Using functions](python-functions) +is a great first step in creating a robust +and maintainable data processing workflow. Functions provide modular units that +can be tested independently, allowing you to handle various edge cases and +unexpected scenarios effectively. -When dealing with messy or unpredictable data, using functions is an excellent first step in creating a robust and maintainable data processing workflow. Functions provide modular units that can be tested independently, allowing you to handle various edge cases and unexpected scenarios effectively. +Adding checks to your functions +is the next step towards making your code more robust and maintainable over time. 
-## Function benefits +This lesson will cover several strategies for making this happen: -Using functions in your data processing pipeline offers several advantages: -1. **Modularity**: Functions encapsulate specific tasks, making your code more organized and easier to understand. -2. **Testability**: You can test functions individually, outside of the main workflow, to ensure they handle different scenarios correctly. -3. **Flexibility**: As you build out your workflow, you can easily add elements to functions to handle new processing requirements or edge cases. -4. **Reusability**: Well-designed functions can be reused across different parts of your project or even in other projects. +1. Use [`try/except blocks`](#try-except) rather than simply allowing errors to occur. +1. [Make checks Pythonic](#pythonic-checks) +1. [Fail fast](fail-fast) -## Handling edge cases ++++ {"editable": true, "slideshow": {"slide_type": ""}} -When working with messy data, you'll often encounter edge cases - unusual or unexpected data that can break your processing pipeline. Functions allow you to implement robust error handling and data validation. Here are some techniques you can use: +(try-except)= +## Use Try/Except blocks -+++ {"editable": true, "slideshow": {"slide_type": ""}} +`try/except` blocks in Python help you handle errors gracefully instead of letting your program crash. They are used when you think a part of your code might fail, like when working with missing data, or when converting data types. -## Try/Except blocks +Here’s how try/except blocks work: -Try/except blocks help you catch and handle errors that might happen while your code is running. This is useful for tasks that might fail, like converting data types or working with data that’s missing or incorrect. +* **try block:** You write the code that might cause an error here. Python will attempt to run this code. 
+* **except block:** If Python encounters an error in the try block, it jumps to the except block to handle it. You can specify what to do when an error occurs, such as printing a friendly message or providing a fallback option. -A try/except block looks like this: +A `try/except` block looks like this: ```python try: @@ -45,6 +51,10 @@ except SomeError: # what to do if there's an error ``` +:::{tip} +We pulled together some of the more [common exceptions that Python can throw here](common-exceptions). +::: + ```{code-cell} ipython3 --- editable: true @@ -55,7 +65,7 @@ def convert_to_int(value): try: return int(value) except ValueError: - print("Oops i can't process this so I will fail gracefully.") + print("Oops i can't process this so I will fail quietly with a print statement.") return None # or some default value ``` @@ -79,77 +89,69 @@ convert_to_int("abc") +++ {"editable": true, "slideshow": {"slide_type": ""}} -This function attempts to convert a value to an integer, returning None and a message if the conversion fails. - -## Make checks Pythonic - -Python has a unique philosophy regarding handling potential errors or exceptional cases. This philosophy is often summarized by the acronym EAFP: "Easier to Ask for Forgiveness than Permission." +This function attempts to convert a value to an integer, returning `None` and a message if the conversion fails. However, is that message helpful to a person using your code? -### EAFP vs. LBYL - -There are two main approaches to handling potential errors: ++++ {"editable": true, "slideshow": {"slide_type": ""}} -**LBYL (Look Before You Leap)**: Check for conditions before making calls or accessing data. -**EAFP (Easier to Ask for Forgiveness than Permission)**: Assume the operation will succeed and handle any exceptions if they occur. +(fail-fast)= +## Fail fast strategy -Pythonic code generally favors the EAFP approach. 
+Identify data processing or workflow problems as soon as they occur and raise an error right away rather than allowing +them to propagate through your code. This approach saves time and simplifies debugging, providing clearer, more useful error outputs (stack traces). Below, you can see that the code tries to open a file, but Python can't find the file. In response, Python throws a `FileNotFoundError`. ```{code-cell} ipython3 --- editable: true slideshow: slide_type: '' +tags: [raises-exception] --- -# LBYL approach - manually check that the user provides a int -def convert_to_int(value): - if isinstance(value, int): - return int(value) - else: - print("Oops i can't process this so I will fail gracefully.") - return None +# Try to open a file that doesn't exist +def read_file(file_path): + with open(file_path, 'r') as file: + data = file.read() + return data -convert_to_int(1) -convert_to_int("a") +file_data = read_file("nonexistent_file.txt") ``` ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +You could anticipate a user providing a bad file path. This might be especially likely if you plan to share your code with others and run it on different computers and different operating systems. + +In the example below, you use a [conditional statement](python-conditionals) to check if the file exists; if it doesn't, the function returns `None`. In this case, the code will fail quietly, and the user will not understand that there is an error. + +This is also dangerous territory for a user who may not understand why the code runs but doesn't work. + ```{code-cell} ipython3 --- editable: true slideshow: slide_type: '' --- -# EAFP approach - Consider what the user might provide and catch the error. 
-def convert_to_int(value): - try: - return int(value) - except ValueError: - print("Oops i can't process this so I will fail gracefully.") - return None # or some default value - -convert_to_int(1) -convert_to_int("a") -``` - -+++ {"editable": true, "slideshow": {"slide_type": ""}} +import os -The EAFP (Easier to Ask for Forgiveness than Permission) approach is more Pythonic because: - -* It’s often faster, avoiding redundant checks when operations succeed. -* It’s more readable, separating the intended operation and error handling. - -## Any Check is a Good Check +def read_file(file_path): + if os.path.exists(file_path): + with open(file_path, 'r') as file: + data = file.read() + return data + else: + return None # Doesn't fail immediately, just returns None -As long as you consider edge cases, you're writing great code! You don’t need to worry about being “Pythonic” immediately, but understanding both approaches is useful regardless of which approach you chose. +# No error raised, even though the file doesn't exist +file_data = read_file("nonexistent_file.txt") +``` +++ {"editable": true, "slideshow": {"slide_type": ""}} -## Common Python exceptions +This code example below is better than the examples above for three reasons: -Python has dozens of specific errors that can be raised when code fails to run. Below are a few common ones that you may encounter in the activity 3. - -### TypeError +1. It's **pythonic**: it asks for forgiveness later by using a try/except +2. It fails quickly - as soon as it tries to open the file. The code won't continue to run after this step fails. +3. It raises a clean, useful error that the user can understand -Occurs when an operation is applied to an object of an inappropriate type. +The code anticipates what will happen if it can't find the file. It then raises a `FileNotFoundError` and provides a useful and friendly message to the user. 
```{code-cell} ipython3 --- @@ -158,80 +160,112 @@ slideshow: slide_type: '' tags: [raises-exception] --- -# Example: Trying to add a number and a string -1 + 'string' # This will raise a TypeError +def read_file(file_path): + try: + with open(file_path, 'r') as file: + data = file.read() + return data + except FileNotFoundError: + raise FileNotFoundError(f"Oops! I couldn't find the file located at: " + f"{file_path}. Please check to see if it exists") + +# Raises an error immediately if the file doesn't exist +file_data = read_file("nonexistent_file.txt") ``` -+++ {"editable": true, "slideshow": {"slide_type": ""}} +## Customizing error messages + +The code above is useful because it fails and provides a simple and effective message that tells the user to check that their file path is correct. -### ValueError +However, the amount of text returned from the error is significant: Python first reports the original error from when it can't open the file, and then reports the error that you raise intentionally within the except statement. -* **Raised when** a function receives an argument of the right type but an invalid value. -* **Example:** `int('abc')` (trying to convert an invalid string to an integer). +If you want to provide less information to the user, you can use `from None`. `from None` ensures that you +only return the exception information related to the error that you raise within the try/except block. ```{code-cell} ipython3 ---- -editable: true -slideshow: - slide_type: '' -tags: [raises-exception] ---- -int("abc") +def read_file(file_path): + try: + with open(file_path, 'r') as file: + data = file.read() + return data + except FileNotFoundError: + raise FileNotFoundError(f"Oops! I couldn't find the file located at: {file_path}. 
" + "Please check to see if it exists") from None + +# Raises an error immediately if the file doesn't exist +file_data = read_file("nonexistent_file.txt") ``` +++ {"editable": true, "slideshow": {"slide_type": ""}} -### KeyError +(pythonic-checks)= +## Make Checks Pythonic -* **Raised when** a dictionary key is not found. -* **Example:** `my_dict['nonexistent_key']` (trying to access a key that doesn’t exist in the dictionary). +Python has a unique philosophy regarding handling potential errors or +exceptional cases. This philosophy is often summarized by the acronym EAFP: +"Easier to Ask for Forgiveness than Permission." When combined with the **fail +fast** approach, your code can be flexible and resilient to the messy +realities of data processing. + +### EAFP vs. LBYL + +There are two main approaches to handling potential errors: + +- **LBYL (Look Before You Leap)**: Check for conditions before making calls or + accessing data. +- **EAFP (Easier to Ask for Forgiveness than Permission)**: Assume the operation + will succeed and handle any exceptions if they occur. + +Pythonic code generally favors the EAFP approach, which allows for **failing +fast** when an error occurs, providing useful feedback without unnecessary +checks. ```{code-cell} ipython3 --- editable: true slideshow: slide_type: '' -tags: [raises-exception] --- -# Example: Accessing a nonexistent key in a dictionary -my_dict = {"a": 1, "b": 2} -my_dict['nonexistent_key'] -``` - -+++ {"editable": true, "slideshow": {"slide_type": ""}} - -### IndexError +# LBYL approach - manually check that the user provides a int +def convert_to_int(value): + if isinstance(value, int): + return int(value) + else: + print("Oops i can't process this so I will fail gracefully.") + return None -* **Raised when** an invalid index is used to access a list or tuple. -* **Example:** `my_list[10]` (trying to access the 11th element of a list with fewer elements). 
+convert_to_int(1) +convert_to_int("a") +``` ```{code-cell} ipython3 --- editable: true slideshow: slide_type: '' -tags: [raises-exception] --- -my_list = [1, 2, 3] -my_list[10] +# EAFP approach - Consider what the user might provide and catch the error. +def convert_to_int(value): + try: + return int(value) + except ValueError: + print("Oops i can't process this so I will fail gracefully.") + return None # or some default value + +convert_to_int(1) +convert_to_int("a") ``` +++ {"editable": true, "slideshow": {"slide_type": ""}} -### AttributeError +The EAFP (Easier to Ask for Forgiveness than Permission) approach is more Pythonic because: + +* It’s often faster, avoiding redundant checks when operations succeed. +* It’s more readable, separating the intended operation and error handling. -Raised when an object does not have a specific attribute or method. +## Any Check is a Good Check -```{code-cell} ipython3 ---- -editable: true -slideshow: - slide_type: '' -tags: [raises-exception] ---- -my_string = "Hello" -my_string.nonexistent_method() -``` +As long as you consider edge cases, you're writing great code! You don’t need to worry about being “Pythonic” immediately, but understanding both approaches is useful regardless of which approach you chose. ```{code-cell} ipython3 --- diff --git a/clean-modular-code/checks-conditionals/python-functions-multi-parameters.md b/clean-modular-code/checks-conditionals/python-functions-multi-parameters.md new file mode 100644 index 0000000..538767b --- /dev/null +++ b/clean-modular-code/checks-conditionals/python-functions-multi-parameters.md @@ -0,0 +1,608 @@ +--- +layout: single +title: 'Write Functions with Multiple Parameters in Python' +excerpt: "A function is a reusable block of code that performs a specific task. Learn how to write functions that can take multiple as well as optional parameters in Python to eliminate repetition and improve efficiency in your code." 
+authors: ['Jenny Palomino', 'Leah Wasser'] +category: [courses] +class-lesson: ['intro-functions-tb'] +permalink: /courses/intro-to-earth-data-science/write-efficient-python-code/functions-modular-code/write-functions-with-multiple-and-optional-parameters-in-python/ +nav-title: "Write Multi-Parameter Functions in Python" +dateCreated: 2019-11-12 +modified: '{:%Y-%m-%d}'.format(datetime.now()) +module-type: 'class' +chapter: 19 +course: "intro-to-earth-data-science-textbook" +week: 7 +sidebar: + nav: +author_profile: false +comments: true +order: 3 +topics: + reproducible-science-and-programming: ['python'] +redirect_from: + - "/courses/intro-to-earth-data-science/write-efficient-python-code/functions/write-functions-with-multiple-and-optional-parameters-in-python/" +jupyter: + jupytext: + formats: ipynb,md + text_representation: + extension: .md + format_name: markdown + format_version: '1.3' + jupytext_version: 1.16.4 + kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + + +{% include toc title="On This Page" icon="file-text" %} + +
+ +## Learning Objectives + +* Write and execute custom functions with multiple input parameters in **Python**. +* Write and execute custom functions with optional input parameters in **Python**. + +
+ + +## How to Define a Function with Multiple Parameters in Python + +Previously in this textbook, you learned that an input parameter is the required information that you pass to the function for it to run successfully. The function will take the value or object provided as the input parameter and use it to perform some task. + +You also learned that in **Python**, the required parameter can be defined using a placeholder variable, such as `data`, which represents the value or object that will be acted upon in the function. + + +```python +def function_name(data): +``` + +However, sometimes you may need additional information for the function to run successfully. + +Luckily, you can write functions that take in more than one parameter by defining as many parameters as needed, for example: + +```python +def function_name(data_1, data_2): +``` + +When the function is called, a user can provide any value for `data_1` or `data_2` that the function can take as an input for that parameter (e.g. single value variable, list, **numpy** array, **pandas** dataframe column). + + +## Write a Function with Multiple Parameters in Python + +Imagine that you want to define a function that will take in two numeric values as inputs and return the product of these input values (i.e. multiply the values). + +Begin with the `def` keyword and the function name, just as you have before to define a function: + +```python +def multiply_values +``` + +Next, provide two placeholder variable names for the input parameters, as shown below. + +```python +def multiply_values(x,y): +``` + +Add the code to multiply the values and the `return` statement to return the product of the two values: + +```python +def multiply_values(x,y): + z = x * y + return z +``` + +Last, write a docstring to provide the details about this function, including a brief description of the function (i.e. how it works, purpose) as well as identify the input parameters (i.e. 
type, description) and the returned output (i.e. type, description). + +```python +def multiply_values(x,y): + """Calculate product of two inputs. + + Parameters + ---------- + x : int or float + y : int or float + + Returns + ------ + z : int or float + """ + z = x * y + return z +``` + +## Call Custom Functions with Multiple Parameters in Python + +Now that you have defined the function `multiply_values()`, you can call it by providing values for the two input parameters. + +```python +# Call function with numeric values +multiply_values(x = 0.7, y = 25.4) +``` + +Recall that you can also provide pre-defined variables as inputs, for example, a value for precipitation and another value for a unit conversion factor. + +```python +# Average monthly precip (inches) for Jan in Boulder, CO +precip_jan_in = 0.7 + +# Conversion factor from inches to millimeters +to_mm = 25.4 +``` + +```python +# Call function with pre-defined variables +precip_jan_mm = multiply_values( + x = precip_jan_in, + y = to_mm) + +precip_jan_mm +``` + +Note that the function is not defined specifically for unit conversions, but as it completes a generalizable task, it can be used for simple unit conversions. + + +## Combine Unit Conversion and Calculation of Statistics into One Function + +Now imagine that you want to both convert the units of a **numpy** array from millimeters to inches and calculate the mean value along a specified axis for either columns or rows. + +Recall the function definition that you previously wrote to convert values from millimeters to inches: + +```python +def mm_to_in(mm): + """Convert input from millimeters to inches. + + Parameters + ---------- + mm : int or float + Numeric value with units in millimeters. + + Returns + ------ + inches : int or float + Numeric value with units in inches. 
+ """ + inches = mm / 25.4 + return inches +``` + +You can expand this function to include running a mean along a specified axis for columns or rows, and then use this function over and over on many **numpy** arrays as needed. + +This new function can have descriptive names for the function and the input parameters that describe more clearly what the function accomplishes. + +Begin by defining the function with a descriptive name and the two necessary parameters: +* the input array with values in millimeters +* the axis value for the mean calculation + +Use placeholder variable names that highlight the purpose of each parameter: + + +```python +def mean_mm_to_in_arr(arr_mm, axis_value): +``` + + +Next, add the code to first calculate the mean of the input array along a specified axis, and then to convert the mean values from millimeters to inches. + +First, add the code line to calculate a mean along a specified axis. + + +```python +def mean_mm_to_in_arr(arr_mm, axis_value): + mean_arr_mm = np.mean(arr_mm, axis = axis_value) +``` + + +Next, add the code line to convert the mean array from millimeters to inches. In this case, the `return` statement should return the mean array in inches. + + +```python +def mean_mm_to_in_arr(arr_mm, axis_value): + mean_arr_mm = np.mean(arr_mm, axis = axis_value) + mean_arr_in = mean_arr_mm / 25.4 + + return mean_arr_in + +``` + +Note that the function could be written to convert the values first and then calculate the mean. However, given that the function will complete both tasks and return the mean values in the desired units, it is more efficient to calculate the mean values first and then convert just those values, rather than converting all of the values in the input array. + + +Last, include a docstring to provide the details about this function, including a brief description of the function (i.e. how it works, purpose) as well as identify the input parameters (i.e. type, description) and the returned output (i.e. 
type, description). + + +```python +def mean_mm_to_in_arr(arr_mm, axis_value): + """Calculate mean values of input array along a specified + axis and convert values from millimeters to inches. + + Parameters + ---------- + arr_mm : numpy array + Numeric values in millimeters. + axis_value : int + 0 to calculate mean for each column. + 1 to calculate mean for each row. + + Returns + ------ + mean_arr_in : numpy array + Mean values of input array in inches. + """ + mean_arr_mm = np.mean(arr_mm, axis = axis_value) + mean_arr_in = mean_arr_mm / 25.4 + + return mean_arr_in +``` + +Now that you have defined `mean_mm_to_in_arr()`, you can call the function with the appropriate input parameters. + +Create some data and test your new function with different input values for the `axis_value` parameter. + +```python +# Import necessary package to run function +import numpy as np +``` + +```python +# 2d array of average monthly precip (mm) for 2002 and 2013 in Boulder, CO +precip_2002_2013_mm = np.array([[27.178, 11.176, 38.1, 5.08, 81.28, 29.972, + 2.286, 36.576, 38.608, 61.976, 19.812, 0.508], + [6.858, 28.702, 43.688, 105.156, 67.564, 15.494, + 26.162, 35.56 , 461.264, 56.896, 7.366, 12.7] + ]) +``` + +```python +# Calculate monthly mean (inches) for precip_2002_2013 +monthly_mean_in = mean_mm_to_in_arr(arr_mm = precip_2002_2013_mm, + axis_value = 0) + +monthly_mean_in +``` + +```python +# Calculate yearly mean (inches) for precip_2002_2013 +yearly_mean_in = mean_mm_to_in_arr(arr_mm = precip_2002_2013_mm, + axis_value = 1) + +yearly_mean_in +``` + +## Define Optional Input Parameters for a Function + +Your previously defined function works well if you want to use a specified axis for the mean. + +However, notice what happens when you try to call the function without providing an axis value, such as for a one-dimensional array. 
+ +```python +# 1d array of average monthly precip (mm) for 2002 in Boulder, CO +precip_2002_mm = np.array([27.178, 11.176, 38.1, 5.08, 81.28, 29.972, + 2.286, 36.576, 38.608, 61.976, 19.812, 0.508]) +``` + + +```python +# Calculate mean (inches) for precip_2002 +monthly_mean_in = mean_mm_to_in_arr(arr_mm = precip_2002_mm) +``` + +You get an error that the `axis_value` is missing: + +```python +TypeError: mean_mm_to_in_arr() missing 1 required positional argument: 'axis_value' +``` + + + +What if you want to make the function more generalizable, so that the axis value is optional? + +You can do that by specifying a default value for `axis_value` as `None` as shown below: + +```python +def mean_mm_to_in_arr(arr_mm, axis_value=None): +``` + +The function will assume that the axis value is `None` (i.e. that an input value has not been provided by the user), unless specified otherwise in the function call. + +However, as written, the original function code uses the axis value to calculate the mean, so you need to make a few more changes, so that the mean code runs with an axis value if a value is provided or runs without an axis value if one is not provided. + +Luckily, you have already learned about conditional statements, which you can now add to your function to run the mean code with or without an axis value as needed. + +Using a conditional statement, you can check if `axis_value` is equal to `None`, in which case the mean code will run without an axis value. + +```python +def mean_mm_to_in_arr(arr_mm, axis_value=None): + + if axis_value is None: + mean_arr_mm = np.mean(arr_mm) +``` + +The `else` statement would mean that `axis_value` is not equal to `None` (i.e. a user has provided an input value) and thus would run the mean code with the specified axis value. 
+ +```python +def mean_mm_to_in_arr(arr_mm, axis_value=None): + + if axis_value is None: + mean_arr_mm = np.mean(arr_mm) + else: + mean_arr_mm = np.mean(arr_mm, axis = axis_value) +``` + +The code for the unit conversion and the `return` remain the same, just with updated names: + +```python +def mean_mm_to_in_arr(arr_mm, axis_value=None): + if axis_value is None: + mean_arr_mm = np.mean(arr_mm) + else: + mean_arr_mm = np.mean(arr_mm, axis = axis_value) + + mean_arr_in = mean_arr_mm / 25.4 + + return mean_arr_in +``` + +Last, include a docstring to provide the details about this revised function. Notice that the axis value has been labeled optional in the docstring. + + +```python +def mean_mm_to_in_arr(arr_mm, axis_value=None): + """Calculate mean values of input array and convert values + from millimeters to inches. If an axis is specified, + the mean will be calculated along that axis. + + + Parameters + ---------- + arr_mm : numpy array + Numeric values in millimeters. + axis_value : int (optional) + 0 to calculate mean for each column. + 1 to calculate mean for each row. + + Returns + ------ + mean_arr_in : numpy array + Mean values of input array in inches. + """ + if axis_value is None: + mean_arr_mm = np.mean(arr_mm) + else: + mean_arr_mm = np.mean(arr_mm, axis = axis_value) + + mean_arr_in = mean_arr_mm / 25.4 + + return mean_arr_in +``` + +Notice that the function will return the same output as before for the two-dimensional array `precip_2002_2013_mm`. + +```python +# Calculate monthly mean (inches) for precip_2002_2013 +monthly_mean_in = mean_mm_to_in_arr(arr_mm = precip_2002_2013_mm, + axis_value = 0) + +monthly_mean_in +``` + +However, now you can also provide a one-dimensional array as an input without a specified axis and receive the appropriate output. 
+ +```python +# Calculate mean (inches) for precip_2002 +monthly_mean_in = mean_mm_to_in_arr(arr_mm = precip_2002_mm) + +monthly_mean_in +``` + + +## Combine Download and Import of Data Files into One Function + +You can also write multi-parameter functions to combine other tasks into one function, such as downloading and importing data files into a **pandas** dataframe. + +Think about the code that you need to include in the function: +1. download data file from URL: `et.data.get_data(url=file_url)` +2. import data file into **pandas** dataframe: `pd.read_csv(path)` + +From this code, you can see that you will need two input parameters for the combined function: +1. the URL to the data file +2. the path to the downloaded file + +Begin by specifying a function name and the placeholder variable names for the necessary input parameters. + +```python +def download_import_df(file_url, path): +``` + +Next, add the code for download and the import. + +```python +def download_import_df(file_url, path): + et.data.get_data(url=file_url) + df = pd.read_csv(path) +``` + +However, what if the working directory has not been set before this function is called, and you do not want to use absolute paths? + +Since you know that the `get_data()` function creates the `earth-analytics` directory under the home directory if it does not already exist, you can safely assume that this combined function will also create that directory. + +As such, you can include setting the working directory in the function, so that you do not have to worry about providing absolute paths to the function: + +```python +def download_import_df(file_url, path): + + et.data.get_data(url=file_url) + os.chdir(os.path.join(et.io.HOME, "earth-analytics")) + df = pd.read_csv(path) + + return df +``` + +Last, include a docstring to provide the details about this function, including a brief description of the function (i.e. how it works, purpose) as well as identify the input parameters (i.e. 
type, description) and the returned output (i.e. type, description). + + +```python +def download_import_df(file_url, path): + """Download file from specified URL and import file + into a pandas dataframe from a specified path. + + Working directory is set to earth-analytics directory + under home, which is automatically created by the + download. + + + Parameters + ---------- + file_url : str + URL to CSV file (http or https). + path : str + Path to CSV file using relative path + to earth-analytics directory under home. + + Returns + ------ + df : pandas dataframe + Dataframe imported from downloaded CSV file. + """ + + et.data.get_data(url=file_url) + os.chdir(os.path.join(et.io.HOME, "earth-analytics")) + df = pd.read_csv(path) + + return df +``` + +Now that you have defined the function, you can import the packages needed to run the function and define the variables that you will use as input parameters. + +```python +# Import necessary packages to run function +import os +import pandas as pd +import earthpy as et +``` + +```python +# URL for average monthly precip (inches) for 2002 and 2013 in Boulder, CO +precip_2002_2013_df_url = "https://ndownloader.figshare.com/files/12710621" + +# Path to downloaded .csv file with headers +precip_2002_2013_df_path = os.path.join("data", "earthpy-downloads", + "precip-2002-2013-months-seasons.csv") +``` + +Using these variables, you can now call the function to download and import the file into a **pandas** dataframe. + +```python +# Create dataframe using download/import function +precip_2002_2013_df = download_import_df( + file_url = precip_2002_2013_df_url, + path = precip_2002_2013_df_path) + +precip_2002_2013_df +``` + + +### Making Functions More Efficient Does Not Always Mean More Parameters + +Note that you previously defined `download_import_df()` to take in two parameters, one for the URL and for the path, and the function works well to accomplish the task. 
+ +However, with a little investigation into the `et.data.get_data()` function, you can see that the output of that function is actually a path to the downloaded file! + +```python +help(et.data.get_data) +``` +In the docstring details provided, you can see that the full path to the downloaded data is returned by the function: + +``` +Returns +------- +path_data : str + The path to the downloaded data. +``` + +This means that you can redefine `download_import_df()` to be more efficient by simply using the output of the `et.data.get_data()` function as the input to the `pd.read_csv()` function. + +Now, you actually only need one parameter for the URL and you do not have to define the working directory in the function, in order to find the appropriate file. + + +```python +def download_import_df(file_url): + """Download file from specified URL and import file + into a pandas dataframe. + + The path to the downloaded file is automatically + generated by the download and is passed to the + pandas function to create a new dataframe. + + Parameters + ---------- + file_url : str + URL to CSV file (http or https). + + Returns + ------ + df : pandas dataframe + Dataframe imported from downloaded CSV file. + """ + + df = pd.read_csv(et.data.get_data(url=file_url)) + + return df +``` + +Your revised function now executes only one line, rather than three lines! Note that the docstring was also updated to reflect that there is only one input parameter for this function. + +Now you can call the function with just a single parameter for the URL. + +```python +# Create dataframe using download/import function +precip_2002_2013_df = download_import_df( + file_url = precip_2002_2013_df_url) + +precip_2002_2013_df +``` + + +
+ +## Practice Writing Multi-Parameter Functions for Pandas Dataframes + +You have a function that combines the mean calculation along a specified axis and the conversion from millimeters to inches for a **numpy** array. + +How might you need to change this function to create a similar function for a **pandas** dataframe, but now converting from inches to millimeters? + +For the mean, you can run summary statistics on a **pandas** dataframe using a specified axis (just like a **numpy** array) with the following code: + +```python +df.mean(axis = axis_value) +``` + +With the axis value `0`, the code will calculate a mean for each numeric column in the dataframe. + +With the axis value `1`, the code will calculate a mean for each row with numeric values in the dataframe. + +Think about which code lines in the existing function `mean_mm_to_in_arr()` can be modified to run the equivalent code on a **pandas** dataframe. + +Note that `df.mean(axis = axis_value)` returns the mean values of a dataframe (along the specified axis) as a **pandas** series. + +
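If you get stuck, the sketch below shows one possible shape for such a function. It is not the only solution: the function name `mean_in_to_mm_df`, the default `axis_value=0`, and the small made-up dataframe are all assumptions chosen for illustration. It also assumes that non-numeric columns (such as month names) should be skipped, which is what `numeric_only=True` does.

```python
import pandas as pd


def mean_in_to_mm_df(df_in, axis_value=0):
    """Calculate mean values of a dataframe along an axis and
    convert the means from inches to millimeters.

    Parameters
    ----------
    df_in : pandas dataframe
        Numeric values in inches (non-numeric columns are skipped).
    axis_value : int (optional)
        0 to calculate mean for each column (assumed default).
        1 to calculate mean for each row.

    Returns
    ------
    mean_series_mm : pandas series
        Mean values in millimeters.
    """
    # numeric_only=True ignores text columns such as month names
    mean_series_in = df_in.mean(axis=axis_value, numeric_only=True)

    # Convert only the mean values, not the whole dataframe
    mean_series_mm = mean_series_in * 25.4

    return mean_series_mm


# Made-up precip values (inches), for illustration only
precip_in_df = pd.DataFrame({
    "months": ["Jan", "Feb"],
    "precip_2002": [1.0, 2.0],
    "precip_2013": [3.0, 4.0],
})

# Mean for each numeric column, then for each row
column_means_mm = mean_in_to_mm_df(precip_in_df, axis_value=0)
row_means_mm = mean_in_to_mm_df(precip_in_df, axis_value=1)
```

As in `mean_mm_to_in_arr()`, the mean is calculated first so that only the resulting values need to be converted.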
+ + +
+ +## Practice Writing Multi-Parameter Functions for Numpy Arrays + +You also have a function that combines the data download and import for a **pandas** dataframe. You can modify this function for other data structures such as a **numpy** array. + +How might you need to change this function to create an equivalent for **numpy** arrays? + +Think about which code lines in the existing function `download_import_df()` can be modified to write a new function that downloads and imports data into a **numpy** array. + +To begin, you may want to write one function for a 1-dimensional array and another function for a 2-dimensional array. + +To advance in your practice, you can think about adding a conditional statement that would check for the file type (.txt for a 1-dimensional array and .csv for a 2-dimensional array) before executing the appropriate import code. + +
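As a starting point, here is a minimal sketch of the file type check only. It leaves out the download step and assumes the file already exists locally (for example, at a path returned by a download function). The name `import_arr` and the temporary test files are hypothetical, and `np.loadtxt()` is just one of several ways to read these files.

```python
import os
import tempfile

import numpy as np


def import_arr(path):
    """Import a .txt file or a .csv file into a numpy array.

    Parameters
    ----------
    path : str
        Path to a .txt file (whitespace-separated values)
        or a .csv file (comma-separated values).

    Returns
    ------
    arr : numpy array
        Values imported from the file.
    """
    # Check the file extension to choose the import code
    extension = os.path.splitext(path)[1].lower()

    if extension == ".txt":
        arr = np.loadtxt(path)
    elif extension == ".csv":
        arr = np.loadtxt(path, delimiter=",")
    else:
        # Fail fast with a clear message for unexpected file types
        raise ValueError(f"Unsupported file type: {extension}")

    return arr


# Create two small made-up files to try the function
with tempfile.TemporaryDirectory() as tmp_dir:
    txt_path = os.path.join(tmp_dir, "precip-1d.txt")
    csv_path = os.path.join(tmp_dir, "precip-2d.csv")

    with open(txt_path, "w") as txt_file:
        txt_file.write("1.0\n2.0\n3.0\n")
    with open(csv_path, "w") as csv_file:
        csv_file.write("1.0,2.0\n3.0,4.0\n")

    arr_1d = import_arr(txt_path)
    arr_2d = import_arr(csv_path)
```

Raising a `ValueError` for an unknown extension is a design choice in this sketch: the function stops right away instead of returning an unexpected result.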
diff --git a/clean-modular-code/checks-conditionals/write-python-functions.md b/clean-modular-code/checks-conditionals/write-python-functions.md index 9f78d6c..3d8e749 100644 --- a/clean-modular-code/checks-conditionals/write-python-functions.md +++ b/clean-modular-code/checks-conditionals/write-python-functions.md @@ -17,6 +17,7 @@ jupyter: name: python3 --- +(write-functions)= ## How to write a Python function :::{tip} @@ -54,9 +55,9 @@ add_numbers(1,2) ``` -## How to Define Functions in Python +## How to define functions in Python -There are several components needed to define a function in **Python**, including the `def` keyword, function name, parameters (inputs), and the `return` statement, which specifies the output of the function. +Several components are needed to define a function in **Python**, including the `def` keyword, function name, parameters (inputs), and the `return` statement, which specifies the function's output. ```python def function_name(parameter): @@ -64,7 +65,7 @@ def function_name(parameter): return output ``` -### def keyword and function Name +### `def` keyword and function Name In **Python**, function definitions begin with the keyword **`def`** to indicate the start of a definition for a new function. The function name follows this keyword. 
diff --git a/clean-modular-code/intro-clean-code.md b/clean-modular-code/intro-clean-code.md index 737d2bd..a2b0f48 100644 --- a/clean-modular-code/intro-clean-code.md +++ b/clean-modular-code/intro-clean-code.md @@ -24,7 +24,7 @@ jupyter: :::{toctree} :hidden: -:caption: Lessons +:caption: Clean, Expressive Code :maxdepth: 2 Intro @@ -35,23 +35,16 @@ Expressive Code :::{toctree} :hidden: -:caption: Functions, Checks & Tests +:caption: Functions, Conditionals & Checks :maxdepth: 2 -Functions +Conditional statements +Functions Functions Function checks -Function Tests & Checks +Function Tests & Checks ::: -:::{toctree} -:hidden: -:caption: Conditional statements -:maxdepth: 2 - -Conditional statements -Tests & Checks -::: :::{toctree} :hidden: diff --git a/clean-modular-code/python-expressive-code.md b/clean-modular-code/python-expressive-code.md index ad6b8a8..10a1415 100644 --- a/clean-modular-code/python-expressive-code.md +++ b/clean-modular-code/python-expressive-code.md @@ -28,6 +28,7 @@ authors: ['Leah Wasser', 'Jenny Palomino'] +++ {"editable": true, "slideshow": {"slide_type": ""}} +(python-expressive)= # Make Your Code Easier to Read Using Expressive Variable Names in Python :::{admonition} Learning Objectives