diff --git a/clean-modular-code/activity-3/clean-code-activity-3.md b/clean-modular-code/activity-3/clean-code-activity-3.md index ff3f213..386c23d 100644 --- a/clean-modular-code/activity-3/clean-code-activity-3.md +++ b/clean-modular-code/activity-3/clean-code-activity-3.md @@ -13,42 +13,45 @@ kernelspec: +++ {"editable": true, "slideshow": {"slide_type": ""}} +(clean-code-activity-3)= # Activity 3: Tests & Checks for your code -* In [activity 1](../activity-1/clean-code-activity-1), you took some code and made it cleaner using expressive variable names and docstrings to document the module. -* In [activity 2](../activity-2/clean-code-activity-2), you made your code more DRY ("Don't Repeat Yourself") using documented functions and conditionals. +* In [activity 1](../activity-1/clean-code-activity-1), you made your code cleaner and more usable using [expressive variable names](python-expressive) and docstrings to document the module. +* In [activity 2](../activity-2/clean-code-activity-2), you made your code more DRY ("Don't Repeat Yourself") using documented [functions](write-functions) and [conditionals](python-conditionals). -In this activity, you will build checks into your workflow to handle data processing "features". +In this activity, you will build checks into your workflow using [try/except](try-except) blocks added to functions to handle data processing "features". ++++ {"editable": true, "slideshow": {"slide_type": ""}} ### Real world data processing & workflows and edge cases -Real-world data rarely can be imported without "work arounds". You will often find unusual data entries and values you don't expect. Sometimes, these values are documented - for example, a 9999 may represent a missing value in a dataset. Other times, there are typos and other errors in the data that you need to handle. These unusual values or instances in a dataset or workflow are sometimes called "edge cases". 
-Writing robust code that handles unexpected values will make your code run smoothly and fail gracefully. This type of code, which combines functions (or classes) and checks within the functions that handle messy data, will make your code easier to maintain over time. +Real-world data rarely can be imported without "work-arounds". You will often find unusual data entries and values you don't expect. Sometimes, these values are documented - for example, a `9999` may represent a missing value in a dataset. Other times, there are typos and other errors in the data that you need to handle. These unusual values or instances in a dataset or workflow are sometimes called "edge cases". + +Writing robust code that handles unexpected values will make your code run smoothly and fail gracefully. This type of code, which combines functions (or classes) and checks within the functions that handle messy data, will make your code easier to maintain. +Things like helpful error messages and fast failing will also improve the experience for someone else using your code--OR your future self. + :::{tip} Using functions, classes, and methods (functions within a class) is a great first step in handling messy data. A function or method provides a modular unit you can test outside of the workflow for the edge cases you may encounter. Also, because a function is a modular unit, you can add elements to handle unexpected processing features as you build your workflow. -::: -something about debuggers? -* https://jupyterlab.readthedocs.io/en/stable/user/debugger.html +Once you have these functions and methods, you can add checks using conditional statements and [try/except](try-except) blocks that anticipate edge cases and errors that you may encounter when processing your data. 
+::: ## Manage the unexpected -In this activity, you will apply the following strategies: - -* [conditional statements](../checks-conditionals/python-conditionals) -* try/except blocks - -to process the JOSS citation data. +In this activity, you will apply the following strategies to make your code more robust, maintainable & usable: -:::{todo} -What branch is the lesson with try/except // ask for forgiveness, checks elements in?? -IN THIS PR: -https://github.com/pyOpenSci/lessons/pull/14/files#diff-7f4ff1b75e85d38f3955cca051e68e8746773c279b34c9a0a400b9c2dc1240ff -::: +* **[Fail fast with useful error messages](fail-fast)**: Failing fast is a software engineering term that means allowing your + code to stop when something goes wrong, ensuring that errors are caught + and communicated promptly. This helps the user quickly understand the error, what went + wrong, and where. +* Use **[conditional statements](../checks-conditionals/python-conditionals)** + to logically check for specific conditions before executing code. This allows you to create different pathways for code to execute based on specific conditions. +* **[Try/except blocks](../checks-conditionals/python-function-checks)** allow + you to handle potential errors by attempting an operation and catching any + exceptions if they occur, providing useful feedback. In some cases you may want the program to end on an error. In other cases, you may want to handle it in a specific way. -When you can, try to use the Pythonic approach of asking for forgiveness later (ie use try/except blocks) rather than conditional statements. +Try to use a [Pythonic approach](pythonic-checks) to catch errors early when completing this activity. This means asking for forgiveness later vs. using conditional statements to check an object's state or type. 
```{code-cell} ipython3 --- @@ -56,7 +59,7 @@ editable: true slideshow: slide_type: '' --- -# This works but is less pythonic +# This works but is less Pythonic as it's a "look before you leap" approach since it does an extra check before returning the title def clean_title(title): """Notice that this function checks explicitly to see if it's a list and then processes the data. """ @@ -65,34 +68,84 @@ def clean_title(title): return title ``` -## More "pythonic" - ask for forgiveness - -easier to ask for forgiveness +The function below raises an error with a custom error message. ```{code-cell} ipython3 --- editable: true slideshow: slide_type: '' +tags: [raises-exception] --- # This is the preferred way to catch an error def clean_title(title): """ - It's more Pythonic to try first and then ask for forgiveness later. - If you are writing tests this also makes your code easier to test. + Attempts to return the first character of the title. + Raises the same error with a friendly, custom error message if the input is invalid. + """ + try: + return title[0] + except IndexError as e: + raise IndexError(f"Oops! You provided a title in an unexpected format. " + f"I expected the title to be provided in a list and you provided " + f"a {type(title)}.") from e + +# Example usage: +title = "" +print(clean_title(title)) # This will raise an IndexError with the friendly message +``` + +```{code-cell} ipython3 +# This is the preferred way to catch an error +def clean_title(title): + """ + Attempts to return the first character of the title. + + Raises the same error with a friendly message if the input is invalid. + """ + try: + return title[0] + except IndexError as e: + raise IndexError(f"Oops! You provided a title in an unexpected format. 
I expected the title to be provided in a list and you provided a {type(title)}.") + +# Example usage: +title = "" +print(clean_title(title)) # This will raise an IndexError with the friendly message +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +If you wish, you can shorten the amount of information returned in the error by adding `from None` when you raise the error. This will look nicer to a user, but you lose some detail in the error traceback. + +```{code-cell} ipython3 +# This is the preferred way to catch an error +def clean_title(title): + """ + Attempts to return the first character of the title. + Raises the same error with a friendly message if the input is invalid. """ try: return title[0] - except (TypeError, IndexError): - return title + except IndexError as e: + raise IndexError(f"Oops! You provided a title in an unexpected format. " + f"I expected the title to be provided in a list and you provided " + f"a {type(title)}.") from None + +# Example usage: +title = "" +print(clean_title(title)) # This will raise an IndexError with the friendly message ``` +++ {"editable": true, "slideshow": {"slide_type": ""}, "tags": ["hide-output", "hide-cell"]} +In this activity, you will be working with Pandas dataframes. You may find the `.apply` function to be particularly useful. + :::{tip} -### Applying functions to DataFrame values--`.apply` +### Applying functions to DataFrame values: `.apply` + +The `.apply()` function in pandas allows you to apply any function to rows or columns in a `pandas.DataFrame`. For example, you can use it to perform operations on specific column or row values. When you use `.apply()`, you can specify whether you want to apply the function across columns `(axis=0)` (the default) or across rows `(axis=1)`. -The `.apply()` function in pandas allows you to apply any function to rows or columns in a `pandas.DataFrame`. For example, You can use it to perform operations on specific column or row values. 
When you use `.apply()`, you can specify whether you want to apply the function across columns `(axis=0)` (the default) or across rows `(axis=1)`. For example, if you want to apply a function to each row of a DataFrame, you would use `df.apply(your_function, axis=1)`. This function is especially useful for applying logic that can’t be easily achieved with built-in pandas functions, allowing for more flexibility in data processing. +For example, if you want to apply a function to each row of a DataFrame, you would use `df.apply(your_function, axis=1)`. This function is especially useful for applying logic that can’t be easily achieved with built-in pandas functions, allowing for more flexibility in data processing. You can use `.apply` in pandas to efficiently replace `for loops` to process row and column values in a `pandas.DataFrame`. @@ -103,20 +156,34 @@ You can use `.apply` in pandas to efficiently replace `for loops` to process row ### What's changed in your workflow? :::{warning} -You have a new data file to open in your list of `.json` files in this activity. This file has some unexpected "features" that your code needs to handle gracefully so it can process all of the data. +You have a new data file to open in your list of `.json` files in this activity. This file has some unexpected "features" that your code needs to handle gracefully to process all of the data. ::: -Your goal is to make the code below run on the data provided in the activity-3 `data/` directory. +The code below is an example of what your code might look like after completing [activity 2](../activity-2/clean-code-activity-2). You can choose to work with this code, or you can use the code that you completed in activity 2. + +Your goal is to make the code below run on the data provided in the activity-3 `data/` directory. + +The code below will fail. You will need to do the following: + +1. Use a debugger to determine why it's failing. +2. 
Add try/except blocks and/or conditional statements to your functions that handle various exceptions. + +Your end goal is to make the code below run. :::{tip} -The code below will fail. You will likely want to use a debugger to determine why it's failing and get the code running. +* If you are using Jupyter, then you might [find this page helpful when setting up debugging.](https://jupyterlab.readthedocs.io/en/stable/user/debugger.html) +* VSCODE has a nice visual debugger that you can use. + ::: -The code below is an example of what your code might look like after completing activity 2. You can choose to work with this code, or you can use the code that you completed in activity 2. +Important: It is ok if you can't get the code to run fully by the end of this workshop. If you can: -+++ {"editable": true, "slideshow": {"slide_type": ""}, "tags": ["raises-exception"]} +1. identify at least one of the data processing "bugs" (even if you can't fix it) and/or +2. fix at least one bug -```python +You can consider your effort today as a success! We will work on the first element together as a group. + +```{code-cell} ipython3 import json from pathlib import Path @@ -159,37 +226,110 @@ def format_date(date_parts: list) -> str: Returns ------- - str - Formatted date string. + pd.datetime + A date formatted as a pd.datetime object. """ - return f"{date_parts[0]}-{date_parts[1]:02d}-{date_parts[2]:02d}" + date_str = ( + f"{date_parts[0][0]}-{date_parts[0][1]:02d}-{date_parts[0][2]:02d}" + ) + return pd.to_datetime(date_str, format="%Y-%m-%d") def clean_title(value): """A function that removes a value contained in a list.""" + print("hi", value) return value[0] -def process_published_date(date_parts): - """Parse a date provided as a list of values into a proper date format. 
+columns_to_keep = [ + "publisher", + "DOI", + "type", + "author", + "is-referenced-by-count", + "title", + "published.date-parts", +] + +data_dir = Path("data") + +all_papers_list = [] +for json_file in data_dir.glob("*.json"): + papers_df = load_clean_json(json_file, columns_to_keep) + + papers_df["title"] = papers_df["title"].apply(clean_title) + papers_df["published_date"] = papers_df["published.date-parts"].apply( + format_date + ) + + all_papers_list.append(papers_df) + +all_papers_df = pd.concat(all_papers_list, axis=0, ignore_index=True) + +print("Final shape of combined DataFrame:", all_papers_df.shape) +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}, "tags": ["raises-exception"]} + +```python +# Note that this code has no checks or tests in the functions provided. You will need to add them to make the code run. + +import json +from pathlib import Path + +import pandas as pd + + +def load_clean_json(file_path, columns_to_keep): + """ + Load JSON data from a file. Drop unnecessary columns and normalize + to DataFrame. + + Parameters + ---------- + file_path : Path + Path to the JSON file. + columns_to_keep : list + List of columns to keep in the DataFrame. + + Returns + ------- + dict + Loaded JSON data. + """ + + with file_path.open("r") as json_file: + json_data = json.load(json_file) + normalized_data = pd.json_normalize(json_data) + + return normalized_data.filter(items=columns_to_keep) + + +def format_date(date_parts: list) -> str: + """ + Format date parts into a string. Parameters ---------- - date_parts : str or int - The elements of a date provided as a list from CrossRef + date_parts : list + List containing year, month, and day. Returns ------- pd.datetime A date formatted as a pd.datetime object. 
""" - date_str = ( f"{date_parts[0][0]}-{date_parts[0][1]:02d}-{date_parts[0][2]:02d}" ) return pd.to_datetime(date_str, format="%Y-%m-%d") +def clean_title(value): + """A function that removes a value contained in a list.""" + return value[0] + + columns_to_keep = [ "publisher", "DOI", @@ -208,7 +348,7 @@ for json_file in data_dir.glob("*.json"): papers_df["title"] = papers_df["title"].apply(clean_title) papers_df["published_date"] = papers_df["published.date-parts"].apply( - process_published_date + format_date ) all_papers_list.append(papers_df) @@ -220,20 +360,111 @@ print("Final shape of combined DataFrame:", all_papers_df.shape) +++ {"editable": true, "slideshow": {"slide_type": ""}} -:::{admonition} On your own 1 +:::{admonition} Part 1 - What happens when your code can't find the data? :class: attention -Ideas for on your own welcome! +Let's break the code below and see how our code performs. + +* Note that the code below has a modified `/data` directory path (that doesn't exist!). + +Questions: +* What type of error do you expect Python to throw? Use Google, LLMs or our [tests and checks](common-exceptions) lesson to help figure this out. +* Does your code handle this error gracefully? +* How can we make the code handle it better? + +```python +import json +from pathlib import Path + +import pandas as pd + +def load_clean_json(file_path, columns_to_keep): + """ + Load JSON data from a file. Drop unnecessary columns and normalize + to DataFrame. + + Parameters + ---------- + file_path : Path + Path to the JSON file. + columns_to_keep : list + List of columns to keep in the DataFrame. + + Returns + ------- + dict + Loaded JSON data. 
+ """ + + with file_path.open("r") as json_file: + json_data = json.load(json_file) + normalized_data = pd.json_normalize(json_data) + + # Return the pandas DataFrame, filtering out some of the columns that we don't need + return normalized_data.filter(items=columns_to_keep) + + +columns_to_keep = [ + "publisher", + "DOI", + "type", + "author", + "is-referenced-by-count", + "title", + "published.date-parts", +] + +# Break this data path by giving it a dir name that doesn't exist - what happens when your code runs? +data_dir = Path("bad-bad-data") + +all_papers_list = [] +for json_file in data_dir.glob("*.json"): + papers_df = load_clean_json(json_file, columns_to_keep) + papers_df["title"] = papers_df["title"].apply(clean_title) + + all_papers_list.append(papers_df) + +all_papers_df = pd.concat(all_papers_list, axis=0, ignore_index=True) + +print("Final shape of combined DataFrame:", all_papers_df.shape) +``` + +Your goal is to troubleshoot any issues associated with cleaning up the title so you can work with it later in a `pandas.DataFrame`. + ::: ++++ {"editable": true, "slideshow": {"slide_type": ""}, "tags": ["hide-cell"]} + +Note: we can have two groups - one that wants to work on their own and another that wants to work with the instructor together. + +++ {"editable": true, "slideshow": {"slide_type": ""}} -:::{admonition} On your own 2 +:::{admonition} Activity 3: part 1 :class: attention -Ideas welcome? + +In this activity, we will work together in groups or as a whole class to add a try/except block that handles messiness when parsing titles in the cross-ref data. + ::: -I want to have them move their code into a module if possible during this workshop but we could also kick that off in the day 2 workshop. ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +:::{admonition} On your own 1 +:class: attention + +If you get through the activity above and want a second challenge, try to parse the date values for each JOSS publication. 
Use the published key to extract the date for each publication. You may run into some data "features" when completing this activity. + +```json +"published": { + "date-parts": [ + [ + 2020, + 7, + 4 + ] + ] +``` + +::: ```{code-cell} ipython3 --- diff --git a/clean-modular-code/checks-conditionals/python-functions.md b/clean-modular-code/checks-conditionals/about-python-functions.md similarity index 94% rename from clean-modular-code/checks-conditionals/python-functions.md rename to clean-modular-code/checks-conditionals/about-python-functions.md index 1b5d646..d800c21 100644 --- a/clean-modular-code/checks-conditionals/python-functions.md +++ b/clean-modular-code/checks-conditionals/about-python-functions.md @@ -45,7 +45,7 @@ print(x) Functions can help you to both eliminate repetition and improve efficiency by making it more modular. -Modulare code is code that is separated into independent units that can be reused and even combined to complete a longer chain of tasks. +Modular code is separated into independent units that can be reused and even combined to complete a longer chain of tasks. :::{figure} ../images/clean-code/functions-for-all-things.png :alt: You can implement strategies such as loops and functions in your code to replace tasks that you are performing over and over. Source: Francois Michonneau. @@ -93,7 +93,7 @@ When you write modular functions, you can reuse them for other workflows and pro When coding tasks step by step, you are likely creating many intermediate variables that are not needed again but are stored in your computer's memory. -By using functions, these intermediate variables are confined to the function’s local scope. Once the function finishes executing, the variables created within the function are discarded making your code cleaner and more efficient +These intermediate variables are confined to the function’s local scope by using functions. 
Once the function finishes executing, the variables created within the function are discarded, making your code cleaner and more efficient. ## Reasons why functions improve code readability @@ -122,7 +122,7 @@ Organizing your code using functions from the beginning allows you to explicitly ### Functions and tests -While you will not learn about testing in this lesson, functions are also useful for testing. +Functions are also useful for testing. As your code gets longer and more complex, it is more prone to mistakes. For example, if your analysis relies on data that gets updated often, you may want to make sure that all data are up-to-date before performing an analysis. Or that the new data are not formatted in a different way. diff --git a/clean-modular-code/checks-conditionals/python-common-exceptions.md b/clean-modular-code/checks-conditionals/python-common-exceptions.md new file mode 100644 index 0000000..dde1bf4 --- /dev/null +++ b/clean-modular-code/checks-conditionals/python-common-exceptions.md @@ -0,0 +1,169 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.16.4 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +(common-exceptions)= +### Common Python exceptions + +Python has dozens of specific errors that can be raised when code fails to run. Below are a few common ones that you may encounter in [Activity 3](clean-code-activity-3). + +#### TypeError + +Occurs when an operation is applied to an object of an inappropriate type. 
+ +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +# Example: Trying to add a number and a string +1 + 'string' # This will raise a TypeError +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +#### ValueError + +- **Raised when** a function receives an argument of the right type but an invalid value. +- **Example:** `int('abc')` (trying to convert an invalid string to an integer). + +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +int("abc") +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +#### KeyError + +- **Raised when** a dictionary key is not found. +- **Example:** `my_dict['nonexistent_key']` (trying to access a key that doesn’t exist in the dictionary). + +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +# Example: Accessing a nonexistent key in a dictionary +my_dict = {"a": 1, "b": 2} +my_dict['nonexistent_key'] +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +#### IndexError: + +- **Raised when** an invalid index is used to access a list or tuple. +- **Example:** `my_list[10]` (trying to access the 11th element of a list with fewer elements). + +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +my_list = [1, 2, 3] +my_list[10] +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +#### AttributeError: + +Raised when an object does not have a specific attribute or method. + +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +my_string = "Hello" +my_string.nonexistent_method() +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +#### FileNotFoundError + +A `FileNotFoundError` occurs in Python when the code attempts to open or access a file that does not exist at the specified path. 
+ +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +with open("data/nonexistent_file.json", "r") as file: + data = file.read() +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +By catching this exception, you can + +1. Raise a kinder and more informative message. +2. Direct the user toward the next steps. +3. FUTURE: write tests for this step of the workflow (if you are creating a package!) that make sure that it handles a bad file path properly. + +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +tags: [raises-exception] +--- +from pathlib import Path + +file_path = Path("data") / "nonexistent_file.json" +try: + with open(file_path, "r") as file: + data = file.read() +except FileNotFoundError as fe: + raise FileNotFoundError(f"Oops! It looks like you provided a path to a file that doesn't exist. You provided: {file_path}. Make sure the file path exists. ") +``` + ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +If you don't raise the error but instead provide a print statement, you can provide a simple, clean output without the full traceback (the "stack" of Python messages that tracks where the error originated). + +The challenge with not raising a `FileNotFoundError` is that it will be a bit trickier to test the output. + +* You could use `sys.exit` too, but I've run into issues with that in the past (I wish I could remember what they were). + +```{code-cell} ipython3 +--- +editable: true +slideshow: + slide_type: '' +--- +file_path = Path("data") / "nonexistent_file.json" +try: + with open(file_path, "r") as file: + data = file.read() +except FileNotFoundError as fe: + print(f"Oops! It looks like you provided a path to a file that doesn't exist. You provided: {file_path}. Make sure the file path exists. 
") +``` diff --git a/clean-modular-code/checks-conditionals/python-conditionals.md b/clean-modular-code/checks-conditionals/python-conditionals.md index f4f6cbe..b5f001a 100644 --- a/clean-modular-code/checks-conditionals/python-conditionals.md +++ b/clean-modular-code/checks-conditionals/python-conditionals.md @@ -18,6 +18,7 @@ jupyter: name: python3 --- +(python-conditionals)= # Conditional statements in Python While there are many strategies for improving efficiency and removing repetition in code, three commonly used DRY strategies are conditional statements, loops, and functions. diff --git a/clean-modular-code/checks-conditionals/python-function-checks.md b/clean-modular-code/checks-conditionals/python-function-checks.md index a653c72..99bfac8 100644 --- a/clean-modular-code/checks-conditionals/python-function-checks.md +++ b/clean-modular-code/checks-conditionals/python-function-checks.md @@ -11,32 +11,38 @@ kernelspec: name: python3 --- -+++ {"editable": true, "slideshow": {"slide_type": ""}} +# Write Flexible Functions to Handle Messy Data -# Write Flexible Functions for Messy Data +When dealing with messy or unpredictable data, ensuring your code +It is important to handle errors early and gracefully. [Using functions](python-functions) +is a great first step in creating a robust +and maintainable data processing workflow. Functions provide modular units that +can be tested independently, allowing you to handle various edge cases and +unexpected scenarios effectively. -When dealing with messy or unpredictable data, using functions is an excellent first step in creating a robust and maintainable data processing workflow. Functions provide modular units that can be tested independently, allowing you to handle various edge cases and unexpected scenarios effectively. +Adding checks to your functions +is the next step towards making your code more robust and maintainable over time. 
-## Function benefits +This lesson will cover several strategies for making this happen: -Using functions in your data processing pipeline offers several advantages: -1. **Modularity**: Functions encapsulate specific tasks, making your code more organized and easier to understand. -2. **Testability**: You can test functions individually, outside of the main workflow, to ensure they handle different scenarios correctly. -3. **Flexibility**: As you build out your workflow, you can easily add elements to functions to handle new processing requirements or edge cases. -4. **Reusability**: Well-designed functions can be reused across different parts of your project or even in other projects. +1. Use [`try/except blocks`](#try-except) rather than simply allowing errors to occur. +1. [Make checks Pythonic](#pythonic-checks) +1. [Fail fast](fail-fast) -## Handling edge cases ++++ {"editable": true, "slideshow": {"slide_type": ""}} -When working with messy data, you'll often encounter edge cases - unusual or unexpected data that can break your processing pipeline. Functions allow you to implement robust error handling and data validation. Here are some techniques you can use: +(try-except)= +## Use Try/Except blocks -+++ {"editable": true, "slideshow": {"slide_type": ""}} +`try/except` blocks in Python help you handle errors gracefully instead of letting your program crash. They are used when you think a part of your code might fail, like when working with missing data, or when converting data types. -## Try/Except blocks +Here’s how try/except blocks work: -Try/except blocks help you catch and handle errors that might happen while your code is running. This is useful for tasks that might fail, like converting data types or working with data that’s missing or incorrect. +* **try block:** You write the code that might cause an error here. Python will attempt to run this code. 
+* **except block:** If Python encounters an error in the try block, it jumps to the except block to handle it. You can specify what to do when an error occurs, such as printing a friendly message or providing a fallback option. -A try/except block looks like this: +A `try/except` block looks like this: ```python try: @@ -45,6 +51,10 @@ except SomeError: # what to do if there's an error ``` +:::{tip} +We pulled together some of the more [common exceptions that Python can throw here](common-exceptions). +::: + ```{code-cell} ipython3 --- editable: true @@ -55,7 +65,7 @@ def convert_to_int(value): try: return int(value) except ValueError: - print("Oops i can't process this so I will fail gracefully.") + print("Oops i can't process this so I will fail quietly with a print statement.") return None # or some default value ``` @@ -79,77 +89,69 @@ convert_to_int("abc") +++ {"editable": true, "slideshow": {"slide_type": ""}} -This function attempts to convert a value to an integer, returning None and a message if the conversion fails. - -## Make checks Pythonic - -Python has a unique philosophy regarding handling potential errors or exceptional cases. This philosophy is often summarized by the acronym EAFP: "Easier to Ask for Forgiveness than Permission." +This function attempts to convert a value to an integer, returning `None` and a message if the conversion fails. However, is that message helpful to a person using your code? -### EAFP vs. LBYL - -There are two main approaches to handling potential errors: ++++ {"editable": true, "slideshow": {"slide_type": ""}} -**LBYL (Look Before You Leap)**: Check for conditions before making calls or accessing data. -**EAFP (Easier to Ask for Forgiveness than Permission)**: Assume the operation will succeed and handle any exceptions if they occur. +(fail-fast)= +## Fail fast strategy -Pythonic code generally favors the EAFP approach. 
+Identify data processing or workflow problems as soon as they occur and throw an error immediately rather than allowing +them to propagate through your code. This approach saves time and simplifies debugging, providing clearer, more useful error outputs (stack traces). Below, you can see that the code tries to open a file, but Python can't find the file. In response, Python throws a `FileNotFoundError`. ```{code-cell} ipython3 --- editable: true slideshow: slide_type: '' +tags: [raises-exception] --- -# LBYL approach - manually check that the user provides a int -def convert_to_int(value): - if isinstance(value, int): - return int(value) - else: - print("Oops i can't process this so I will fail gracefully.") - return None +# Open a file (but it doesn't exist) +def read_file(file_path): + with open(file_path, 'r') as file: + data = file.read() + return data -convert_to_int(1) -convert_to_int("a") +file_data = read_file("nonexistent_file.txt") ``` ++++ {"editable": true, "slideshow": {"slide_type": ""}} + +You could anticipate a user providing a bad file path. This might be especially likely if you plan to share your code with others and run it on different computers and different operating systems. + +In the example below, you use a [conditional statement](python-conditionals) to check if the file exists; if it doesn't, it returns None. In this case, the code will fail quietly, and the user will not understand that there is an error. + +This is also dangerous territory for a user who may not understand why the code runs but doesn't work. + ```{code-cell} ipython3 --- editable: true slideshow: slide_type: '' --- -# EAFP approach - Consider what the user might provide and catch the error. 
-def convert_to_int(value): - try: - return int(value) - except ValueError: - print("Oops i can't process this so I will fail gracefully.") - return None # or some default value - -convert_to_int(1) -convert_to_int("a") -``` - -+++ {"editable": true, "slideshow": {"slide_type": ""}} +import os -The EAFP (Easier to Ask for Forgiveness than Permission) approach is more Pythonic because: - -* It’s often faster, avoiding redundant checks when operations succeed. -* It’s more readable, separating the intended operation and error handling. - -## Any Check is a Good Check +def read_file(file_path): + if os.path.exists(file_path): + with open(file_path, 'r') as file: + data = file.read() + return data + else: + return None # Doesn't fail immediately, just returns None -As long as you consider edge cases, you're writing great code! You don’t need to worry about being “Pythonic” immediately, but understanding both approaches is useful regardless of which approach you chose. +# No error raised, even though the file doesn't exist +file_data = read_file("nonexistent_file.txt") +``` +++ {"editable": true, "slideshow": {"slide_type": ""}} -## Common Python exceptions +The code example below is better than the examples above for three reasons: -Python has dozens of specific errors that can be raised when code fails to run. Below are a few common ones that you may encounter in the activity 3. - -### TypeError +1. It's **Pythonic**: it asks for forgiveness later by using a `try/except` block. +2. It fails quickly - as soon as it tries to open the file. The code won't continue to run after this step fails. +3. It raises a clean, useful error that the user can understand. -Occurs when an operation is applied to an object of an inappropriate type. +The code anticipates what will happen if it can't find the file. It then raises a `FileNotFoundError` and provides a useful and friendly message to the user. 
```{code-cell} ipython3 --- @@ -158,80 +160,112 @@ slideshow: slide_type: '' tags: [raises-exception] --- -# Example: Trying to add a number and a string -1 + 'string' # This will raise a TypeError +def read_file(file_path): + try: + with open(file_path, 'r') as file: + data = file.read() + return data + except FileNotFoundError: + raise FileNotFoundError(f"Oops! I couldn't find the file located at: " + f"{file_path}. Please check to see if it exists") + +# Raises an error immediately if the file doesn't exist +file_data = read_file("nonexistent_file.txt") ``` -+++ {"editable": true, "slideshow": {"slide_type": ""}} +## Customizing error messages + +The code above is useful because it fails and provides a simple and effective message that tells the user to check that their file path is correct. -### ValueError +However, the error output is long because Python reports two exceptions: the original error that occurs when it can't open the file, followed by the error that you raise intentionally within the except block. -* **Raised when** a function receives an argument of the right type but an invalid value. -* **Example:** `int('abc')` (trying to convert an invalid string to an integer). +If you wanted to provide less information to the user, you could use `from None`. Adding `from None` suppresses the original error so that the user +only sees the exception that you raise within the `try/except` block. ```{code-cell} ipython3 ---- -editable: true -slideshow: - slide_type: '' -tags: [raises-exception] ---- -int("abc") +def read_file(file_path): + try: + with open(file_path, 'r') as file: + data = file.read() + return data + except FileNotFoundError: + raise FileNotFoundError(f"Oops! I couldn't find the file located at: {file_path}. 
" + "Please check to see if it exists") from None + +# Raises an error immediately if the file doesn't exist +file_data = read_file("nonexistent_file.txt") ``` +++ {"editable": true, "slideshow": {"slide_type": ""}} -### KeyError +(pythonic-checks)= +## Make Checks Pythonic -* **Raised when** a dictionary key is not found. -* **Example:** `my_dict['nonexistent_key']` (trying to access a key that doesn’t exist in the dictionary). +Python has a unique philosophy regarding handling potential errors or +exceptional cases. This philosophy is often summarized by the acronym EAFP: +"Easier to Ask for Forgiveness than Permission." Combining EAFP with the **fail +fast** approach makes your code flexible and resilient to the messy +realities of data processing. + +### EAFP vs. LBYL + +There are two main approaches to handling potential errors: + +- **LBYL (Look Before You Leap)**: Check for conditions before making calls or + accessing data. +- **EAFP (Easier to Ask for Forgiveness than Permission)**: Assume the operation + will succeed and handle any exceptions if they occur. + +Pythonic code generally favors the EAFP approach, which allows for **failing +fast** when an error occurs, providing useful feedback without unnecessary +checks. ```{code-cell} ipython3 --- editable: true slideshow: slide_type: '' -tags: [raises-exception] --- -# Example: Accessing a nonexistent key in a dictionary -my_dict = {"a": 1, "b": 2} -my_dict['nonexistent_key'] -``` - -+++ {"editable": true, "slideshow": {"slide_type": ""}} - -### IndexError +# LBYL approach - manually check that the user provides an int +def convert_to_int(value): + if isinstance(value, int): + return int(value) + else: + print("Oops, I can't process this so I will fail gracefully.") + return None -* **Raised when** an invalid index is used to access a list or tuple. -* **Example:** `my_list[10]` (trying to access the 11th element of a list with fewer elements).
+convert_to_int(1) +convert_to_int("a") +``` ```{code-cell} ipython3 --- editable: true slideshow: slide_type: '' -tags: [raises-exception] --- -my_list = [1, 2, 3] -my_list[10] +# EAFP approach - Consider what the user might provide and catch the error. +def convert_to_int(value): + try: + return int(value) + except ValueError: + print("Oops, I can't process this so I will fail gracefully.") + return None # or some default value + +convert_to_int(1) +convert_to_int("a") ``` +++ {"editable": true, "slideshow": {"slide_type": ""}} -### AttributeError +The EAFP (Easier to Ask for Forgiveness than Permission) approach is more Pythonic because: + +* It’s often faster, avoiding redundant checks when operations succeed. +* It’s more readable, separating the intended operation from the error handling. -Raised when an object does not have a specific attribute or method. +## Any Check is a Good Check -```{code-cell} ipython3 ---- -editable: true -slideshow: - slide_type: '' -tags: [raises-exception] ---- -my_string = "Hello" -my_string.nonexistent_method() -``` +As long as you consider edge cases, you're writing great code! You don’t need to worry about being “Pythonic” immediately, but understanding both approaches is useful regardless of which approach you choose. ```{code-cell} ipython3 --- diff --git a/clean-modular-code/checks-conditionals/python-functions-multi-parameters.md b/clean-modular-code/checks-conditionals/python-functions-multi-parameters.md new file mode 100644 index 0000000..538767b --- /dev/null +++ b/clean-modular-code/checks-conditionals/python-functions-multi-parameters.md @@ -0,0 +1,608 @@ +--- +layout: single +title: 'Write Functions with Multiple Parameters in Python' +excerpt: "A function is a reusable block of code that performs a specific task. Learn how to write functions that can take multiple as well as optional parameters in Python to eliminate repetition and improve efficiency in your code."
+authors: ['Jenny Palomino', 'Leah Wasser'] +category: [courses] +class-lesson: ['intro-functions-tb'] +permalink: /courses/intro-to-earth-data-science/write-efficient-python-code/functions-modular-code/write-functions-with-multiple-and-optional-parameters-in-python/ +nav-title: "Write Multi-Parameter Functions in Python" +dateCreated: 2019-11-12 +modified: '{:%Y-%m-%d}'.format(datetime.now()) +module-type: 'class' +chapter: 19 +course: "intro-to-earth-data-science-textbook" +week: 7 +sidebar: + nav: +author_profile: false +comments: true +order: 3 +topics: + reproducible-science-and-programming: ['python'] +redirect_from: + - "/courses/intro-to-earth-data-science/write-efficient-python-code/functions/write-functions-with-multiple-and-optional-parameters-in-python/" +jupyter: + jupytext: + formats: ipynb,md + text_representation: + extension: .md + format_name: markdown + format_version: '1.3' + jupytext_version: 1.16.4 + kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + + +{% include toc title="On This Page" icon="file-text" %} + +