Commit

feat(functions): start at function lessons
lwasser committed Oct 16, 2024
1 parent 2cae635 commit 59bc51b
Showing 5 changed files with 403 additions and 248 deletions.
39 changes: 24 additions & 15 deletions clean-modular-code/checks-conditionals/python-function-checks.md
@@ -13,13 +13,22 @@ kernelspec:

+++ {"editable": true, "slideshow": {"slide_type": ""}}

# Write Flexible Functions to Handle Messy Data
# Write Flexible Functions for Messy Data

When dealing with messy or unpredictable data, [using functions](python-functions) is an excellent first step in creating a robust and maintainable data processing workflow. Functions provide modular units that can be tested independently, allowing you to handle various edge cases and unexpected scenarios effectively.
When dealing with messy or unpredictable data, using functions is an excellent first step in creating a robust and maintainable data processing workflow. Functions provide modular units that can be tested independently, allowing you to handle various edge cases and unexpected scenarios effectively.

## Function benefits

Using functions in your data processing pipeline offers several advantages:

1. **Modularity**: Functions encapsulate specific tasks, making your code more organized and easier to understand.
2. **Testability**: You can test functions individually, outside of the main workflow, to ensure they handle different scenarios correctly.
3. **Flexibility**: As you build out your workflow, you can easily add elements to functions to handle new processing requirements or edge cases.
4. **Reusability**: Well-designed functions can be reused across different parts of your project or even in other projects.
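
As a minimal sketch of the flexibility point above, the hypothetical helper below (`clean_temperature` and the sample readings are invented for illustration, not part of the lesson) shows how a small function can grow to cover new edge cases without changing the code that calls it:

```python
def clean_temperature(value):
    """Convert a raw temperature reading to a float, or None if unusable.

    Hypothetical helper, for illustration only.
    """
    # Edge case 1: missing readings stored as None or an empty string
    if value is None or value == "":
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        # Edge case 2: text such as "n/a" that cannot be converted
        return None


# Made-up messy readings: strings, blanks, text, numbers, and None
readings = ["21.5", "", "n/a", 19, None]
cleaned = [clean_temperature(reading) for reading in readings]
print(cleaned)
```

Because the checks live inside the function, handling a newly discovered edge case means editing one definition rather than every place the conversion happens.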

## Handling edge cases

When working with messy data, you'll often encounter edge cases - unusual or unexpected data that can break your processing pipeline. You can add checks to your functions to handle potential errors you may encounter in your data.
When working with messy data, you'll often encounter edge cases - unusual or unexpected data that can break your processing pipeline. Functions allow you to implement robust error handling and data validation. Here are some techniques you can use:

+++ {"editable": true, "slideshow": {"slide_type": ""}}

@@ -83,7 +92,7 @@ There are two main approaches to handling potential errors:
**LBYL (Look Before You Leap)**: Check for conditions before making calls or accessing data.
**EAFP (Easier to Ask for Forgiveness than Permission)**: Assume the operation will succeed and handle any exceptions if they occur.

Pythonic code generally favors the EAFP approach.
Pythonic code generally favors the EAFP approach.
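
The two approaches can be sketched side by side. This is an illustrative example only: the `record` dictionary and both function names are assumptions made up for the comparison.

```python
# A made-up record that is missing the "elevation" key
record = {"site": "A1", "precip_in": "0.7"}


def get_elevation_lbyl(record):
    """LBYL: check that the key exists before accessing it."""
    if "elevation" in record:
        return record["elevation"]
    return None


def get_elevation_eafp(record):
    """EAFP: try the access and handle the failure if it happens."""
    try:
        return record["elevation"]
    except KeyError:
        return None


print(get_elevation_lbyl(record))
print(get_elevation_eafp(record))
```

Both functions return `None` for the record above. The EAFP version keeps the normal path (the dictionary access) front and center and names the specific exception it expects.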

```{code-cell} ipython3
---
@@ -134,10 +143,10 @@ As long as you consider edge cases, you're writing great code! You don’t need

+++ {"editable": true, "slideshow": {"slide_type": ""}}

## Common Python exceptions
## Common Python exceptions

Python has dozens of specific errors that can be raised when code fails to run. Below are a few common ones that you may encounter in Activity 3.
Python has dozens of specific errors that can be raised when code fails to run. Below are a few common ones that you may encounter in Activity 3.
### TypeError

Occurs when an operation is applied to an object of an inappropriate type.
@@ -157,8 +166,8 @@ tags: [raises-exception]

### ValueError

- **Raised when** a function receives an argument of the right type but an invalid value.
- **Example:** `int('abc')` (trying to convert an invalid string to an integer).
* **Raised when** a function receives an argument of the right type but an invalid value.
* **Example:** `int('abc')` (trying to convert an invalid string to an integer).

```{code-cell} ipython3
---
@@ -174,8 +183,8 @@ int("abc")

### KeyError

- **Raised when** a dictionary key is not found.
- **Example:** `my_dict['nonexistent_key']` (trying to access a key that doesn’t exist in the dictionary).
* **Raised when** a dictionary key is not found.
* **Example:** `my_dict['nonexistent_key']` (trying to access a key that doesn’t exist in the dictionary).

```{code-cell} ipython3
---
@@ -191,10 +200,10 @@ my_dict['nonexistent_key']

+++ {"editable": true, "slideshow": {"slide_type": ""}}

### IndexError:
### IndexError

- **Raised when** an invalid index is used to access a list or tuple.
- **Example:** `my_list[10]` (trying to access the 11th element of a list with fewer elements).
* **Raised when** an invalid index is used to access a list or tuple.
* **Example:** `my_list[10]` (trying to access the 11th element of a list with fewer elements).

```{code-cell} ipython3
---
@@ -209,7 +218,7 @@ my_list[10]

+++ {"editable": true, "slideshow": {"slide_type": ""}}

### AttributeError:
### AttributeError

Raised when an object does not have a specific attribute or method.

140 changes: 140 additions & 0 deletions clean-modular-code/checks-conditionals/python-functions.md
@@ -0,0 +1,140 @@
---
title: "Why Functions Make Your Scientific Code Better"
excerpt: "A function is a reusable block of code that performs a specific task. Learn how to use functions to write DRY (Don't Repeat Yourself) code in Python."
jupyter:
jupytext:
formats: ipynb,md
text_representation:
extension: .md
format_name: markdown
format_version: '1.3'
jupytext_version: 1.16.4
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---

<!-- #region editable=true slideshow={"slide_type": ""} -->
# Why write functions?

There are several strategies for making your code more modular. Here, you will learn about
functions as one strategy that eliminates repetition in your code and also can improve the
efficiency and maintainability of your code.

A function is a reusable block of code that performs a specific task. Functions receive inputs to which code is applied and return outputs (or results) of the code.

`input parameter –> function does something –> output results`

:::{tip}

Functions (and classes) are the building blocks of Python packages.

:::

For example:
<!-- #endregion -->

```python editable=true slideshow={"slide_type": ""}
x = 5
# print() is a built-in function that takes an input and displays it as output.
print(x)
```

<!-- #region editable=true slideshow={"slide_type": ""} -->
Functions can help you both eliminate repetition and improve efficiency by making your code more modular.


Modular code is code that is separated into independent units that can be reused and even combined to complete a longer chain of tasks.

:::{figure} ../images/clean-code/functions-for-all-things.png
:alt: You can implement strategies such as loops and functions in your code to replace tasks that you are performing over and over. Source: Francois Michonneau.
:target: ../images/clean-code/functions-for-all-things.png

You can use loops and functions in your code to replace repeating tasks.
Source: Francois Michonneau.
:::

## The benefits of functions

* **Modularity:** Functions only need to be defined once in a workflow. Functions that you write for specific tasks can be used over and over without redefining the function again. A function that you write for one **Python** workflow can also be reused in other workflows, especially if you [make your code installable](https://www.pyopensci.org/python-package-guide/tutorials/installable-code.html).

* **Fewer variables:** When you run a function, the intermediate variables that it creates are local to the function and are not stored in your main workspace. These placeholder variables are discarded once the function has run, which saves memory and keeps your **Python** environment cleaner.
* **Better documentation:** Well-documented functions help other users understand the steps of your processing and help your future self understand previously written code.
* **Easier to maintain and edit your code:** Because a function is only defined once in the workflow, you only need to update the original function definition. Then, each instance in which you call that function in your code (i.e., when the same task is performed) is automatically updated.
* **Tests & checks:** Writing functions allows you to handle issues and edge cases in your code. It also can make it easier to write tests for your code.

### Write modular functions and code

A well-defined function only does one thing but does it well and often in various contexts. Often, the operations in a good function are useful for many tasks.

Take, for instance, the **numpy** function called `mean()`, which computes mean values from a **numpy** array.

This function only does one thing: it computes a mean. However, you may use the `np.mean()` function many times in your code on multiple **numpy** arrays because it has been defined to take any **numpy** array as an input.

For example:
<!-- #endregion -->

```python
import numpy as np
arr = np.array([1, 2, 3])

# Calculate mean of input array
np.mean(arr)
```

<!-- #region editable=true slideshow={"slide_type": ""} -->
The `np.mean()` function is modular, and it can be easily combined with other functions to accomplish various tasks.

When you write modular functions, you can reuse them for other workflows and projects. Some people even write their own **Python** packages for personal and professional use that contain custom functions for tasks that they have to complete regularly.
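
To illustrate how a modular function like `np.mean()` combines with other code, here is a small sketch. The `mean_anomaly` helper is hypothetical, written only for this example: it reuses `np.mean()` as one step in a slightly larger task.

```python
import numpy as np


def mean_anomaly(values):
    """Return each value minus the mean of all values (hypothetical helper)."""
    arr = np.asarray(values, dtype=float)
    # Reuse np.mean() as one building block of a larger calculation
    return arr - np.mean(arr)


anomalies = mean_anomaly([1, 2, 3])
print(anomalies)
```

The helper does not need to know how the mean is computed; it only relies on `np.mean()` accepting an array and returning a single value.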

### Variables produced in functions are discarded after the function runs

When coding tasks step by step, you are likely creating many intermediate variables that are not needed again but are
stored in your computer's memory.

By using functions, these intermediate variables are confined to the function’s local scope. Once the function finishes executing, the variables created within the function are discarded, making your code cleaner and more efficient.
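
A short sketch of local scope, using a hypothetical `fahrenheit_to_celsius` function: the intermediate variable `offset` exists only while the function runs, so trying to use it afterwards raises a `NameError`.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius (illustrative example)."""
    # `offset` and `temp_c` are local: they exist only while the function runs
    offset = temp_f - 32
    temp_c = offset * 5 / 9
    return temp_c


result = fahrenheit_to_celsius(212)
print(result)  # 100.0

# The intermediate variable is not available outside the function
try:
    offset
except NameError as err:
    message = str(err)
    print(message)
```

Only the returned value (`result`) remains in your environment after the call.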

## Reasons why functions improve code readability

### Functions help you document your code

Ideally, your code is easy to understand and is well-documented with **Python** comments (or **Markdown** in **Jupyter Notebook**) and expressive variable and function names. However, what might seem clear to you now might not be clear 6 months from now, or even 3 weeks from now.

Well-written functions help you document your workflow if:

* They have clear docstrings that outline the function's inputs and outputs.
* They use descriptive names that clearly describe the function's task.

### Expressive function names make code self-describing

When writing your own functions, you should name functions using verbs and/or clear labels to indicate what the function does (e.g., `in_to_mm` for converting inches to millimeters).

This makes your code more expressive (or self-describing) and, in turn, makes it easier to read for you, your future self, and your colleagues.
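
As one possible sketch of the `in_to_mm` function named above (the lesson mentions only the name, so the body and the docstring layout here are assumptions), an expressive name and a clear docstring work together:

```python
def in_to_mm(inches):
    """Convert a length from inches to millimeters.

    Parameters
    ----------
    inches : int or float
        Length in inches.

    Returns
    -------
    float
        Length in millimeters (1 inch = 25.4 mm).
    """
    return inches * 25.4


# The call reads almost like a sentence: convert inches to millimeters
precip_mm = in_to_mm(2)
print(precip_mm)
```

A reader can tell what `precip_mm = in_to_mm(2)` does without opening the function definition, and `help(in_to_mm)` displays the docstring for anyone who wants the details.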

### Modular code is easier to maintain and edit

Code that is written line by line (with the same code repeated in multiple parts of your script) can be challenging to maintain and edit.

Imagine having to fix one element of a code line repeated many times. You will have to find and replace that code to implement the fix in EVERY INSTANCE it occurs in your code!

Organizing your code using functions from the beginning allows you to explicitly document the tasks that your code performs, as all code and documentation for the function is contained in the function definition.
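
The contrast can be sketched as follows. The site names, elevation values, and the `feet_to_meters` function are made up for illustration.

```python
# Repeated steps: fixing the conversion means editing every line
site_a_m = 5280 * 0.3048
site_b_m = 8000 * 0.3048
site_c_m = 10200 * 0.3048


# The same task as a function: the conversion is defined (and fixed) once
def feet_to_meters(feet):
    """Convert a length from feet to meters."""
    return feet * 0.3048


elevations_ft = {"site_a": 5280, "site_b": 8000, "site_c": 10200}
elevations_m = {
    site: feet_to_meters(feet) for site, feet in elevations_ft.items()
}
print(elevations_m)
```

If the conversion factor or the rounding needs to change, only the body of `feet_to_meters` is edited, and every call picks up the change.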

### Functions and tests

While you will not learn about testing in this lesson, functions are also useful for testing.

As your code gets longer and more complex, it is more prone to mistakes. For example, if your analysis relies on data that gets updated often, you may want to make sure that all data are up-to-date before performing an analysis, and that the new data are not formatted in a different way.

Changes in data structure and format could break your code. In the worst-case scenario, your code may run but return the wrong values.

If all your code is composed of functions (with built-in tests and checks to ensure that they run as expected), then you can control the input to the function and test that the output returned is correct for that input. This would be difficult to do if all of your code were written line by line with repeated steps.
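
A minimal sketch of this idea, using plain `assert` statements rather than a testing framework. Both `calc_range` and `test_calc_range` are hypothetical names chosen for this example.

```python
def calc_range(values):
    """Return the difference between the largest and smallest value."""
    # Built-in check: fail early with a clear message on empty input
    if len(values) == 0:
        raise ValueError("values must contain at least one number")
    return max(values) - min(values)


def test_calc_range():
    """Check known inputs against the outputs they should produce."""
    assert calc_range([2, 10, 4]) == 8
    assert calc_range([5]) == 0
    # The empty-input check should raise a ValueError
    try:
        calc_range([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected a ValueError for empty input")


test_calc_range()
print("all checks passed")
```

Because the function takes its data as an argument, the test can feed it small, known inputs instead of depending on a full dataset.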

## Summary

It is a good idea to learn how to:

1. Modularize your code into generalizable tasks using functions.
2. Write functions for parts of your code that include repeated steps.
3. Document your functions clearly, specifying the structure of the inputs and outputs with clear comments about what the function can do.
<!-- #endregion -->