Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: edits to activity 3 and supporting lesson #21

Merged
merged 8 commits into from
Oct 17, 2024

Conversation

lwasser
Copy link
Member

@lwasser lwasser commented Oct 11, 2024

This is the start to revisiting concepts taught in activity 3. it's a followup of merged pr #18 I will bring several comments over from @ucodery

  1. encourage the fail fast approach to checks.
  2. Raise errors with useful messages rather than using print statements. Print statements will allow the program to continue to run vs raising the error properly so this will avoid failing quietly and encourage fast fails.

We want to make sure that most of the lesson content is in the lesson rather than the activity but we can then link to it in the activity.

Two suggestions I have for this lesson:
encourage attendees to "fail fast"
only when it is unrecoverable
this makes the failure happen earlier, wasting less time waiting
this can make the stack trace more useful, as it points the the code experiencing the error

clearly describe the error.
rather than just printing out "exiting now", try to tell the caller why it was a mistake, and what they could maybe do to fix it.
maybe also explain that exceptions can (most of the time) take a message string, and it doesn't have to be printed also.

also - this is the example provided around failing fast

I think this content could be a single workshop or even a day or two of programming on its own, but we will keep it high-level.

try:
return int(value)
except ValueError:
print("Oops i can't process this so I will fail gracefully.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this isn't failing gracefully, this is recovering. Both are valid, depending on what guarantees the function wants to make. But I would probably not print any message if I was able to recover from an error and the rest of the program could continue

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ucodery, I appreciate this. I was actually going to ask you about this. There are two sides here (I think?). If I raise the error, then I get the full stack output. My message is at the bottom. For a lot of people, that's a lot of "stuff" to read.
Screenshot 2024-10-11 at 5 09 30 PM

If i print the error - i understand that this is a "recovery" as you call it above. from a user perspective (end user) it looks prettier / cleaner is easier to digest.

Screenshot 2024-10-11 at 5 10 41 PM

The program continues to run, which is problematic because, in this case, it will fail later because it has no data to process. From a development perspective, raising the error is nice for testing and feels more robust. it also should fail at this point for our demo workflow in this workshop because this means there is no data! But again the traceback output is a lot for a user to digest and has a lot of info that most users don't need.

i'm curious what your thoughts are here in terms of what to suggest people do - or how do they make decisions?

After working through this, I was thinking about providing an easier example check around data that throws a key error or index error . That's easier because we'd want to keep processing but do something with that data to clean it even it that something is returning None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can appreciate that printing the error message makes for a nicer intro to errors at runtime for students. I think my reaction to this is because print is in the except block, but then no error is raised. So the exception is passively squashed, and execution continues, without a data variable, possible causing issues later on (the opposite of "fail fast").

If you do (re)raise an exception, the student is now faced with the info dump of a stack trace as well as an error message. To avoid this, the program could sys.exit() or in some other way terminate.


As an aside, part of the reason there is so much going on in your first example is that the code is raising a new exception inside of an except block, which triggers a chained exception. This can be very useful to debugging, but to suppress the extra information that is not helpful in this case, raise FileNotFoundError("Oops!") from None can be used.

This has the advantage of reducing unnecessary information, but makes teaching the raise and except statements potentially harder.

Another strategy, which adds complication to introducing basic error handling, but gives more control over the output, is to explicitly format exceptions. For instance, there is a stdlib that lets you do

try:
    with open(file_path, "r") as file:
        data = file.read()
except FileNotFoundError as fe:
    print("".join(traceback.format_exception_only(fe)))
    sys.exit()

and there are 3rd party libraries that are meant to enhance tracebacks. Often this means adding more information, but there are ways to instead reduce the overall information (like suppressing foreign libraries).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @ucodery ok I think that I just learned a LOT more about exceptions. It is the chaining that I didn't fully understand. I know how to read the chains, but I didn't know about from None and about the concept of chaining. thank you.

this lesson is actually a lot more complex that I originally thought it would be!

Copy link
Contributor

@willingc willingc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good.

clean-modular-code/activity-3/clean-code-activity-3.md Outdated Show resolved Hide resolved
clean-modular-code/activity-3/clean-code-activity-3.md Outdated Show resolved Hide resolved
clean-modular-code/activity-3/clean-code-activity-3.md Outdated Show resolved Hide resolved
clean-modular-code/activity-3/clean-code-activity-3.md Outdated Show resolved Hide resolved
@@ -220,20 +355,101 @@ print("Final shape of combined DataFrame:", all_papers_df.shape)

+++ {"editable": true, "slideshow": {"slide_type": ""}}

:::{admonition} On your own 1
## What i don't like
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## What i don't like
## What I don't like

## Handling edge cases
Your goal is to identify detect and data processing or workflow problems immediately when they occur, rather than allowing
them to propagate through your code. This approach saves time and makes
debugging easier, providing clearer, more useful error outputs (known as stack traces).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
debugging easier, providing clearer, more useful error outputs (known as stack traces).
debugging easier, providing clearer, more useful error outputs (known as tracebacks).


When working with messy data, you'll often encounter edge cases - unusual or unexpected data that can break your processing pipeline. Functions allow you to implement robust error handling and data validation. Here are some techniques you can use:
When working with messy data, you'll often encounter edge cases - unusual or unexpected data that can break your processing pipeline. Functions allow you to implement robust error handling and data validation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When working with messy data, you'll often encounter edge cases - unusual or unexpected data that can break your processing pipeline. Functions allow you to implement robust error handling and data validation.
When working with messy data, you'll often encounter unanticipated edge cases, such as unusual or unexpected data that can break your processing pipeline. Functions allow you to implement robust error handling and data validation.

except FileNotFoundError:
return f"File not found: {file_path}"
except Exception as e:
return f"An error occurred while reading {file_path}: {e}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return f"An error occurred while reading {file_path}: {e}"
return f"An unexpected error occurred while reading {file_path}: {e}"

Comment on lines +158 to +167
There are two main approaches to handling potential errors:

- **LBYL (Look Before You Leap)**: Check for conditions before making calls or
accessing data.
- **EAFP (Easier to Ask for Forgiveness than Permission)**: Assume the operation
will succeed and handle any exceptions if they occur.

Pythonic code generally favors the EAFP approach, which allows for **failing
fast** when an error occurs, providing useful feedback without unnecessary
checks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this is generally true since it depends on the use case, but YOLO.

@lwasser
Copy link
Member Author

lwasser commented Oct 17, 2024

ok merging this!! The most recent visual version is on circle ci for now.

@lwasser lwasser merged commit db4b2c0 into pyOpenSci:main Oct 17, 2024
1 of 2 checks passed
@lwasser lwasser deleted the activity-3b branch October 17, 2024 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants