Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error message when 2024 files are passed to process_data() #32

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

mthomas-ketchbrook
Copy link
Collaborator

This PR improves the error message seen by users when 2024 files downloaded from FCA are processed via process_data() (see #23). There is also an additional test added to process_data() for the condition when the dir argument is a path that does not exist.

"Bad" 2024 data files have been added in the tests/testthat/bad_data/ directory, so that the unit test can pass even in the event that FCA fixes the files posted on the website.

Lastly, this PR re-introduces the {cli} package as a dependency. This shouldn't be an issue, as {cli} is also a dependency of other packages that are already listed in DESCRIPTION (e.g., {dplyr}). I believe that for the purposes of what we are trying to explain to the user, simply using {rlang} (as we do in the rest of this package) does not suffice.

@mthomas-ketchbrook mthomas-ketchbrook linked an issue Dec 13, 2024 that may be closed by this pull request
Copy link
Collaborator

@IvanM26 IvanM26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good! I have some comments that, in my opinion, are worth discussing!

bad_data

I was skeptical about having the .TXT files in the package (due to package size) but I checked that their combined size is around 0.5Mb, which seems okay.

Unit test output

As of now, running testthat::test_dir("tests/testthat") shows the following in the console:
image

In an ideal scenario, I think I would prefer to avoid the in-between messages. Do you think we should address that, and if so, if we should address that as part of this task?

About the error message

Running process_data(dir = testthat::test_path("bad_data")) shows the following output:

image

  1. Given that we are "capturing" this error (i.e. we know why it's happening), I'm wondering whether we should "keep it simple" and find a way to not show the Backtrace. I think the Backtrace adds a layer of difficulty to read our error message that explains the problem.

  2. I think we should rephrase the error message a bit, making it more "general". Right now is focusing on "2024", but what if the error continues in 2025? I know that the problem would be the same, but it would feel weird reading about 2024 errors when processing 2025 data.

  3. Shouldn't we say first that there is an error and then provide the info about it? Currently, we see a message, then something that looks like a cryptic error and finally the Backtrace. I'm also wondering if we should show the "actual" error (the purrr problem etc) or if we just need to comment on the file problems without worrying about the "cryptic" error. Basically, I'm thinking whether we should have a check on files as they are processed and say something like "there is an error in file XYZ" and then point to the discussion in GitHub.

@mthomas-ketchbrook
Copy link
Collaborator Author

I was skeptical about having the .TXT files in the package (due to package size) but I checked that their combined size is around 0.5Mb, which seems okay.

When an R package is installed, the tests/ directory is not installed with the package.

In an ideal scenario, I think I would prefer to avoid the in-between messages. Do you think we should address that, and if so, if we should address that as part of this task?

I agree -- but I couldn't figure out how to fix this. I'd like to do this as part of this task, but don't want to spend more than 30 minutes troubleshooting it.

  1. Given that we are "capturing" this error (i.e. we know why it's happening), I'm wondering whether we should "keep it simple" and find a way to not show the Backtrace. I think the Backtrace adds a layer of difficulty to read our error message that explains the problem.

I completely agree -- I think the backtrace makes the error more difficult to interpret. I couldn't figure out how to fix this. I'd like to do this as part of this task, but don't want to spend more than 30 minutes troubleshooting it.

  1. I think we should rephrase the error message a bit, making it more "general". Right now is focusing on "2024", but what if the error continues in 2025? I know that the problem would be the same, but it would feel weird reading about 2024 errors when processing 2025 data.

I think that's a great idea. However, I don't think we will have to face this concern until 2025 Q1 data is published in ~May 2025. I think we should keep it as is and address the need to change it at that time.

  1. Shouldn't we say first that there is an error and then provide the info about it? Currently, we see a message, then something that looks like a cryptic error and finally the Backtrace. I'm also wondering if we should show the "actual" error (the purrr problem etc) or if we just need to comment on the file problems without worrying about the "cryptic" error. Basically, I'm thinking whether we should have a check on files as they are processed and say something like "there is an error in file XYZ" and then point to the discussion in GitHub.

I think that's a great idea, too. With the backtrace, I was worried that the audience might not scroll down far enough in the console to see our "Note". If we can get rid of the backtrace, I am open to reversing the order of the output (i.e., error first, message second).

I know that with the way the package currently operates (before this PR was introduced), {rlang} actually does a pretty nice job of articulating that the error is occurring in file RCR7. With some of the tryCatch() development work I was doing, I was having a hard time preserving this error message such that it continued to display the information about file RCR7. If there is a way that we can make the error message more straightforward & as informative as possible, I am open to all ideas!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

process_data() throws an error with 2024 data
2 participants