Improve error message when 2024 files are passed to `process_data()` #32

mthomas-ketchbrook · 2024-12-13T21:27:22Z

This PR improves the error message seen by users when 2024 files downloaded from FCA are processed via process_data() (see #23). There is also an additional test added to process_data() for the condition when the dir argument is a path that does not exist.

"Bad" 2024 data files have been added in the tests/testthat/bad_data/ directory, so that the unit test can pass even in the event that FCA fixes the files posted on the website.

Lastly, this PR re-introduces the {cli} package as a dependency. This shouldn't be an issue, as {cli} is also a dependency of other packages that are already listed in DESCRIPTION (e.g., {dplyr}). I believe that for the purposes of what we are trying to explain to the user, simply using {rlang} (as we do in the rest of this package) does not suffice.

renamed old `process_data() fn to `process_data_all()`

IvanM26

The code looks good! I have some comments that, in my opinion, are worth discussing!

`bad_data`

I was skeptical about having the .TXT files in the package (due to package size) but I checked that their combined size is around 0.5Mb, which seems okay.

Unit test output

As of now, running testthat::test_dir("tests/testthat") shows the following in the console:

In an ideal scenario, I think I would prefer to avoid the in-between messages. Do you think we should address that, and if so, if we should address that as part of this task?

About the error message

Running process_data(dir = testthat::test_path("bad_data")) shows the following output:

Given that we are "capturing" this error (i.e. we know why it's happening), I'm wondering whether we should "keep it simple" and find a way to not show the Backtrace. I think the Backtrace adds a layer of difficulty to read our error message that explains the problem.
I think we should rephrase the error message a bit, making it more "general". Right now is focusing on "2024", but what if the error continues in 2025? I know that the problem would be the same, but it would feel weird reading about 2024 errors when processing 2025 data.
Shouldn't we say first that there is an error and then provide the info about it? Currently, we see a message, then something that looks like a cryptic error and finally the Backtrace. I'm also wondering if we should show the "actual" error (the purrr problem etc) or if we just need to comment on the file problems without worrying about the "cryptic" error. Basically, I'm thinking whether we should have a check on files as they are processed and say something like "there is an error in file XYZ" and then point to the discussion in GitHub.

mthomas-ketchbrook · 2024-12-20T20:12:06Z

I was skeptical about having the .TXT files in the package (due to package size) but I checked that their combined size is around 0.5Mb, which seems okay.

When an R package is installed, the tests/ directory is not installed with the package.

In an ideal scenario, I think I would prefer to avoid the in-between messages. Do you think we should address that, and if so, if we should address that as part of this task?

I agree -- but I couldn't figure out how to fix this. I'd like to do this as part of this task, but don't want to spend more than 30 minutes troubleshooting it.

Given that we are "capturing" this error (i.e. we know why it's happening), I'm wondering whether we should "keep it simple" and find a way to not show the Backtrace. I think the Backtrace adds a layer of difficulty to read our error message that explains the problem.

I completely agree -- I think the backtrace makes the error more difficult to interpret. I couldn't figure out how to fix this. I'd like to do this as part of this task, but don't want to spend more than 30 minutes troubleshooting it.

I think we should rephrase the error message a bit, making it more "general". Right now is focusing on "2024", but what if the error continues in 2025? I know that the problem would be the same, but it would feel weird reading about 2024 errors when processing 2025 data.

I think that's a great idea. However, I don't think we will have to face this concern until 2025 Q1 data is published in ~May 2025. I think we should keep it as is and address the need to change it at that time.

Shouldn't we say first that there is an error and then provide the info about it? Currently, we see a message, then something that looks like a cryptic error and finally the Backtrace. I'm also wondering if we should show the "actual" error (the purrr problem etc) or if we just need to comment on the file problems without worrying about the "cryptic" error. Basically, I'm thinking whether we should have a check on files as they are processed and say something like "there is an error in file XYZ" and then point to the discussion in GitHub.

I think that's a great idea, too. With the backtrace, I was worried that the audience might not scroll down far enough in the console to see our "Note". If we can get rid of the backtrace, I am open to reversing the order of the output (i.e., error first, message second).

I know that with the way the package currently operates (before this PR was introduced), {rlang} actually does a pretty nice job of articulating that the error is occurring in file RCR7. With some of the tryCatch() development work I was doing, I was having a hard time preserving this error message such that it continued to display the information about file RCR7. If there is a way that we can make the error message more straightforward & as informative as possible, I am open to all ideas!

mthomas-ketchbrook added 4 commits December 13, 2024 15:24

Wrapped process_data() in tryCatch

4b673f1

renamed old `process_data() fn to `process_data_all()`

Add error handling for bad path

2aab8de

Add tests

43ed9eb

Add bad data for unit test

fb7b944

mthomas-ketchbrook requested a review from IvanM26 December 13, 2024 21:27

mthomas-ketchbrook linked an issue Dec 13, 2024 that may be closed by this pull request

process_data() throws an error with 2024 data #23

Open

mthomas-ketchbrook and others added 3 commits December 13, 2024 16:48

Remove dependency on {fs}

c1c115e

Fix missing roxygen brace

6134bf2

Remove empty line

055de3d

IvanM26 reviewed Dec 20, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve error message when 2024 files are passed to `process_data()` #32

Improve error message when 2024 files are passed to `process_data()` #32

mthomas-ketchbrook commented Dec 13, 2024

IvanM26 left a comment •

edited

Loading

mthomas-ketchbrook commented Dec 20, 2024

Improve error message when 2024 files are passed to process_data() #32

Are you sure you want to change the base?

Improve error message when 2024 files are passed to process_data() #32

Conversation

mthomas-ketchbrook commented Dec 13, 2024

IvanM26 left a comment • edited Loading

Choose a reason for hiding this comment

bad_data

Unit test output

About the error message

mthomas-ketchbrook commented Dec 20, 2024

Improve error message when 2024 files are passed to `process_data()` #32

Improve error message when 2024 files are passed to `process_data()` #32

IvanM26 left a comment •

edited

Loading

`bad_data`