-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve error message when 2024 files are passed to process_data()
#32
base: master
Are you sure you want to change the base?
Improve error message when 2024 files are passed to process_data()
#32
Conversation
renamed old `process_data() fn to `process_data_all()`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good! I have some comments that, in my opinion, are worth discussing!
bad_data
I was skeptical about having the .TXT
files in the package (due to package size) but I checked that their combined size is around 0.5Mb, which seems okay.
Unit test output
As of now, running testthat::test_dir("tests/testthat")
shows the following in the console:
In an ideal scenario, I think I would prefer to avoid the in-between messages. Do you think we should address that, and if so, if we should address that as part of this task?
About the error message
Running process_data(dir = testthat::test_path("bad_data"))
shows the following output:
-
Given that we are "capturing" this error (i.e. we know why it's happening), I'm wondering whether we should "keep it simple" and find a way to not show the
Backtrace
. I think theBacktrace
adds a layer of difficulty to read our error message that explains the problem. -
I think we should rephrase the error message a bit, making it more "general". Right now is focusing on "2024", but what if the error continues in 2025? I know that the problem would be the same, but it would feel weird reading about 2024 errors when processing 2025 data.
-
Shouldn't we say first that there is an error and then provide the info about it? Currently, we see a message, then something that looks like a cryptic error and finally the
Backtrace
. I'm also wondering if we should show the "actual" error (thepurrr
problem etc) or if we just need to comment on the file problems without worrying about the "cryptic" error. Basically, I'm thinking whether we should have a check on files as they are processed and say something like "there is an error in file XYZ" and then point to the discussion in GitHub.
When an R package is installed, the
I agree -- but I couldn't figure out how to fix this. I'd like to do this as part of this task, but don't want to spend more than 30 minutes troubleshooting it.
I completely agree -- I think the backtrace makes the error more difficult to interpret. I couldn't figure out how to fix this. I'd like to do this as part of this task, but don't want to spend more than 30 minutes troubleshooting it.
I think that's a great idea. However, I don't think we will have to face this concern until 2025 Q1 data is published in ~May 2025. I think we should keep it as is and address the need to change it at that time.
I think that's a great idea, too. With the backtrace, I was worried that the audience might not scroll down far enough in the console to see our "Note". If we can get rid of the backtrace, I am open to reversing the order of the output (i.e., error first, message second). I know that with the way the package currently operates (before this PR was introduced), {rlang} actually does a pretty nice job of articulating that the error is occurring in file |
This PR improves the error message seen by users when 2024 files downloaded from FCA are processed via
process_data()
(see #23). There is also an additional test added toprocess_data()
for the condition when thedir
argument is a path that does not exist."Bad" 2024 data files have been added in the
tests/testthat/bad_data/
directory, so that the unit test can pass even in the event that FCA fixes the files posted on the website.Lastly, this PR re-introduces the {cli} package as a dependency. This shouldn't be an issue, as {cli} is also a dependency of other packages that are already listed in
DESCRIPTION
(e.g., {dplyr}). I believe that for the purposes of what we are trying to explain to the user, simply using {rlang} (as we do in the rest of this package) does not suffice.