-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: check_enough_train_data #283
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally lgtm, couple of inlines and a couple of commits you can take or leave (slightly too much to be easy to edit in github suggestions)
#' when [prep()] is run, some operations may not be able to be | ||
#' conducted on new data (e.g. processing the outcome variable(s)). | ||
#' Care should be taken when using `skip = TRUE` as it may affect | ||
#' the computations for subsequent operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if looks like you have a test demonstrating it does do that! So we definitely have a functional check for training data, if not test data.skip=TRUE
by default would solve the issue about running during fit
vs predict
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup! I'd like to handle test data checking next, unclear if that will be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is failing checks because purrr
isn't in the namespace (though map()
is available internally).
Also a few comments below. The simplification for counting is perhaps important (at least easier).
I'm requesting changes only because of the possible bug and the missing NAMESPACE.
@dshemetov One more thing before merging: can you add the check to the |
Sure, let me take a look at how much work that will be. |
Attempt at fixing #106.
It seems to work well (see the tests) for both geo pooled and non geo pooled (more generally key col pooled) cases, but I still don't know how to keep this from running at test time, where the number of data points available will be way different and may cause problems.
@dajmcdon any suggestions?