feat: check_enough_train_data #283

dshemetov · 2024-01-18T22:37:06Z

Attempt at fixing #106.

It seems to work well (see the tests) for both the geo-pooled and non-geo-pooled cases (more generally, pooled or not pooled over the key columns). However, I still don't know how to keep this from running at test time, when the number of available data points will be very different and may cause problems.

@dajmcdon any suggestions?
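
The pooled versus per-key distinction described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the implementation in this PR: the helper name `check_min_rows()`, its arguments, and the toy data are all hypothetical.

```r
# Hypothetical helper (not the PR's code): count complete rows, either
# pooled over all keys or separately per key group, and stop if any
# count falls below `n`.
check_min_rows <- function(data, n, key_cols = character(0),
                           value_cols = names(data)) {
  # TRUE for rows with no NA in the columns of interest
  complete <- stats::complete.cases(data[, value_cols, drop = FALSE])
  if (length(key_cols) == 0) {
    # Pooled case: a single count over the whole data set
    counts <- c(all = sum(complete))
  } else {
    # Per-key case: one count per combination of key columns
    groups <- interaction(data[, key_cols, drop = FALSE], drop = TRUE)
    counts <- tapply(complete, groups, sum)
  }
  too_few <- names(counts)[counts < n]
  if (length(too_few) > 0) {
    stop(
      "Not enough complete rows (need ", n, ") in: ",
      paste(too_few, collapse = ", ")
    )
  }
  invisible(data)
}

# Toy data: "ny" has only one complete row.
toy <- data.frame(
  geo_value = c("ca", "ca", "ca", "ny", "ny"),
  x = c(1, 2, 3, 4, NA)
)

# Pooled over geo_value: 4 complete rows in total, so n = 4 passes.
pooled_result <- check_min_rows(toy, n = 4, value_cols = "x")

# Per geo_value: "ny" fails a threshold of n = 2.
per_key_msg <- tryCatch(
  check_min_rows(toy, n = 2, key_cols = "geo_value", value_cols = "x"),
  error = function(e) conditionMessage(e)
)
```

The same data can pass a pooled threshold and fail a per-key one, which is why the two cases need separate tests.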

dsweber2

Generally LGTM. A couple of inline comments, plus a couple of commits you can take or leave (slightly too much to edit easily in GitHub suggestions).

dsweber2 · 2024-01-19T21:50:13Z

R/check_enough_train_data.R

+#'  when [prep()] is run, some operations may not be able to be
+#'  conducted on new data (e.g. processing the outcome variable(s)).
+#'  Care should be taken when using `skip = TRUE` as it may affect
+#'  the computations for subsequent operations.


~~I'm wondering if `skip = TRUE` by default would solve the issue about running during fit vs. predict?~~ Looks like you have a test demonstrating that it does! So we definitely have a functional check for training data, if not for test data.

dshemetov · 2024-01-20

Yup! I'd like to handle test-data checking next, though it's unclear whether that will be possible.

dajmcdon

This is failing checks because purrr isn't in the namespace (though map() is available internally).

Also a few comments below. The simplification for counting is perhaps important (at least easier).

I'm requesting changes only because of the possible bug and the missing NAMESPACE.

dajmcdon · 2024-01-21T18:51:40Z

@dshemetov One more thing before merging: can you add the check to the arx_classifier() and the smooth forecaster?

dshemetov · 2024-01-22T20:34:30Z

Sure, let me take a look at how much work that will be.

wip: check_enough_train_data

cf65b1b

dshemetov requested a review from dajmcdon as a code owner January 18, 2024 22:37

dsweber2 linked an issue Jan 19, 2024 that may be closed by this pull request

Handling insufficient training data. #272

Closed

feat: add check_enough_data

ad74faa

dshemetov changed the title ~~wip: check_enough_train_data~~ feat: check_enough_train_data Jan 19, 2024

dshemetov added 2 commits January 19, 2024 13:50

doc: update check_enough_train_data docstring

f237ca5

fix: remove browser()

33e00ca

dsweber2 approved these changes Jan 19, 2024

dsweber2 added 2 commits January 19, 2024 14:45

typos and ambiguous names

d4d5189

slightly more test structure

3662bc3

dajmcdon requested changes Jan 19, 2024

dshemetov added 7 commits January 19, 2024 16:44

refactor: use Dan's suggest in check_enough_train_data

fe2c91a

fix: check_enough_train_data, tests

1846097

feat: add check_enough_data to arx_forecaster

4bc17c4

fix: add default n to check_enough_train_data, import dplyr funcs

3f1630d

repo: ignore renv stuff

38f19da

doc: update NEWS

c655bf6

doc: document

437f5d1

dshemetov requested a review from dajmcdon January 20, 2024 04:11

dajmcdon approved these changes Jan 20, 2024

dshemetov added 5 commits January 22, 2024 14:17

refactor: rename check_enough_data args in arx_forecaster

edb15e3

* n = NULL no longer adds the check to recipe

feat: add check_enough_data to arx_classifier

39bb81e

refactor: change the default n in check_enough_data to #predictors

efde51e

repo: change version bump

580ca5a

doc: document

b869222

dshemetov merged commit 6ddffb2 into main Jan 22, 2024
2 checks passed

dshemetov deleted the ds/check_enough_train_data branch January 22, 2024 23:15

dshemetov mentioned this pull request Jan 22, 2024

Check for amount of training data #106

Closed
