Early stopping / validation set #57

Closed
sebffischer opened this issue Sep 20, 2022 · 4 comments
@sebffischer
Member

We have already discussed this multiple times with Michel and Marc.
The question was which data to use for early stopping / validation when conducting a resampling.

There were two options:

  1. The learner further splits the training data it receives into "actual training" data and validation data.
  2. The learner uses the test set of the task for early stopping.

About 1.

  • (-) This can significantly reduce the size of the training data just for early stopping.
  • (+) We still get an unbiased performance estimate.

About 2.
This was made possible in mlr-org/mlr3@435c9d1, i.e. this change ensures that the learners have access to the test set.

  • (+) More data is used for actually training the model.
  • (-) The performance estimate might be slightly biased.
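
A minimal sketch of the row bookkeeping behind option 1, using only plain mlr3; the actual split would live inside the learner's train step, and the task, the 3-fold CV, and the 80/20 ratio are just placeholders for illustration:

```r
library(mlr3)

task = tsk("spam")
resampling = rsmp("cv", folds = 3)
resampling$instantiate(task)

# rows the learner receives for training in the first fold
train_ids = resampling$train_set(1)

# option 1: split these rows again into "actual training" rows and
# validation rows that are only used for early stopping
set.seed(1)
valid_ids = sample(train_ids, size = round(0.2 * length(train_ids)))
actual_train_ids = setdiff(train_ids, valid_ids)

# option 2 would instead validate on resampling$test_set(1)
```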

An additional complication is what happens in an AutoTuner.

In case 1 we have to decide whether we still want to do early stopping when fitting the final model (probably not) or whether we estimate the number of rounds from the training iterations obtained over all folds (e.g. taking the maximum).

In case 2 we are forced to estimate the number of rounds, because no test set is available when fitting the final model.
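
For illustration only (the numbers are invented), aggregating the per-fold stopping iterations could look like this:

```r
# early-stopped iteration counts observed in the individual folds (invented)
best_iters = c(87, 101, 95)

# possible aggregations for refitting the final model
max(best_iters)           # conservative: train at least as long as any fold
round(mean(best_iters))   # alternative: average number of rounds
```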

In principle we could let the user choose between 1 and 2 via a parameter, i.e. with use_test_set = TRUE we use the test set for validation, and with use_test_set = FALSE we split data away from the training set.
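
A hedged sketch of how the proposed switch could look; use_test_set, the function name, and the 20% validation fraction are all hypothetical:

```r
# Hypothetical sketch of the proposed parameter, not an existing API.
# train_ids are the resampling train rows, test_ids the resampling test rows
# (assuming the learner has access to them, cf. mlr-org/mlr3@435c9d1).
split_for_early_stopping = function(train_ids, test_ids, use_test_set = FALSE,
                                    valid_ratio = 0.2) {
  if (use_test_set) {
    # option 2: validate on the test set (more training data, slight bias)
    list(train = train_ids, valid = test_ids)
  } else {
    # option 1: carve a validation set out of the training rows
    valid = sample(train_ids, size = round(valid_ratio * length(train_ids)))
    list(train = setdiff(train_ids, valid), valid = valid)
  }
}
```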

@sebffischer sebffischer added this to the 0.1 milestone Sep 20, 2022
@sebffischer
Member Author

Currently learner_torch_train still assumes that the row_role early_stopping exists.

@sebffischer
Member Author

Note also that the parameter keep_last_prediction still has to be implemented, i.e. we (optionally) store the predictions from the last evaluation round (if done on the test set) so that we don't have to recompute them.
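
A rough illustration of the caching idea; keep_last_prediction and all names below are hypothetical:

```r
# Hypothetical sketch: optionally keep the predictions of the last evaluation
# round in the learner state so a later predict step can reuse them instead
# of recomputing them.
store_last_prediction = function(state, last_round_prediction,
                                 keep_last_prediction = TRUE) {
  if (keep_last_prediction) {
    state$last_prediction = last_round_prediction
  }
  state
}
```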

@mb706
Contributor

mb706 commented Sep 26, 2022

Two points:

  • Using the test set for early stopping leaks information from the test set into the training process and gives a biased resampling result. It would probably be good to make it possible somehow (e.g. by allowing rows to have the "testset" and "early stopping" roles simultaneously), but it should probably not be the common case? An idea would be a pipeop that does the splitting of train and test data (so that early stopping also works inside resampling folds, for example): convert the task into X% train and (1-X)% early-stopping data, add the early stopping role to all test set rows, maybe other things... (a rough sketch of the splitting idea follows below)
  • It would be good to consider the interaction of mlr3pipelines with the concept of early-stopping rows. How should e.g. imputation be handled? Should PipeOpImpute treat early-stopping rows as training data or as prediction data? What should class-balancing oversampling or SMOTE do?
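
A runnable approximation of the splitting idea with stock mlr3; the "early_stopping" row role does not exist yet, so the held-out ids are only kept in a plain variable here, and the task and ratio are placeholders:

```r
library(mlr3)

task = tsk("spam")

# convert the task into 80% actual-training rows and 20% early-stopping rows
split = partition(task, ratio = 0.8)
early_stopping_ids = split$test

# restrict the task to the actual-training rows; a real pipeop would instead
# attach early_stopping_ids to the task (e.g. via a dedicated row role) so the
# downstream learner can use them, and preprocessing steps such as
# PipeOpImpute would have to decide whether to treat these rows as training
# or as prediction data
task$filter(split$train)
```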

@mb706
Contributor

mb706 commented Sep 26, 2022

I guess using the "test" split works fine with the proposed hyperparameters.

The mlr3pipelines issue is now mlr-org/mlr3pipelines#698.
