Commit

Merge pull request #799 from mlr-org/documentation_improvements
Documentation improvements
mb706 authored Aug 16, 2024
2 parents b718971 + 88d2b1b commit ac825b3
Showing 4 changed files with 50 additions and 10 deletions.
11 changes: 6 additions & 5 deletions R/PipeOpImputeHist.R
@@ -7,10 +7,11 @@
#' @description
#' Impute numerical features by histogram.
#'
-#' During training, a histogram is fitted using R's [`hist()`][graphics::hist] function.
-#' The fitted histogram is then sampled from for imputation. This is an approximation to
-#' sampling from the empirical training data distribution (i.e. sampling from training data
-#' with replacement), but is much more memory efficient for large datasets, since the `$state`
+#' During training, a histogram is fitted on each column using R's [`hist()`][graphics::hist] function.
+#' The fitted histogram is then sampled from for imputation. Sampling happens in a two-step process:
+#' First, a bin is sampled from the histogram, then a value is sampled uniformly from the bin.
+#' This is an approximation to sampling from the empirical training data distribution (i.e. sampling
+#' from training data with replacement), but is much more memory efficient for large datasets, since the `$state`
#' does not need to save the training data.
#'
#' @section Construction:
@@ -26,7 +27,7 @@
#' @section Input and Output Channels:
#' Input and output channels are inherited from [`PipeOpImpute`].
#'
-#' The output is the input [`Task`][mlr3::Task] with all affected numeric features missing values imputed by (column-wise) histogram.
+#' The output is the input [`Task`][mlr3::Task] with all affected numeric features missing values imputed by (column-wise) histogram; see Description for details.
#'
#' @section State:
#' The `$state` is a named `list` with the `$state` elements inherited from [`PipeOpImpute`].
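
The two-step sampling described in the updated `PipeOpImputeHist` documentation can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used by the PipeOp: the helpers `fit_hist()` and `sample_hist()` are hypothetical names, and the idea that only the breaks and counts need to be stored is an assumption drawn from the memory-efficiency remark in the description.

```r
# Hypothetical helper (not part of mlr3pipelines): fit a histogram on the
# observed values and keep only the breaks and counts, not the data itself.
fit_hist <- function(x) {
  h <- graphics::hist(x[!is.na(x)], plot = FALSE)
  list(breaks = h$breaks, counts = h$counts)
}

# Hypothetical helper: draw n values from the fitted histogram in two steps.
sample_hist <- function(model, n) {
  n_bins <- length(model$counts)
  # Step 1: sample bin indices with probability proportional to the bin counts.
  bins <- sample.int(n_bins, size = n, replace = TRUE, prob = model$counts)
  # Step 2: sample a value uniformly between the chosen bin's lower and upper break.
  stats::runif(n, min = model$breaks[bins], max = model$breaks[bins + 1])
}

set.seed(1)
x <- c(rnorm(100), NA, NA, NA)

model <- fit_hist(x)
x_imputed <- x
x_imputed[is.na(x)] <- sample_hist(model, sum(is.na(x)))
```

The stored model grows with the number of bins rather than the number of training rows, which is the trade-off the description refers to when it calls this an approximation to sampling from the training data with replacement.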
19 changes: 19 additions & 0 deletions R/PipeOpImputeOOR.R
@@ -14,6 +14,13 @@
#' This type of imputation is especially sensible in the context of tree-based methods, see also
#' Ding & Simonoff (2010).
#'
+#' If a factor is missing during prediction, but not during training, this adds an unseen level
+#' `".MISSING"`, which would be a problem for most models. This is why it is recommended to use
+#' [`po("fixfactors")`][mlr_pipeops_fixfactors] and
+#' [`po("imputesample", affect_columns = selector_type(types = c("factor", "ordered")))`][mlr_pipeops_imputesample]
+#' (or some other imputation method) after this imputation method, if missing values are expected during prediction
+#' in factor columns that had no missing values during training.
+#'
#' @section Construction:
#' ```
#' PipeOpImputeOOR$new(id = "imputeoor", param_vals = list())
@@ -71,6 +78,18 @@
#' new_task = po$train(list(task = task))[[1]]
#' new_task$missings()
#' new_task$data()
+#'
+#' # recommended use when missing values are expected during prediction on
+#' # factor columns that had no missing values during training
+#' gr = po("imputeoor") %>>%
+#'   po("fixfactors") %>>%
+#'   po("imputesample", affect_columns = selector_type(types = c("factor", "ordered")))
+#' t1 = as_task_classif(data.frame(l = as.ordered(letters[1:3]), t = letters[1:3]), target = "t")
+#' t2 = as_task_classif(data.frame(l = as.ordered(c("a", NA, NA)), t = letters[1:3]), target = "t")
+#' gr$train(t1)[[1]]$data()
+#'
+#' # missing values during prediction are sampled randomly
+#' gr$predict(t2)[[1]]$data()
#' @family PipeOps
#' @family Imputation PipeOps
#' @template seealso_pipeopslist
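
The problem that the new `PipeOpImputeOOR` note warns about can be reproduced with plain base R factors. The sketch below is not the PipeOp's implementation: `impute_oor_factor()` is a hypothetical helper, and it assumes the `".MISSING"` level is only added when a column actually contains missing values, which is what the description implies.

```r
# Hypothetical helper (not the PipeOpImputeOOR code): replace NA in a factor
# with an explicit extra level, keeping orderedness.
impute_oor_factor <- function(f, missing_level = ".MISSING") {
  if (!anyNA(f)) {
    return(f)  # assumption: without missing values, the levels stay unchanged
  }
  out <- as.character(f)
  out[is.na(out)] <- missing_level
  factor(out, levels = c(levels(f), missing_level), ordered = is.ordered(f))
}

# Training data has no missing values, prediction data does.
train <- factor(c("a", "b", "c"))
pred <- factor(c("a", NA, NA), levels = levels(train))

train_levels <- levels(impute_oor_factor(train))
pred_levels <- levels(impute_oor_factor(pred))

# Levels that appear at prediction time but were never seen during training.
unseen <- setdiff(pred_levels, train_levels)
unseen
```

A model fitted on `train_levels` has never seen the level in `unseen`, which is why the documentation suggests following the imputation with `po("fixfactors")` and a second imputation step.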
11 changes: 6 additions & 5 deletions man/mlr_pipeops_imputehist.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

19 changes: 19 additions & 0 deletions man/mlr_pipeops_imputeoor.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
