diff --git a/NEWS.md b/NEWS.md
index 2e678b33..fa4979e7 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -126,7 +126,7 @@
 * Attempts to stratify on a `Surv` object now error more informatively (#230).
-* Exposed `pool` argument from `make_strata()` in user-facing resampling functions (#229).
+* Exposed argument from `make_strata()` in user-facing resampling functions (#229).
 * Deprecated the `gather()` method for `rset` objects in favor of `tidyr::pivot_longer()` (#233).
@@ -144,7 +144,7 @@
 * The `reg_intervals()` function is a convenience function for `lm()`, `glm()`, `survreg()`, and `coxph()` models (#206).
-* A few internal functions were exported so that `rsample`-adjacent packages can use the same underlying code.
+* A few internal functions were exported so that rsample-adjacent packages can use the same underlying code.
 * The `obj_sum()` method for `rsplit` objects was updated (#215).
@@ -165,11 +165,11 @@
 * The `print()` methods for `rsplit` and `val_split` objects were adjusted to show `""` and ``, respectively.
-* The `drinks`, `attrition`, and `two_class_dat` data sets were removed. They are in the `modeldata` package.
+* The `drinks`, `attrition`, and `two_class_dat` data sets were removed. They are in the modeldata package.
-* Compatability with `dplyr` 1.0.0.
+* Compatability with dplyr 1.0.0.
-# `rsample` 0.0.6
+# rsample 0.0.6
 * Added `validation_set()` for making a single resample.
@@ -181,7 +181,7 @@
 * `initial_time_split()` and `rolling_origin()` now have a `lag` parameter that ensures that previous data are available so that lagged variables can be calculated. (#135, #136)
-# `rsample` 0.0.5
+# rsample 0.0.5
 * Added three functions to compute different bootstrap confidence intervals.
 * A new function (`add_resample_id()`) augments a data frame with columns for the resampling identifier.
@@ -189,16 +189,16 @@
 * Updated `initial_split()`, `mc_cv()`, `vfold_cv()`, `bootstraps()` with new `breaks` parameter that specifies the number of bins to stratify by for a numeric stratification variable.
-# `rsample` 0.0.4
+# rsample 0.0.4
 Small maintenance release.
 ## Minor improvements and fixes
 * `fill()` was removed per the deprecation warning.
- * Small changes were made for the new version of `tibble`.
+ * Small changes were made for the new version of tibble.
-# `rsample` 0.0.3
+# rsample 0.0.3
 ## New features
@@ -210,25 +210,25 @@ Small maintenance release.
 * Changed the R version requirement to be R >= 3.1 instead of 3.3.3.
-* The `recipes`-related `prepper` function was [moved to the `recipes` package](https://github.com/tidymodels/rsample/issues/48). This makes the `rsample` install footprint much smaller.
+* The recipes-related `prepper()` function was [moved to the recipes package](https://github.com/tidymodels/rsample/issues/48). This makes the rsample install footprint much smaller.
 * `rsplit` objects are shown differently inside of a tibble.
-* Moved from the `broom` package to the `generics` package.
+* Moved from the broom package to the generics package.
-# `rsample` 0.0.2
+# rsample 0.0.2
 * `initial_split`, `training`, and `testing` were added to do training/testing splits prior to resampling.
 * Another resampling method, `group_vfold_cv`, was added.
 * `caret2rsample` and `rsample2caret` can convert `rset` objects to those used by `caret::trainControl` and vice-versa.
 * A function called `form_pred` can be used to determine the original names of the predictors in a formula or `terms` object.
-* A vignette and a function (`prepper`) were included to facilitate using the `recipes` with `rsample`.
+* A vignette and a function (`prepper`) were included to facilitate using the recipes with rsample.
 * A `gather` method was added for `rset` objects.
 * A `labels` method was added for `rsplit` objects.
This can help identify which resample is being used even when the whole `rset` object is not available.
-* A variety of `dplyr` methods were added (e.g. `filter`, `mutate`, etc) that work without dropping classes or attributes of the `rsample` objects.
+* A variety of dplyr methods were added (e.g. `filter()`, `mutate()`, etc) that work without dropping classes or attributes of the `rsample` objects.
-# `rsample` 0.0.1 (2017-07-08)
+# rsample 0.0.1 (2017-07-08)
 Initial public version on CRAN
diff --git a/R/make_strata.R b/R/make_strata.R
index dc498962..cef69225 100644
--- a/R/make_strata.R
+++ b/R/make_strata.R
@@ -51,7 +51,7 @@
 #' table(x3)
 #' table(make_strata(x3))
 #'
-#' # `oilType` data from `caret`
+#' # `oilType` data from
 #' x4 <- rep(LETTERS[1:7], c(37, 26, 3, 7, 11, 10, 2))
 #' table(x4)
 #' table(make_strata(x4))
diff --git a/R/misc.R b/R/misc.R
index 97119298..20b6a377 100644
--- a/R/misc.R
+++ b/R/misc.R
@@ -125,7 +125,7 @@ split_unnamed <- function(x, f) {
 #' @param x An `rset` or `tune_results` object.
 #' @param ... Not currently used.
 #' @return A character value or `NA_character_` if the object was created prior
-#' to `rsample` version 0.1.0.
+#' to rsample version 0.1.0.
 #' @rdname get_fingerprint
 #' @aliases .get_fingerprint
 #' @examples
diff --git a/R/nest.R b/R/nest.R
index d3c6b080..3077d79e 100644
--- a/R/nest.R
+++ b/R/nest.R
@@ -2,7 +2,7 @@
 #'
 #' `nested_cv` can be used to take the results of one resampling procedure
 #' and conduct further resamples within each split. Any type of resampling
-#' used in `rsample` can be used.
+#' used in rsample can be used.
 #'
 #' @details
 #' It is a bad idea to use bootstrapping as the outer resampling procedure (see
diff --git a/R/permutations.R b/R/permutations.R
index 8e2df6b0..e5a42a13 100644
--- a/R/permutations.R
+++ b/R/permutations.R
@@ -5,12 +5,12 @@
 #' by permuting/shuffling one or more columns.
This results in analysis
 #' samples where some columns are in their original order and some columns
 #' are permuted to a random order. Unlike other sampling functions in
-#' `rsample`, there is no assessment set and calling `assessment()` on a
+#' rsample, there is no assessment set and calling `assessment()` on a
 #' permutation split will throw an error.
 #'
 #' @param data A data frame.
 #' @param permute One or more columns to shuffle. This argument supports
-#' `tidyselect` selectors. Multiple expressions can be combined with `c()`.
+#' tidyselect selectors. Multiple expressions can be combined with `c()`.
 #' Variable names can be used as if they were positions in the data frame, so
 #' expressions like `x:y` can be used to select a range of variables.
 #' See \code{\link[tidyselect]{language}} for more details.
diff --git a/man/get_fingerprint.Rd b/man/get_fingerprint.Rd
index 14492dd3..cc912420 100644
--- a/man/get_fingerprint.Rd
+++ b/man/get_fingerprint.Rd
@@ -19,7 +19,7 @@
 }
 \value{
 A character value or \code{NA_character_} if the object was created prior
-to \code{rsample} version 0.1.0.
+to rsample version 0.1.0.
 }
 \description{
 This function returns a hash (or NA) for an attribute that is created when
diff --git a/man/make_strata.Rd b/man/make_strata.Rd
index 9023356d..c2b7b434 100644
--- a/man/make_strata.Rd
+++ b/man/make_strata.Rd
@@ -64,7 +64,7 @@ x3 <- factor(x2)
 table(x3)
 table(make_strata(x3))
-# `oilType` data from `caret`
+# `oilType` data from
 x4 <- rep(LETTERS[1:7], c(37, 26, 3, 7, 11, 10, 2))
 table(x4)
 table(make_strata(x4))
diff --git a/man/nested_cv.Rd b/man/nested_cv.Rd
index 826a9c47..a592c8cf 100644
--- a/man/nested_cv.Rd
+++ b/man/nested_cv.Rd
@@ -27,7 +27,7 @@ additional resamples.
 \description{
 \code{nested_cv} can be used to take the results of one resampling procedure
 and conduct further resamples within each split. Any type of resampling
-used in \code{rsample} can be used.
+used in rsample can be used.
 }
 \details{
 It is a bad idea to use bootstrapping as the outer resampling procedure (see
diff --git a/man/permutations.Rd b/man/permutations.Rd
index 4e543195..c2773bed 100644
--- a/man/permutations.Rd
+++ b/man/permutations.Rd
@@ -10,7 +10,7 @@ permutations(data, permute = NULL, times = 25, apparent = FALSE, ...)
 \item{data}{A data frame.}
 \item{permute}{One or more columns to shuffle. This argument supports
-\code{tidyselect} selectors. Multiple expressions can be combined with \code{c()}.
+tidyselect selectors. Multiple expressions can be combined with \code{c()}.
 Variable names can be used as if they were positions in the data frame, so
 expressions like \code{x:y} can be used to select a range of variables.
 See \code{\link[tidyselect]{language}} for more details.}
@@ -33,7 +33,7 @@ A permutation sample is the same size as the original data set and is made
 by permuting/shuffling one or more columns. This results in analysis
 samples where some columns are in their original order and some columns
 are permuted to a random order. Unlike other sampling functions in
-\code{rsample}, there is no assessment set and calling \code{assessment()} on a
+rsample, there is no assessment set and calling \code{assessment()} on a
 permutation split will throw an error.
 }
 \details{
diff --git a/vignettes/Applications/Intervals.Rmd b/vignettes/Applications/Intervals.Rmd
index ed700e8f..832fe5c2 100644
--- a/vignettes/Applications/Intervals.Rmd
+++ b/vignettes/Applications/Intervals.Rmd
@@ -193,7 +193,7 @@ intervals %>% split(intervals$term)
 For bias-corrected and accelerated (BCa) intervals, an additional argument is required. The `.fn` argument is a function that computes the statistic of interest. The first argument should be for the `rsplit` object and other arguments can be passed in using the ellipses.
-These intervals use an internal leave-one-out resample to compute the Jackknife statistic and will recompute the statistic for _every bootstrap resample_.
If the statistic is expensive to compute, this may take some time. For those calculations, we use the `furrr` package so these can be computed in parallel if you have set up a parallel processing plan (see `?future::plan`).
+These intervals use an internal leave-one-out resample to compute the Jackknife statistic and will recompute the statistic for _every bootstrap resample_. If the statistic is expensive to compute, this may take some time. For those calculations, we use the furrr package so these can be computed in parallel if you have set up a parallel processing plan (see `?future::plan`).
 The user-facing function takes an argument for the function and the ellipses.
diff --git a/vignettes/Working_with_rsets.Rmd b/vignettes/Working_with_rsets.Rmd
index 8f59b2e8..4d6b4a2b 100644
--- a/vignettes/Working_with_rsets.Rmd
+++ b/vignettes/Working_with_rsets.Rmd
@@ -72,7 +72,7 @@ Now let's write a function that will, for each resample:
 1. obtain the analysis data set (i.e. the 90% used for modeling)
 1. fit a logistic regression model
-1. predict the assessment data (the other 10% not used for the model) using the `broom` package
+1. predict the assessment data (the other 10% not used for the model) using the broom package
 1. determine if each sample was predicted correctly.
 Here is our function:
@@ -109,7 +109,7 @@ example[1:10, setdiff(names(example), names(attrition))]
 For this model, the `.fitted` value is the linear predictor in log-odds units.
-To compute this data set for each of the 100 resamples, we'll use the `map` function from the `purrr` package:
+To compute this data set for each of the 100 resamples, we'll use the `map` function from the package:
 ```{r model_purrr, warning=FALSE}
 library(purrr)
@@ -182,7 +182,7 @@ The calculated 95% confidence interval contains zero, so we don't have evidence
 ## Bootstrap Estimates of Model Coefficients
-Unless there is already a column in the resample object that contains the fitted model, a function can be used to fit the model and save all of the model coefficients. The [`broom` package](https://cran.r-project.org/package=broom) package has a `tidy` function that will save the coefficients in a data frame. Instead of returning a data frame with a row for each model term, we will save a data frame with a single row and columns for each model term. As before, `purrr::map` can be used to estimate and save these values for each split.
+Unless there is already a column in the resample object that contains the fitted model, a function can be used to fit the model and save all of the model coefficients. The [broom package](https://cran.r-project.org/package=broom) package has a `tidy` function that will save the coefficients in a data frame. Instead of returning a data frame with a row for each model term, we will save a data frame with a single row and columns for each model term. As before, `purrr::map()` can be used to estimate and save these values for each split.
 ```{r coefs}
@@ -200,7 +200,7 @@ bt_resamples$betas[[1]]
 ## Keeping Tidy
-As previously mentioned, the [`broom` package](https://cran.r-project.org/package=broom) contains a class called `tidy` that created representations of objects that can be easily used for analysis, plotting, etc. rsample contains `tidy` methods for `rset` and `rsplit` objects.
For example:
+As previously mentioned, the [broom package](https://cran.r-project.org/package=broom) contains a class called `tidy` that created representations of objects that can be easily used for analysis, plotting, etc. rsample contains `tidy` methods for `rset` and `rsplit` objects. For example:
 ```{r tidy_rsplit}
 first_resample <- bt_resamples$splits[[1]]