diff --git a/404.html b/404.html index fda4a711..b5175df6 100644 --- a/404.html +++ b/404.html @@ -16,7 +16,7 @@ - +
Developed by Byron Jaeger.
+Developed by Byron Jaeger.
diff --git a/articles/fast.html b/articles/fast.html index 99f90138..5200e01f 100644 --- a/articles/fast.html +++ b/articles/fast.html @@ -17,7 +17,7 @@ - +Learn how to get started with the basics of aorsf.
@@ -71,11 +71,11 @@Developed by Byron Jaeger.
+Developed by Byron Jaeger.
diff --git a/authors.html b/authors.html index 9285dd7d..11b82660 100644 --- a/authors.html +++ b/authors.html @@ -1,5 +1,5 @@ -Developed by Byron Jaeger.
+Developed by Byron Jaeger.
diff --git a/reference/index.html b/reference/index.html index dae58677..9e420ed7 100644 --- a/reference/index.html +++ b/reference/index.html @@ -1,5 +1,5 @@ -Fit, inspect, summarize, and apply oblique RFs
+Fit, inspect, summarize, and apply oblique RFs
Choose how to identify linear combinations of predictors and set tuning parameters for your approach
+Choose how to identify linear combinations of predictors and set tuning parameters for your approach
Estimate the importance of individual variables and conduct variable selection using ORSFs
+Estimate the importance of individual variables and conduct variable selection using ORSFs
Interpret your model by generating partial dependence or individual conditional expectation values. Plotting functions not included (but see examples)
+Interpret your model by generating partial dependence or individual conditional expectation values. Plotting functions not included (but see examples)
Datasets used in examples and vignettes.
+Datasets used in examples and vignettes.
Functions that don’t fit neatly into a category above, but are still helpful.
+Functions that don’t fit neatly into a category above, but are still helpful.
Techniques used by aorsf that may be helpful in other contexts.
+Techniques used by aorsf that may be helpful in other contexts.
Developed by Byron Jaeger.
+Developed by Byron Jaeger.
-fit_accel <- orsf(pbc_orsf,
+fit_accel <- orsf(pbc_orsf,
control = orsf_control_survival(),
formula = Surv(time, status) ~ . - id,
tree_seeds = 329)
@@ -232,8 +231,7 @@ Linear combinations with Cox re
repeat iterations until convergence allows you to run Cox regression in
each non-terminal node of each survival tree, using the regression
coefficients to create linear combinations of predictors:
-
-control_cph <- orsf_control_survival(method = 'glm',
+control_cph <- orsf_control_survival(method = 'glm',
scale_x = TRUE,
max_iter = 20)
@@ -251,8 +249,7 @@ Linear combinations w
non-terminal node of each survival tree. This can be really helpful if
you want to do feature selection within the node, but it is a lot slower
than the 'glm'
option.
-
-# select 3 predictors out of 5 to be used in
+# select 3 predictors out of 5 to be used in
# each linear combination of predictors.
control_net <- orsf_control_survival(method = 'net', target_df = 3)
@@ -270,12 +267,10 @@ Linear combinations with you
In addition to the built-in methods, customized functions can be used to
identify linear combinations of predictors. We’ll demonstrate a few
here.
The first uses random coefficients
-
-f_rando <- function(x_node, y_node, w_node){
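The body of the random-coefficient function is elided above. As a hedged sketch (an assumption about the body, not necessarily the package's exact code): a custom control function must accept `x_node`, `y_node`, and `w_node` and return a one-column matrix with one coefficient per column of `x_node`, which a random-coefficient version can satisfy with `runif()`:

```r
# Hypothetical sketch: draw one random coefficient per predictor column.
# y_node and w_node are part of the required signature but unused here.
f_rando <- function(x_node, y_node, w_node){
  matrix(runif(ncol(x_node)), ncol = 1)
}

# The returned matrix has ncol(x_node) rows and exactly 1 column:
beta <- f_rando(x_node = matrix(rnorm(50), ncol = 5),
                y_node = NULL, w_node = NULL)
dim(beta)
## [1] 5 1
```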
+The second derives coefficients from principal component analysis
-
-f_pca <- function(x_node, y_node, w_node) {
+f_pca <- function(x_node, y_node, w_node) {
# estimate two principal components.
pca <- stats::prcomp(x_node, rank. = 2)
@@ -286,8 +281,7 @@ Linear combinations with you
similar to a method known as reinforcement learning trees (see the
RLT
package), although our method of “muting” is very crude compared
to the method proposed by Zhu et al.
-
-f_rlt <- function(x_node, y_node, w_node){
+f_rlt <- function(x_node, y_node, w_node){
colnames(y_node) <- c('time', 'status')
colnames(x_node) <- paste("x", seq(ncol(x_node)), sep = '')
@@ -323,8 +317,7 @@ Linear combinations with you
}
We can plug these functions into orsf_control_custom()
, and then pass
the result into orsf()
:
-
-fit_rando <- orsf(pbc_orsf,
+fit_rando <- orsf(pbc_orsf,
Surv(time, status) ~ . - id,
control = orsf_control_survival(method = f_rando),
tree_seeds = 329)
@@ -339,8 +332,7 @@ Linear combinations with you
tree_seeds = 329)
So which fit seems to work best in this example? Let’s find out by
evaluating the out-of-bag survival predictions.
-
-risk_preds <- list(
+risk_preds <- list(
accel = fit_accel$pred_oobag,
cph = fit_cph$pred_oobag,
net = fit_net$pred_oobag,
@@ -355,26 +347,26 @@ Linear combinations with you
summary = 'IPA',
times = fit_accel$pred_horizon)
The AUC values, from highest to lowest:
-sc$AUC$score[order(-AUC)]
-#> model times AUC se lower upper
-#> <fctr> <num> <num> <num> <num> <num>
-#> 1: net 1788 0.9151649 0.02025057 0.8754745 0.9548553
-#> 2: rlt 1788 0.9119200 0.02090107 0.8709547 0.9528854
-#> 3: accel 1788 0.9095628 0.02143250 0.8675558 0.9515697
-#> 4: cph 1788 0.9095628 0.02143250 0.8675558 0.9515697
-#> 5: rando 1788 0.9062197 0.02148854 0.8641029 0.9483365
-#> 6: pca 1788 0.8999479 0.02226683 0.8563057 0.9435901
+sc$AUC$score[order(-AUC)]
+## model times AUC se lower upper
+## <fctr> <num> <num> <num> <num> <num>
+## 1: net 1788 0.9151649 0.02025057 0.8754745 0.9548553
+## 2: rlt 1788 0.9119200 0.02090107 0.8709547 0.9528854
+## 3: accel 1788 0.9095628 0.02143250 0.8675558 0.9515697
+## 4: cph 1788 0.9095628 0.02143250 0.8675558 0.9515697
+## 5: rando 1788 0.9062197 0.02148854 0.8641029 0.9483365
+## 6: pca 1788 0.8999479 0.02226683 0.8563057 0.9435901
And the indices of prediction accuracy:
-sc$Brier$score[order(-IPA), .(model, times, IPA)]
-#> model times IPA
-#> <fctr> <num> <num>
-#> 1: net 1788 0.4905777
-#> 2: accel 1788 0.4806649
-#> 3: cph 1788 0.4806649
-#> 4: rlt 1788 0.4675228
-#> 5: pca 1788 0.4383995
-#> 6: rando 1788 0.4302814
-#> 7: Null model 1788 0.0000000
+sc$Brier$score[order(-IPA), .(model, times, IPA)]
+## model times IPA
+## <fctr> <num> <num>
+## 1: net 1788 0.4905777
+## 2: accel 1788 0.4806649
+## 3: cph 1788 0.4806649
+## 4: rlt 1788 0.4675228
+## 5: pca 1788 0.4383995
+## 6: rando 1788 0.4302814
+## 7: Null model 1788 0.0000000
From inspection,
diff --git a/reference/orsf_control_cph.html b/reference/orsf_control_cph.html
index 4332d93d..359fe945 100644
--- a/reference/orsf_control_cph.html
+++ b/reference/orsf_control_cph.html
@@ -1,7 +1,7 @@
Cox regression ORSF control — orsf_control_cph • aorsf Cox regression ORSF control — orsf_control_cph • aorsf Custom ORSF control — orsf_control_custom • aorsf Custom ORSF control — orsf_control_custom • aorsf
@@ -104,11 +104,11 @@ See also
diff --git a/reference/orsf_control_fast.html b/reference/orsf_control_fast.html
index 88d0f4d4..9c8cbffd 100644
--- a/reference/orsf_control_fast.html
+++ b/reference/orsf_control_fast.html
@@ -1,6 +1,6 @@
Accelerated ORSF control — orsf_control_fast • aorsf Accelerated ORSF control — orsf_control_fast • aorsf Penalized Cox regression ORSF control — orsf_control_net • aorsf Penalized Cox regression ORSF control — orsf_control_net • aorsf Individual Conditional Expectations — orsf_ice_oob • aorsf ClassificationBegin by fitting an oblique classification random forest:
-
-set.seed(329)
+set.seed(329)
index_train <- sample(nrow(penguins_orsf), 150)
@@ -243,26 +242,25 @@ Classification formula = species ~ .)
Compute individual conditional expectation using out-of-bag data for
flipper_length_mm = c(190, 210)
.
-
-pred_spec <- list(flipper_length_mm = c(190, 210))
+pred_spec <- list(flipper_length_mm = c(190, 210))
ice_oob <- orsf_ice_oob(fit_clsf, pred_spec = pred_spec)
-ice_oob
-#> Key: <class>
-#> id_variable id_row class flipper_length_mm pred
-#> <int> <char> <fctr> <num> <num>
-#> 1: 1 1 Adelie 190 0.92169247
-#> 2: 1 2 Adelie 190 0.80944657
-#> 3: 1 3 Adelie 190 0.85172955
-#> 4: 1 4 Adelie 190 0.93559327
-#> 5: 1 5 Adelie 190 0.97708693
-#> ---
-#> 896: 2 146 Gentoo 210 0.26092984
-#> 897: 2 147 Gentoo 210 0.04798334
-#> 898: 2 148 Gentoo 210 0.07927359
-#> 899: 2 149 Gentoo 210 0.84779971
-#> 900: 2 150 Gentoo 210 0.11105143
+ice_oob
+## Key: <class>
+## id_variable id_row class flipper_length_mm pred
+## <int> <char> <fctr> <num> <num>
+## 1: 1 1 Adelie 190 0.92169247
+## 2: 1 2 Adelie 190 0.80944657
+## 3: 1 3 Adelie 190 0.85172955
+## 4: 1 4 Adelie 190 0.93559327
+## 5: 1 5 Adelie 190 0.97708693
+## ---
+## 896: 2 146 Gentoo 210 0.26092984
+## 897: 2 147 Gentoo 210 0.04798334
+## 898: 2 148 Gentoo 210 0.07927359
+## 899: 2 149 Gentoo 210 0.84779971
+## 900: 2 150 Gentoo 210 0.11105143
There are two identifiers in the output:
id_variable
is an identifier for the current value of the
variable(s) that are in the data. It is redundant if you only have one
variable, but helpful if there are multiple variables.
@@ -270,13 +268,12 @@ ClassificationNote that predicted probabilities are returned for each class and each
observation in the data. Predicted probabilities for a given observation
and given variable value sum to 1. For example,
-
+
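One way to check the sum-to-1 property numerically is to sum the predicted probabilities across classes within each observation. The base-R sketch below uses a toy data frame shaped like `ice_oob` (the probability values are made up for illustration):

```r
# Toy stand-in for ice_oob: class probabilities for two observations
# (columns mirror the output above; the numbers are illustrative).
ice <- data.frame(
  id_variable = 1L,
  id_row      = rep(c("1", "2"), each = 3),
  class       = rep(c("Adelie", "Chinstrap", "Gentoo"), times = 2),
  pred        = c(0.92, 0.05, 0.03, 0.81, 0.11, 0.08)
)

# Sum over classes within each observation and variable value:
totals <- aggregate(pred ~ id_variable + id_row, data = ice, FUN = sum)
totals$pred
## [1] 1 1
```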
@@ -284,8 +281,7 @@ Regression
-set.seed(329)
+set.seed(329)
index_train <- sample(nrow(penguins_orsf), 150)
@@ -296,96 +292,92 @@ Regression= bill_length_mm ~ .)
Compute individual conditional expectation using new data for
flipper_length_mm = c(190, 210)
.
-
-pred_spec <- list(flipper_length_mm = c(190, 210))
+pred_spec <- list(flipper_length_mm = c(190, 210))
ice_new <- orsf_ice_new(fit_regr,
pred_spec = pred_spec,
new_data = penguins_orsf_test)
-ice_new
-#> id_variable id_row flipper_length_mm pred
-#> <int> <char> <num> <num>
-#> 1: 1 1 190 37.94483
-#> 2: 1 2 190 37.61595
-#> 3: 1 3 190 37.53681
-#> 4: 1 4 190 39.49476
-#> 5: 1 5 190 38.95635
-#> ---
-#> 362: 2 179 210 51.80471
-#> 363: 2 180 210 47.27183
-#> 364: 2 181 210 47.05031
-#> 365: 2 182 210 50.39028
-#> 366: 2 183 210 48.44774
+ice_new
+## id_variable id_row flipper_length_mm pred
+## <int> <char> <num> <num>
+## 1: 1 1 190 37.94483
+## 2: 1 2 190 37.61595
+## 3: 1 3 190 37.53681
+## 4: 1 4 190 39.49476
+## 5: 1 5 190 38.95635
+## ---
+## 362: 2 179 210 51.80471
+## 363: 2 180 210 47.27183
+## 364: 2 181 210 47.05031
+## 365: 2 182 210 50.39028
+## 366: 2 183 210 48.44774
You can also let pred_spec_auto
pick reasonable values like so:
-
-pred_spec = pred_spec_auto(species, island, body_mass_g)
+pred_spec = pred_spec_auto(species, island, body_mass_g)
ice_new <- orsf_ice_new(fit_regr,
pred_spec = pred_spec,
new_data = penguins_orsf_test)
-ice_new
-#> id_variable id_row species island body_mass_g pred
-#> <int> <char> <fctr> <fctr> <num> <num>
-#> 1: 1 1 Adelie Biscoe 3200 37.78339
-#> 2: 1 2 Adelie Biscoe 3200 37.73273
-#> 3: 1 3 Adelie Biscoe 3200 37.71248
-#> 4: 1 4 Adelie Biscoe 3200 40.25782
-#> 5: 1 5 Adelie Biscoe 3200 40.04074
-#> ---
-#> 8231: 45 179 Gentoo Torgersen 5300 46.14559
-#> 8232: 45 180 Gentoo Torgersen 5300 43.98050
-#> 8233: 45 181 Gentoo Torgersen 5300 44.59837
-#> 8234: 45 182 Gentoo Torgersen 5300 44.85146
-#> 8235: 45 183 Gentoo Torgersen 5300 44.23710
+ice_new
+## id_variable id_row species island body_mass_g pred
+## <int> <char> <fctr> <fctr> <num> <num>
+## 1: 1 1 Adelie Biscoe 3200 37.78339
+## 2: 1 2 Adelie Biscoe 3200 37.73273
+## 3: 1 3 Adelie Biscoe 3200 37.71248
+## 4: 1 4 Adelie Biscoe 3200 40.25782
+## 5: 1 5 Adelie Biscoe 3200 40.04074
+## ---
+## 8231: 45 179 Gentoo Torgersen 5300 46.14559
+## 8232: 45 180 Gentoo Torgersen 5300 43.98050
+## 8233: 45 181 Gentoo Torgersen 5300 44.59837
+## 8234: 45 182 Gentoo Torgersen 5300 44.85146
+## 8235: 45 183 Gentoo Torgersen 5300 44.23710
By default, all combinations of all variables are used. However, you can
also look at the variables one by one, separately, like so:
-
-ice_new <- orsf_ice_new(fit_regr,
+ice_new <- orsf_ice_new(fit_regr,
expand_grid = FALSE,
pred_spec = pred_spec,
new_data = penguins_orsf_test)
-ice_new
-#> id_variable id_row variable value level pred
-#> <int> <char> <char> <num> <char> <num>
-#> 1: 1 1 species NA Adelie 37.74136
-#> 2: 1 2 species NA Adelie 37.42367
-#> 3: 1 3 species NA Adelie 37.04598
-#> 4: 1 4 species NA Adelie 39.89602
-#> 5: 1 5 species NA Adelie 39.14848
-#> ---
-#> 2009: 5 179 body_mass_g 5300 <NA> 51.50196
-#> 2010: 5 180 body_mass_g 5300 <NA> 47.27055
-#> 2011: 5 181 body_mass_g 5300 <NA> 48.34064
-#> 2012: 5 182 body_mass_g 5300 <NA> 48.75828
-#> 2013: 5 183 body_mass_g 5300 <NA> 48.11020
+ice_new
+## id_variable id_row variable value level pred
+## <int> <char> <char> <num> <char> <num>
+## 1: 1 1 species NA Adelie 37.74136
+## 2: 1 2 species NA Adelie 37.42367
+## 3: 1 3 species NA Adelie 37.04598
+## 4: 1 4 species NA Adelie 39.89602
+## 5: 1 5 species NA Adelie 39.14848
+## ---
+## 2009: 5 179 body_mass_g 5300 <NA> 51.50196
+## 2010: 5 180 body_mass_g 5300 <NA> 47.27055
+## 2011: 5 181 body_mass_g 5300 <NA> 48.34064
+## 2012: 5 182 body_mass_g 5300 <NA> 48.75828
+## 2013: 5 183 body_mass_g 5300 <NA> 48.11020
And you can also bypass all the bells and whistles by using your own
data.frame
for a pred_spec
. (Just make sure you request values that
exist in the training data.)
-
-custom_pred_spec <- data.frame(species = 'Adelie',
+custom_pred_spec <- data.frame(species = 'Adelie',
island = 'Biscoe')
ice_new <- orsf_ice_new(fit_regr,
pred_spec = custom_pred_spec,
new_data = penguins_orsf_test)
-ice_new
-#> id_variable id_row species island pred
-#> <int> <char> <fctr> <fctr> <num>
-#> 1: 1 1 Adelie Biscoe 38.52327
-#> 2: 1 2 Adelie Biscoe 38.32073
-#> 3: 1 3 Adelie Biscoe 37.71248
-#> 4: 1 4 Adelie Biscoe 41.68380
-#> 5: 1 5 Adelie Biscoe 40.91140
-#> ---
-#> 179: 1 179 Adelie Biscoe 43.09493
-#> 180: 1 180 Adelie Biscoe 38.79455
-#> 181: 1 181 Adelie Biscoe 39.37734
-#> 182: 1 182 Adelie Biscoe 40.71952
-#> 183: 1 183 Adelie Biscoe 39.34501
+ice_new
+## id_variable id_row species island pred
+## <int> <char> <fctr> <fctr> <num>
+## 1: 1 1 Adelie Biscoe 38.52327
+## 2: 1 2 Adelie Biscoe 38.32073
+## 3: 1 3 Adelie Biscoe 37.71248
+## 4: 1 4 Adelie Biscoe 41.68380
+## 5: 1 5 Adelie Biscoe 40.91140
+## ---
+## 179: 1 179 Adelie Biscoe 43.09493
+## 180: 1 180 Adelie Biscoe 38.79455
+## 181: 1 181 Adelie Biscoe 39.37734
+## 182: 1 182 Adelie Biscoe 40.71952
+## 183: 1 183 Adelie Biscoe 39.34501
@@ -393,8 +385,7 @@ SurvivalBegin by fitting an oblique survival random forest:
-
-set.seed(329)
+set.seed(329)
index_train <- sample(nrow(pbc_orsf), 150)
@@ -407,56 +398,55 @@ SurvivalCompute individual conditional expectation using in-bag data for
bili = c(1,2,3,4,5)
:
ice_train <- orsf_ice_inb(fit_surv, pred_spec = list(bili = 1:5))
-ice_train
-#> id_variable id_row pred_horizon bili pred
-#> <int> <char> <num> <num> <num>
-#> 1: 1 1 1826.25 1 0.1290317
-#> 2: 1 2 1826.25 1 0.1242352
-#> 3: 1 3 1826.25 1 0.0963452
-#> 4: 1 4 1826.25 1 0.1172367
-#> 5: 1 5 1826.25 1 0.2030256
-#> ---
-#> 746: 5 146 1826.25 5 0.7868537
-#> 747: 5 147 1826.25 5 0.2012954
-#> 748: 5 148 1826.25 5 0.4893605
-#> 749: 5 149 1826.25 5 0.4698220
-#> 750: 5 150 1826.25 5 0.9557285
+ice_train
+## id_variable id_row pred_horizon bili pred
+## <int> <char> <num> <num> <num>
+## 1: 1 1 1826.25 1 0.1290317
+## 2: 1 2 1826.25 1 0.1242352
+## 3: 1 3 1826.25 1 0.0963452
+## 4: 1 4 1826.25 1 0.1172367
+## 5: 1 5 1826.25 1 0.2030256
+## ---
+## 746: 5 146 1826.25 5 0.7868537
+## 747: 5 147 1826.25 5 0.2012954
+## 748: 5 148 1826.25 5 0.4893605
+## 749: 5 149 1826.25 5 0.4698220
+## 750: 5 150 1826.25 5 0.9557285
If you don’t have specific values of a variable in mind, let
pred_spec_auto
pick for you:
ice_train <- orsf_ice_inb(fit_surv, pred_spec_auto(bili))
-ice_train
-#> id_variable id_row pred_horizon bili pred
-#> <int> <char> <num> <num> <num>
-#> 1: 1 1 1826.25 0.55 0.11728559
-#> 2: 1 2 1826.25 0.55 0.11728839
-#> 3: 1 3 1826.25 0.55 0.08950739
-#> 4: 1 4 1826.25 0.55 0.10064959
-#> 5: 1 5 1826.25 0.55 0.18736417
-#> ---
-#> 746: 5 146 1826.25 7.25 0.82600898
-#> 747: 5 147 1826.25 7.25 0.29156437
-#> 748: 5 148 1826.25 7.25 0.58395919
-#> 749: 5 149 1826.25 7.25 0.54202021
-#> 750: 5 150 1826.25 7.25 0.96391985
+ice_train
+## id_variable id_row pred_horizon bili pred
+## <int> <char> <num> <num> <num>
+## 1: 1 1 1826.25 0.55 0.11728559
+## 2: 1 2 1826.25 0.55 0.11728839
+## 3: 1 3 1826.25 0.55 0.08950739
+## 4: 1 4 1826.25 0.55 0.10064959
+## 5: 1 5 1826.25 0.55 0.18736417
+## ---
+## 746: 5 146 1826.25 7.25 0.82600898
+## 747: 5 147 1826.25 7.25 0.29156437
+## 748: 5 148 1826.25 7.25 0.58395919
+## 749: 5 149 1826.25 7.25 0.54202021
+## 750: 5 150 1826.25 7.25 0.96391985
Specify pred_horizon
to get individual conditional expectation at each
value:
-
-ice_train <- orsf_ice_inb(fit_surv, pred_spec_auto(bili),
+ice_train <- orsf_ice_inb(fit_surv, pred_spec_auto(bili),
pred_horizon = seq(500, 3000, by = 500))
-ice_train
-#> id_variable id_row pred_horizon bili pred
-#> <int> <char> <num> <num> <num>
-#> 1: 1 1 500 0.55 0.008276627
-#> 2: 1 1 1000 0.55 0.055724516
-#> 3: 1 1 1500 0.55 0.085091120
-#> 4: 1 1 2000 0.55 0.123423352
-#> 5: 1 1 2500 0.55 0.166380739
-#> ---
-#> 4496: 5 150 1000 7.25 0.837774757
-#> 4497: 5 150 1500 7.25 0.934536379
-#> 4498: 5 150 2000 7.25 0.967823286
-#> 4499: 5 150 2500 7.25 0.972059574
-#> 4500: 5 150 3000 7.25 0.980785643
+ice_train
+## id_variable id_row pred_horizon bili pred
+## <int> <char> <num> <num> <num>
+## 1: 1 1 500 0.55 0.008276627
+## 2: 1 1 1000 0.55 0.055724516
+## 3: 1 1 1500 0.55 0.085091120
+## 4: 1 1 2000 0.55 0.123423352
+## 5: 1 1 2500 0.55 0.166380739
+## ---
+## 4496: 5 150 1000 7.25 0.837774757
+## 4497: 5 150 1500 7.25 0.934536379
+## 4498: 5 150 2000 7.25 0.967823286
+## 4499: 5 150 2500 7.25 0.972059574
+## 4500: 5 150 3000 7.25 0.980785643
Multi-prediction horizon ICE comes with minimal extra computational
cost. Use a fine grid of time values and assess whether predictors have
time-varying effects.
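As a sketch of what assessing a time-varying effect can mean in practice, the base-R snippet below uses made-up values shaped like `ice_train` above and tracks how the risk gap between a low and a high `bili` value changes across prediction horizons:

```r
# Illustrative ICE-style values (not real aorsf output): predicted risk
# at three horizons for a low and a high value of bili.
ice <- data.frame(
  pred_horizon = rep(c(500, 1500, 3000), times = 2),
  bili         = rep(c(0.55, 7.25), each = 3),
  pred         = c(0.01, 0.09, 0.18, 0.15, 0.46, 0.68)
)

# Mean prediction at each horizon for each bili value:
risk <- with(ice, tapply(pred, list(pred_horizon, bili), mean))

# A gap between high and low bili that changes with pred_horizon is
# consistent with a time-varying effect:
gap <- risk[, "7.25"] - risk[, "0.55"]
round(gap, 2)
##  500 1500 3000 
## 0.14 0.37 0.50
```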
@@ -469,11 +459,11 @@ Survival
diff --git a/reference/orsf_pd_oob.html b/reference/orsf_pd_oob.html
index c57159cc..2699936a 100644
--- a/reference/orsf_pd_oob.html
+++ b/reference/orsf_pd_oob.html
@@ -7,7 +7,7 @@
using predictions for a new set of data
-See examples for more details">Partial dependence — orsf_pd_oob • aorsf Partial dependence — orsf_pd_oob • aorsf ClassificationBegin by fitting an oblique classification random forest:
-
-set.seed(329)
+set.seed(329)
index_train <- sample(nrow(penguins_orsf), 150)
@@ -268,32 +267,29 @@ Classification formula = species ~ .)
Compute partial dependence using out-of-bag data for
flipper_length_mm = c(190, 210)
.
-
-pred_spec <- list(flipper_length_mm = c(190, 210))
+pred_spec <- list(flipper_length_mm = c(190, 210))
pd_oob <- orsf_pd_oob(fit_clsf, pred_spec = pred_spec)
-pd_oob
-#> Key: <class>
-#> class flipper_length_mm mean lwr medn upr
-#> <fctr> <num> <num> <num> <num> <num>
-#> 1: Adelie 190 0.6176908 0.202278109 0.75856417 0.9810614
-#> 2: Adelie 210 0.4338528 0.019173811 0.56489202 0.8648110
-#> 3: Chinstrap 190 0.2114979 0.017643385 0.15211271 0.7215181
-#> 4: Chinstrap 210 0.1803019 0.020108201 0.09679464 0.7035053
-#> 5: Gentoo 190 0.1708113 0.001334861 0.02769695 0.5750201
-#> 6: Gentoo 210 0.3858453 0.068685035 0.20717073 0.9532853
+pd_oob
+## Key: <class>
+## class flipper_length_mm mean lwr medn upr
+## <fctr> <num> <num> <num> <num> <num>
+## 1: Adelie 190 0.6176908 0.202278109 0.75856417 0.9810614
+## 2: Adelie 210 0.4338528 0.019173811 0.56489202 0.8648110
+## 3: Chinstrap 190 0.2114979 0.017643385 0.15211271 0.7215181
+## 4: Chinstrap 210 0.1803019 0.020108201 0.09679464 0.7035053
+## 5: Gentoo 190 0.1708113 0.001334861 0.02769695 0.5750201
+## 6: Gentoo 210 0.3858453 0.068685035 0.20717073 0.9532853
Note that predicted probabilities are returned for each class and
probabilities in the mean
column sum to 1 if you take the sum over
each class at a specific value of the pred_spec
variables. For
example,
-
-sum(pd_oob[flipper_length_mm == 190, mean])
-#> [1] 1
+sum(pd_oob[flipper_length_mm == 190, mean])
+
But this isn’t the case for the median predicted probability!
-
-sum(pd_oob[flipper_length_mm == 190, medn])
-#> [1] 0.9383738
+sum(pd_oob[flipper_length_mm == 190, medn])
+
@@ -301,8 +297,7 @@ Regression
-set.seed(329)
+set.seed(329)
index_train <- sample(nrow(penguins_orsf), 150)
@@ -313,77 +308,73 @@ Regression= bill_length_mm ~ .)
Compute partial dependence using new data for
flipper_length_mm = c(190, 210)
.
-
-pred_spec <- list(flipper_length_mm = c(190, 210))
+pred_spec <- list(flipper_length_mm = c(190, 210))
pd_new <- orsf_pd_new(fit_regr,
pred_spec = pred_spec,
new_data = penguins_orsf_test)
-pd_new
-#> flipper_length_mm mean lwr medn upr
-#> <num> <num> <num> <num> <num>
-#> 1: 190 42.96571 37.09805 43.69769 48.72301
-#> 2: 210 45.66012 40.50693 46.31577 51.65163
+pd_new
+## flipper_length_mm mean lwr medn upr
+## <num> <num> <num> <num> <num>
+## 1: 190 42.96571 37.09805 43.69769 48.72301
+## 2: 210 45.66012 40.50693 46.31577 51.65163
You can also let pred_spec_auto
pick reasonable values like so:
-
-pred_spec = pred_spec_auto(species, island, body_mass_g)
+pred_spec = pred_spec_auto(species, island, body_mass_g)
pd_new <- orsf_pd_new(fit_regr,
pred_spec = pred_spec,
new_data = penguins_orsf_test)
-pd_new
-#> species island body_mass_g mean lwr medn upr
-#> <fctr> <fctr> <num> <num> <num> <num> <num>
-#> 1: Adelie Biscoe 3200 40.31374 37.24373 40.31967 44.22824
-#> 2: Chinstrap Biscoe 3200 45.10582 42.63342 45.10859 47.60119
-#> 3: Gentoo Biscoe 3200 42.81649 40.19221 42.55664 46.84035
-#> 4: Adelie Dream 3200 40.16219 36.95895 40.34633 43.90681
-#> 5: Chinstrap Dream 3200 46.21778 43.53954 45.90929 49.19173
-#> ---
-#> 41: Chinstrap Dream 5300 48.48139 46.36282 48.25679 51.02996
-#> 42: Gentoo Dream 5300 45.91819 43.62832 45.54110 49.91622
-#> 43: Adelie Torgersen 5300 42.92879 40.66576 42.31072 46.76406
-#> 44: Chinstrap Torgersen 5300 46.59576 44.80400 46.49196 49.03906
-#> 45: Gentoo Torgersen 5300 45.11384 42.95190 44.51289 49.27629
+pd_new
+## species island body_mass_g mean lwr medn upr
+## <fctr> <fctr> <num> <num> <num> <num> <num>
+## 1: Adelie Biscoe 3200 40.31374 37.24373 40.31967 44.22824
+## 2: Chinstrap Biscoe 3200 45.10582 42.63342 45.10859 47.60119
+## 3: Gentoo Biscoe 3200 42.81649 40.19221 42.55664 46.84035
+## 4: Adelie Dream 3200 40.16219 36.95895 40.34633 43.90681
+## 5: Chinstrap Dream 3200 46.21778 43.53954 45.90929 49.19173
+## ---
+## 41: Chinstrap Dream 5300 48.48139 46.36282 48.25679 51.02996
+## 42: Gentoo Dream 5300 45.91819 43.62832 45.54110 49.91622
+## 43: Adelie Torgersen 5300 42.92879 40.66576 42.31072 46.76406
+## 44: Chinstrap Torgersen 5300 46.59576 44.80400 46.49196 49.03906
+## 45: Gentoo Torgersen 5300 45.11384 42.95190 44.51289 49.27629
By default, all combinations of all variables are used. However, you can
also look at the variables one by one, separately, like so:
-
-pd_new <- orsf_pd_new(fit_regr,
+pd_new <- orsf_pd_new(fit_regr,
expand_grid = FALSE,
pred_spec = pred_spec,
new_data = penguins_orsf_test)
-pd_new
-#> variable value level mean lwr medn upr
-#> <char> <num> <char> <num> <num> <num> <num>
-#> 1: species NA Adelie 41.90271 37.10417 41.51723 48.51478
-#> 2: species NA Chinstrap 47.11314 42.40419 46.96478 51.51392
-#> 3: species NA Gentoo 44.37038 39.87306 43.89889 51.21635
-#> 4: island NA Biscoe 44.21332 37.22711 45.27862 51.21635
-#> 5: island NA Dream 44.43354 37.01471 45.57261 51.51392
-#> 6: island NA Torgersen 43.29539 37.01513 44.26924 49.84391
-#> 7: body_mass_g 3200 <NA> 42.84625 37.03978 43.95991 49.19173
-#> 8: body_mass_g 3550 <NA> 43.53326 37.56730 44.43756 50.47092
-#> 9: body_mass_g 3975 <NA> 44.30431 38.31567 45.22089 51.50683
-#> 10: body_mass_g 4700 <NA> 45.22559 39.88199 46.34680 51.18955
-#> 11: body_mass_g 5300 <NA> 45.91412 40.84742 46.95327 51.48851
+pd_new
+## variable value level mean lwr medn upr
+## <char> <num> <char> <num> <num> <num> <num>
+## 1: species NA Adelie 41.90271 37.10417 41.51723 48.51478
+## 2: species NA Chinstrap 47.11314 42.40419 46.96478 51.51392
+## 3: species NA Gentoo 44.37038 39.87306 43.89889 51.21635
+## 4: island NA Biscoe 44.21332 37.22711 45.27862 51.21635
+## 5: island NA Dream 44.43354 37.01471 45.57261 51.51392
+## 6: island NA Torgersen 43.29539 37.01513 44.26924 49.84391
+## 7: body_mass_g 3200 <NA> 42.84625 37.03978 43.95991 49.19173
+## 8: body_mass_g 3550 <NA> 43.53326 37.56730 44.43756 50.47092
+## 9: body_mass_g 3975 <NA> 44.30431 38.31567 45.22089 51.50683
+## 10: body_mass_g 4700 <NA> 45.22559 39.88199 46.34680 51.18955
+## 11: body_mass_g 5300 <NA> 45.91412 40.84742 46.95327 51.48851
And you can also bypass all the bells and whistles by using your own
data.frame
for a pred_spec
. (Just make sure you request values that
exist in the training data.)
-
-custom_pred_spec <- data.frame(species = 'Adelie',
+custom_pred_spec <- data.frame(species = 'Adelie',
island = 'Biscoe')
pd_new <- orsf_pd_new(fit_regr,
pred_spec = custom_pred_spec,
new_data = penguins_orsf_test)
-pd_new
-#> species island mean lwr medn upr
-#> <fctr> <fctr> <num> <num> <num> <num>
-#> 1: Adelie Biscoe 41.98024 37.22711 41.65252 48.51478
+pd_new
+
@@ -391,8 +382,7 @@ SurvivalBegin by fitting an oblique survival random forest:
-
-set.seed(329)
+set.seed(329)
index_train <- sample(nrow(pbc_orsf), 150)
@@ -404,43 +394,42 @@ Survival oobag_pred_horizon = 365.25 * 5)
Compute partial dependence using in-bag data for bili = c(1,2,3,4,5)
:
pd_train <- orsf_pd_inb(fit_surv, pred_spec = list(bili = 1:5))
-pd_train
-#> pred_horizon bili mean lwr medn upr
-#> <num> <num> <num> <num> <num> <num>
-#> 1: 1826.25 1 0.2566200 0.02234786 0.1334170 0.8918909
-#> 2: 1826.25 2 0.3121392 0.06853733 0.1896849 0.9204338
-#> 3: 1826.25 3 0.3703242 0.11409793 0.2578505 0.9416791
-#> 4: 1826.25 4 0.4240692 0.15645214 0.3331057 0.9591581
-#> 5: 1826.25 5 0.4663670 0.20123406 0.3841700 0.9655296
+pd_train
+## pred_horizon bili mean lwr medn upr
+## <num> <num> <num> <num> <num> <num>
+## 1: 1826.25 1 0.2566200 0.02234786 0.1334170 0.8918909
+## 2: 1826.25 2 0.3121392 0.06853733 0.1896849 0.9204338
+## 3: 1826.25 3 0.3703242 0.11409793 0.2578505 0.9416791
+## 4: 1826.25 4 0.4240692 0.15645214 0.3331057 0.9591581
+## 5: 1826.25 5 0.4663670 0.20123406 0.3841700 0.9655296
If you don’t have specific values of a variable in mind, let
pred_spec_auto
pick for you:
pd_train <- orsf_pd_inb(fit_surv, pred_spec_auto(bili))
-pd_train
-#> pred_horizon bili mean lwr medn upr
-#> <num> <num> <num> <num> <num> <num>
-#> 1: 1826.25 0.55 0.2481444 0.02035041 0.1242215 0.8801444
-#> 2: 1826.25 0.70 0.2502831 0.02045039 0.1271039 0.8836536
-#> 3: 1826.25 1.50 0.2797763 0.03964900 0.1601715 0.9041584
-#> 4: 1826.25 3.50 0.3959349 0.13431288 0.2920400 0.9501230
-#> 5: 1826.25 7.25 0.5351935 0.28064629 0.4652185 0.9783000
+pd_train
+## pred_horizon bili mean lwr medn upr
+## <num> <num> <num> <num> <num> <num>
+## 1: 1826.25 0.55 0.2481444 0.02035041 0.1242215 0.8801444
+## 2: 1826.25 0.70 0.2502831 0.02045039 0.1271039 0.8836536
+## 3: 1826.25 1.50 0.2797763 0.03964900 0.1601715 0.9041584
+## 4: 1826.25 3.50 0.3959349 0.13431288 0.2920400 0.9501230
+## 5: 1826.25 7.25 0.5351935 0.28064629 0.4652185 0.9783000
Specify pred_horizon
to get partial dependence at each value:
-
-pd_train <- orsf_pd_inb(fit_surv, pred_spec_auto(bili),
+pd_train <- orsf_pd_inb(fit_surv, pred_spec_auto(bili),
pred_horizon = seq(500, 3000, by = 500))
-pd_train
-#> pred_horizon bili mean lwr medn upr
-#> <num> <num> <num> <num> <num> <num>
-#> 1: 500 0.55 0.0617199 0.000443399 0.00865419 0.5907104
-#> 2: 1000 0.55 0.1418501 0.005793742 0.05572853 0.7360749
-#> 3: 1500 0.55 0.2082505 0.013609478 0.09174558 0.8556319
-#> 4: 2000 0.55 0.2679017 0.023047689 0.14574169 0.8910549
-#> 5: 2500 0.55 0.3179617 0.063797305 0.20254500 0.9017710
-#> ---
-#> 26: 1000 7.25 0.3264627 0.135343689 0.25956791 0.8884333
-#> 27: 1500 7.25 0.4641265 0.218208755 0.38787435 0.9702903
-#> 28: 2000 7.25 0.5511761 0.293367409 0.48427730 0.9812413
-#> 29: 2500 7.25 0.6200238 0.371965247 0.56954399 0.9845058
-#> 30: 3000 7.25 0.6803482 0.425128031 0.64642318 0.9888637
+pd_train
+## pred_horizon bili mean lwr medn upr
+## <num> <num> <num> <num> <num> <num>
+## 1: 500 0.55 0.0617199 0.000443399 0.00865419 0.5907104
+## 2: 1000 0.55 0.1418501 0.005793742 0.05572853 0.7360749
+## 3: 1500 0.55 0.2082505 0.013609478 0.09174558 0.8556319
+## 4: 2000 0.55 0.2679017 0.023047689 0.14574169 0.8910549
+## 5: 2500 0.55 0.3179617 0.063797305 0.20254500 0.9017710
+## ---
+## 26: 1000 7.25 0.3264627 0.135343689 0.25956791 0.8884333
+## 27: 1500 7.25 0.4641265 0.218208755 0.38787435 0.9702903
+## 28: 2000 7.25 0.5511761 0.293367409 0.48427730 0.9812413
+## 29: 2500 7.25 0.6200238 0.371965247 0.56954399 0.9845058
+## 30: 3000 7.25 0.6803482 0.425128031 0.64642318 0.9888637
A vector-valued pred_horizon
input comes with minimal extra
computational cost. Use a fine grid of time values and assess whether
predictors have time-varying effects. (see partial dependence vignette
@@ -462,11 +451,11 @@
References
- Developed by Byron Jaeger.
+ Developed by Byron Jaeger.
diff --git a/reference/orsf_scale_cph.html b/reference/orsf_scale_cph.html
index bc835c5f..4b1f3da6 100644
--- a/reference/orsf_scale_cph.html
+++ b/reference/orsf_scale_cph.html
@@ -1,6 +1,6 @@
Scale input data — orsf_scale_cph • aorsf Scale input data — orsf_scale_cph • aorsf Univariate summary — orsf_summarize_uni • aorsf Univariate summary — orsf_summarize_uni • aorsf
@@ -216,11 +216,11 @@ Examples
- Developed by Byron Jaeger.
+ Developed by Byron Jaeger.
diff --git a/reference/orsf_time_to_train.html b/reference/orsf_time_to_train.html
index 8c73308d..43b315ee 100644
--- a/reference/orsf_time_to_train.html
+++ b/reference/orsf_time_to_train.html
@@ -1,5 +1,5 @@
-Estimate training time — orsf_time_to_train • aorsf Estimate training time — orsf_time_to_train • aorsf
@@ -96,7 +96,7 @@ Examplestime_estimated <- orsf_time_to_train(object, n_tree_subset=1)
print(time_estimated)
-#> Time difference of 0.04238844 secs
+#> Time difference of 0.04396915 secs
# let's see how close the approximation was
time_true_start <- Sys.time()
@@ -106,11 +106,11 @@ Examplestime_true <- time_true_stop - time_true_start
print(time_true)
-#> Time difference of 0.03808045 secs
+#> Time difference of 0.04506826 secs
# error
abs(time_true - time_estimated)
-#> Time difference of 0.004307985 secs
+#> Time difference of 0.00109911 secs
@@ -119,11 +119,11 @@ Examples
- Developed by Byron Jaeger.
+ Developed by Byron Jaeger.
diff --git a/reference/orsf_update.html b/reference/orsf_update.html
index 347bde5f..1e9b2de1 100644
--- a/reference/orsf_update.html
+++ b/reference/orsf_update.html
@@ -1,5 +1,5 @@
-Update Forest Parameters — orsf_update • aorsf Update Forest Parameters — orsf_update • aorsf
@@ -163,11 +163,11 @@ Examples
- Developed by Byron Jaeger.
+ Developed by Byron Jaeger.
n_predictors: the number of predictors used
stat_value: the out-of-bag statistic
variables_included: the names of the variables included
predictors_included: the names of the predictors included
predictor_dropped: the predictor selected to be dropped
The difference between variables_included
and predictors_included
is
+referent coding. The variable
would be the name of a factor variable
+in the training data, while the predictor
would be the name of that
+same factor with the levels of the factor appended. For example, if
+the variable is diabetes
with levels = c("no", "yes")
, then the
+variable name is diabetes
and the predictor name is diabetes_yes
.
tree_seeds
should be specified in object
so that each successive run
of orsf
will be evaluated in the same out-of-bag samples as the initial
run.
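The referent coding described above is the same expansion base R's `model.matrix()` applies to a factor; the sketch below shows it for a hypothetical `diabetes` variable (note the naming convention differs slightly: `model.matrix()` omits the underscore that appears in names like `diabetes_yes`):

```r
# A two-level factor expands to one predictor column for the non-referent
# level ("yes"); the referent level ("no") is absorbed into the intercept.
dat <- data.frame(diabetes = factor(c("no", "yes", "yes"),
                                    levels = c("no", "yes")))
x <- model.matrix(~ diabetes, data = dat)
colnames(x)
## [1] "(Intercept)" "diabetesyes"
```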
Developed by Byron Jaeger.
+Developed by Byron Jaeger.
diff --git a/reference/pbc_orsf.html b/reference/pbc_orsf.html index 44ab7a30..b60efee4 100644 --- a/reference/pbc_orsf.html +++ b/reference/pbc_orsf.html @@ -1,6 +1,6 @@Developed by Byron Jaeger.
+Developed by Byron Jaeger.
diff --git a/reference/predict.ObliqueForest.html b/reference/predict.ObliqueForest.html index 09490041..8d8ba467 100644 --- a/reference/predict.ObliqueForest.html +++ b/reference/predict.ObliqueForest.html @@ -1,7 +1,7 @@Developed by Byron Jaeger.
+Developed by Byron Jaeger.
diff --git a/reference/print.orsf_summary_uni.html b/reference/print.orsf_summary_uni.html
index 7f63d2a1..2446924c 100644
--- a/reference/print.orsf_summary_uni.html
+++ b/reference/print.orsf_summary_uni.html
@@ -1,5 +1,5 @@
-Developed by Byron Jaeger.
+Developed by Byron Jaeger.
diff --git a/search.json b/search.json index b218512e..e5a7031c 100644 --- a/search.json +++ b/search.json @@ -1 +1 @@ -[{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing to aorsf","title":"Contributing to aorsf","text":"Want contribute aorsf? Great! aorsf initially stable state development, great deal active subsequent development envisioned. outline propose change aorsf. detailed info contributing , tidyverse packages, please see development contributing guide.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"fixing-typos","dir":"","previous_headings":"","what":"Fixing typos","title":"Contributing to aorsf","text":"can fix typos, spelling mistakes, grammatical errors documentation directly using GitHub web interface, long changes made source file. generally means ’ll need edit roxygen2 comments .R, .Rd file. can find .R file generates .Rd reading comment first line.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"bigger-changes","dir":"","previous_headings":"","what":"Bigger changes","title":"Contributing to aorsf","text":"want make bigger change, ’s good idea first file issue make sure someone team agrees ’s needed. ’ve found bug, please file issue illustrates bug minimal reprex (also help write unit test, needed).","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"pull-request-process","dir":"","previous_headings":"Bigger changes","what":"Pull request process","title":"Contributing to aorsf","text":"Fork package clone onto computer. haven’t done , recommend using usethis::create_from_github(\"ropensci/aorsf\", fork = TRUE). Install development dependencies devtools::install_dev_deps(), make sure package passes R CMD check running devtools::check(). R CMD check doesn’t pass cleanly, ’s good idea ask help continuing. Create Git branch pull request (PR). 
recommend using usethis::pr_init(\"brief-description--change\"). Make changes, commit git, create PR running usethis::pr_push(), following prompts browser. title PR briefly describe change. body PR contain Fixes #issue-number. user-facing changes, add bullet top NEWS.md (.e. just first header). Follow style described https://style.tidyverse.org/news.html.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"code-style","dir":"","previous_headings":"Bigger changes","what":"Code style","title":"Contributing to aorsf","text":"New code follow tidyverse style guide. can use styler package apply styles, please don’t restyle code nothing PR. use roxygen2, Markdown syntax, documentation. use testthat unit tests. Contributions test cases included easier accept.","code":""},{"path":"https://bcjaeger.github.io/aorsf/CONTRIBUTING.html","id":"code-of-conduct","dir":"","previous_headings":"","what":"Code of Conduct","title":"Contributing to aorsf","text":"Please note aorsf project released Contributor Code Conduct. contributing project agree abide terms.","code":""},{"path":"https://bcjaeger.github.io/aorsf/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2022 aorsf authors (Byron C. Jaeger, Sawyer Welden, Nicholas M. Pajewski) Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. 
EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"background","dir":"Articles","previous_headings":"","what":"Background","title":"Introduction to aorsf","text":"oblique random forest (RF) extension traditional (axis-based) RF. Instead using single variable split data grow new branches, trees oblique RF use weighted combination multiple variables.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"oblique-rfs-for-survival-classification-and-regression","dir":"Articles","previous_headings":"","what":"Oblique RFs for survival, classification, and regression","title":"Introduction to aorsf","text":"purpose aorsf (‘’ short accelerated) provide unifying framework fit oblique RFs can scale adequately large data sets. fastest algorithms available package used default often equivalent prediction accuracy computational approaches. center piece aorsf orsf() function. initial versions aorsf, orsf() function fit oblique random survival forests, now allows classification, regression, survival forests. (may introduce orf() function future name orsf() misleading users.) classification, fit oblique RF predict penguin species using penguin data magnificent palmerpenguins R package regression, use data predict bill length penguins: personal favorite oblique survival RF accelerated Cox regression great combination prediction accuracy computational efficiency (see JCGS paper). , predict mortality risk following diagnosis primary biliary cirrhosis: may notice first input aorsf data. design choice makes easier use orsf pipes (.e., %>% |>). instance,","code":"# An oblique classification RF penguin_fit <- orsf(data = penguins_orsf, formula = species ~ .) 
penguin_fit #> ---------- Oblique random classification forest #> #> Linear combinations: Accelerated Logistic regression #> N observations: 333 #> N classes: 3 #> N trees: 500 #> N predictors total: 7 #> N predictors per node: 3 #> Average leaves per tree: 5.542 #> Min observations in leaf: 5 #> OOB stat value: 1.00 #> OOB stat type: AUC-ROC #> Variable importance: anova #> #> ----------------------------------------- # An oblique regression RF bill_fit <- orsf(data = penguins_orsf, formula = bill_length_mm ~ .) bill_fit #> ---------- Oblique random regression forest #> #> Linear combinations: Accelerated Linear regression #> N observations: 333 #> N trees: 500 #> N predictors total: 7 #> N predictors per node: 3 #> Average leaves per tree: 49.958 #> Min observations in leaf: 5 #> OOB stat value: 0.81 #> OOB stat type: RSQ #> Variable importance: anova #> #> ----------------------------------------- # An oblique survival RF pbc_fit <- orsf(data = pbc_orsf, n_tree = 5, formula = Surv(time, status) ~ . 
- id) pbc_fit #> ---------- Oblique random survival forest #> #> Linear combinations: Accelerated Cox regression #> N observations: 276 #> N events: 111 #> N trees: 5 #> N predictors total: 17 #> N predictors per node: 5 #> Average leaves per tree: 21.6 #> Min observations in leaf: 5 #> Min events in leaf: 1 #> OOB stat value: 0.77 #> OOB stat type: Harrell's C-index #> Variable importance: anova #> #> ----------------------------------------- library(dplyr) pbc_fit <- pbc_orsf |> select(-id) |> orsf(formula = Surv(time, status) ~ ., n_tree = 5)"},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"interpretation","dir":"Articles","previous_headings":"","what":"Interpretation","title":"Introduction to aorsf","text":"aorsf includes several functions dedicated interpretation ORSFs, estimation partial dependence variable importance.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"variable-importance","dir":"Articles","previous_headings":"Interpretation","what":"Variable importance","title":"Introduction to aorsf","text":"multiple methods compute variable importance, can applied type oblique forest. compute negation importance, ORSF multiplies coefficient variable -1 re-computes --sample (sometimes referred --bag) accuracy ORSF model. can also compute variable importance using permutation, classical approach noises predictor assigned resulting degradation prediction accuracy importance predictor. 
faster alternative permutation negation importance ANOVA importance, computes proportion times variable obtains low p-value (p < 0.01) forest grown.","code":"orsf_vi_negate(pbc_fit) #> bili age copper ast sex #> 0.1468851774 0.0606952129 0.0246435580 0.0224269123 0.0175587328 #> trig alk.phos protime edema chol #> 0.0096895007 0.0093198869 0.0086039712 0.0006382134 -0.0015687436 #> ascites platelet hepato spiders trt #> -0.0060269468 -0.0102280228 -0.0108549805 -0.0113883544 -0.0201827916 #> stage albumin #> -0.0221462608 -0.0224072750 orsf_vi_permute(penguin_fit) #> bill_length_mm flipper_length_mm bill_depth_mm island #> 0.1724983056 0.1024126291 0.0751508005 0.0676077927 #> body_mass_g sex year #> 0.0626576714 0.0186787401 0.0009286133 orsf_vi_anova(bill_fit) #> species sex island flipper_length_mm #> 0.34861430 0.21055730 0.11626929 0.08843136 #> body_mass_g bill_depth_mm year #> 0.07642887 0.06077348 0.01475293"},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"partial-dependence-pd","dir":"Articles","previous_headings":"Interpretation","what":"Partial dependence (PD)","title":"Introduction to aorsf","text":"Partial dependence (PD) shows expected prediction model function single predictor multiple predictors. expectation marginalized values predictors, giving something like multivariable adjusted estimate model’s prediction. PD, see vignette","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"individual-conditional-expectations-ice","dir":"Articles","previous_headings":"Interpretation","what":"Individual conditional expectations (ICE)","title":"Introduction to aorsf","text":"Unlike partial dependence, shows expected prediction function one multiple predictors, individual conditional expectations (ICE) show prediction individual observation function predictor. 
ICE, see vignette","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"what-about-the-original-orsf","dir":"Articles","previous_headings":"","what":"What about the original ORSF?","title":"Introduction to aorsf","text":"original ORSF (.e., obliqueRSF) used glmnet find linear combinations inputs. aorsf allows users implement approach using orsf_control_survival(method = 'net') function: net forests fit lot faster original ORSF function obliqueRSF. However, net forests still much slower cph ones.","code":"orsf_net <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id, control = orsf_control_survival(method = 'net'))"},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"aorsf-and-other-machine-learning-software","dir":"Articles","previous_headings":"","what":"aorsf and other machine learning software","title":"Introduction to aorsf","text":"unique feature aorsf fast algorithms fit ORSF ensembles. RLT obliqueRSF fit oblique random survival forests, aorsf faster. ranger randomForestSRC fit survival forests, neither package supports oblique splitting. obliqueRF fits oblique random forests classification regression, survival. PPforest fits oblique random forests classification survival. Note: default prediction behavior aorsf models produce predicted risk specific prediction horizon, default ranger randomForestSRC. think change future, computing time independent predictions aorsf helpful.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/aorsf.html","id":"learning-more","dir":"Articles","previous_headings":"","what":"Learning more","title":"Introduction to aorsf","text":"aorsf began dedicated package oblique random survival forests, papers published far focused survival analysis risk prediction. However, routines regression classification oblique RFs aorsf high overlap survival ones. See orsf details oblique random survival forests. 
see JCGS paper details algorithms used specifically aorsf.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"go-faster","dir":"Articles","previous_headings":"","what":"Go faster","title":"Tips to speed up computation","text":"Analyses can slow crawl models need hours run. article find tricks prevent bottleneck using orsf().","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"dont-specify-a-control","dir":"Articles","previous_headings":"","what":"Don’t specify a control","title":"Tips to speed up computation","text":"default control orsf() NULL , unspecified, orsf() pick fastest possible control depending type forest grown. default control run-time compared approaches can striking. example:","code":"time_fast <- system.time( expr = orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5) ) time_net <- system.time( expr = orsf(pbc_orsf, formula = time+status~. -id, control = orsf_control_survival(method = 'net'), n_tree = 5) ) # unspecified control is much faster time_net['elapsed'] / time_fast['elapsed'] #> elapsed #> 44.77273"},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"use-n_thread","dir":"Articles","previous_headings":"","what":"Use n_thread","title":"Tips to speed up computation","text":"n_thread argument uses multi-threading run aorsf functions parallel possible. know many threads want, e.g. want exactly 5, set n_thread = 5. aren’t sure many threads available want use feasible amount, using n_thread = 0 (default) tells aorsf . Note: sometimes multi-threading possible. example, R single threaded language, multi-threading applied orsf() needs call R functions C++, occurs customized R function used find linear combination variables compute prediction accuracy.","code":"# automatically pick number of threads based on amount available orsf(pbc_orsf, formula = time+status~. 
-id, n_tree = 5, n_thread = 0)"},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"do-less","dir":"Articles","previous_headings":"","what":"Do less","title":"Tips to speed up computation","text":"inputs orsf() can adjusted make run faster: set n_retry 0 set oobag_pred_type 'none' set importance 'none' increase split_min_events, split_min_obs, leaf_min_events, leaf_min_obs make trees stop growing sooner increase split_min_stat enforce strict requirements growing deeper trees. Applying tips: modifying inputs can make orsf() run faster, can also impact prediction accuracy.","code":"orsf(pbc_orsf, formula = time+status~., n_thread = 0, n_tree = 5, n_retry = 0, oobag_pred_type = 'none', importance = 'none', split_min_events = 20, leaf_min_events = 10, split_min_stat = 10)"},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"show-progress","dir":"Articles","previous_headings":"","what":"Show progress","title":"Tips to speed up computation","text":"Setting verbose_progress = TRUE doesn’t make anything run faster, can help make feel like things running less slow.","code":"verbose_fit <- orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5, verbose_progress = TRUE) #> Growing trees: 100%. #> Computing predictions: 100%."},{"path":"https://bcjaeger.github.io/aorsf/articles/fast.html","id":"dont-wait--estimate","dir":"Articles","previous_headings":"","what":"Don’t wait. Estimate!","title":"Tips to speed up computation","text":"Instead running model hoping fast, can estimate long specification model take using no_fit = TRUE call orsf().","code":"fit_spec <- orsf(pbc_orsf, formula = time+status~. 
-id, control = orsf_control_survival(method = 'net'), n_tree = 2000, no_fit = TRUE) # how much time it takes to estimate training time: system.time( time_est <- orsf_time_to_train(fit_spec, n_tree_subset = 5) ) #> user system elapsed #> 0.287 0.004 0.291 # the estimated training time: time_est #> Time difference of 116.1021 secs"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"out-of-bag-data","dir":"Articles","previous_headings":"","what":"Out-of-bag data","title":"Out-of-bag predictions and evaluation","text":"random forests, tree grown bootstrapped version training set. bootstrap samples selected replacement, bootstrapped training set contains two-thirds instances original training set. ‘--bag’ data instances bootstrapped training set.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"out-of-bag-predictions-and-error","dir":"Articles","previous_headings":"","what":"Out-of-bag predictions and error","title":"Out-of-bag predictions and evaluation","text":"tree random forest can make predictions --bag data, --bag predictions can aggregated make ensemble --bag prediction. Since --bag data used grow tree, accuracy ensemble --bag predictions approximate generalization error random forest. --bag prediction error plays central role routines estimate variable importance, e.g. negation importance. fit oblique random survival forest plot distribution ensemble --bag predictions. Next, let’s check --bag accuracy fit: --bag estimate Harrell’s C-index (default method evaluate --bag predictions) 0.7419135.","code":"fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id, oobag_pred_type = 'surv', n_tree = 5, oobag_pred_horizon = 2000) hist(fit$pred_oobag, main = 'Out-of-bag survival predictions at t=2,000') # what function is used to evaluate out-of-bag predictions? fit$eval_oobag$stat_type #> [1] \"Harrell's C-index\" # what is the output from this function? 
fit$eval_oobag$stat_values #> [,1] #> [1,] 0.7419135"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"monitoring-out-of-bag-error","dir":"Articles","previous_headings":"","what":"Monitoring out-of-bag error","title":"Out-of-bag predictions and evaluation","text":"--bag data set contains one-third training set, --bag error estimate usually converges stable value trees added forest. want monitor convergence --bag error oblique random survival forest, can set oobag_eval_every compute --bag error every oobag_eval_every tree. example, let’s compute --bag error fitting tree forest 50 trees: general, least 500 trees recommended random forest fit. ’re just using 10 illustration.","code":"fit <- orsf(data = pbc_orsf, formula = Surv(time, status) ~ . - id, n_tree = 20, tree_seeds = 2, oobag_pred_type = 'surv', oobag_pred_horizon = 2000, oobag_eval_every = 1) plot( x = seq(1, 20, by = 1), y = fit$eval_oobag$stat_values, main = 'Out-of-bag C-statistic computed after each new tree is grown.', xlab = 'Number of trees grown', ylab = fit$eval_oobag$stat_type ) lines(x=seq(1, 20), y = fit$eval_oobag$stat_values)"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"user-supplied-out-of-bag-evaluation-functions","dir":"Articles","previous_headings":"","what":"User-supplied out-of-bag evaluation functions","title":"Out-of-bag predictions and evaluation","text":"cases, may want control --bag error estimated. example, let’s use Brier score SurvMetrics package: two ways apply function compute --bag error. 
First, can apply function --bag survival predictions stored ‘aorsf’ objects, e.g: Second, can pass function orsf(), used place Harrell’s C-statistic:","code":"oobag_brier_surv <- function(y_mat, w_vec, s_vec){ # use if SurvMetrics is available if(requireNamespace(\"SurvMetrics\")){ return( # output is numeric vector of length 1 as.numeric( SurvMetrics::Brier( object = Surv(time = y_mat[, 1], event = y_mat[, 2]), pre_sp = s_vec, # t_star in Brier() should match oob_pred_horizon in orsf() t_star = 2000 ) ) ) } # if not available, use a dummy version mean( (y_mat[,2] - (1-s_vec))^2 ) } oobag_brier_surv(y_mat = pbc_orsf[,c('time', 'status')], s_vec = fit$pred_oobag) #> Loading required namespace: SurvMetrics #> [1] 0.11869 # instead of copy/pasting the modeling code and then modifying it, # you can just use orsf_update. fit_brier <- orsf_update(fit, oobag_fun = oobag_brier_surv) plot( x = seq(1, 20, by = 1), y = fit_brier$eval_oobag$stat_values, main = 'Out-of-bag error computed after each new tree is grown.', sub = 'For the Brier score, lower values indicate more accurate predictions', xlab = 'Number of trees grown', ylab = \"Brier score\" ) lines(x=seq(1, 20), y = fit_brier$eval_oobag$stat_values)"},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"specific-instructions-on-user-supplied-functions","dir":"Articles","previous_headings":"User-supplied out-of-bag evaluation functions","what":"Specific instructions on user-supplied functions","title":"Out-of-bag predictions and evaluation","text":"use oobag_fun note following: oobag_fun three inputs: y_mat, w_vec, s_vec survival trees, y_mat two column matrix first column named ‘time’ second named ‘status’. classification trees, y_mat matrix number columns = number distinct classes outcome. regression, y_mat matrix one column. 
s_vec numeric vector containing predictions oobag_fun return numeric output length 1","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/oobag.html","id":"notes","dir":"Articles","previous_headings":"","what":"Notes","title":"Out-of-bag predictions and evaluation","text":"evaluating --bag error: oobag_pred_horizon input orsf() determines prediction horizon --bag predictions. prediction horizon needs specified evaluate prediction accuracy cases, examples . sure check case using functions, , sure oobag_pred_horizon matches prediction horizon used custom function. functions expect predicted risk (.e., 1 - predicted survival), others expect predicted survival.","code":""},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"partial-dependence-pd","dir":"Articles","previous_headings":"","what":"Partial dependence (PD)","title":"PD and ICE curves with ORSF","text":"Partial dependence (PD) shows expected prediction model function single predictor multiple predictors. expectation marginalized values predictors, giving something like multivariable adjusted estimate model’s prediction. can compute PD individual conditional expectation (ICE) three ways: using -bag predictions training data. -bag PD indicates relationships model learned training. helpful goal interpret model. using --bag predictions training data. --bag PD indicates relationships model learned training using --bag data simulates application model new data. helpful want test model’s reliability fairness new data don’t access large testing set. using predictions new set data. New data PD shows model predicts outcomes observations seen. 
helpful want test model’s reliability fairness.","code":"library(aorsf) library(ggplot2)"},{"path":"https://bcjaeger.github.io/aorsf/articles/pd.html","id":"classification","dir":"Articles","previous_headings":"Partial dependence (PD)","what":"Classification","title":"PD and ICE curves with ORSF","text":"Begin fitting oblique classification random forest: Compute PD using --bag data flipper_length_mm = c(190, 210). Note predicted probabilities returned class probabilities mean column sum 1 take sum class specific value pred_spec variables. example, isn’t case median predicted probability!","code":"set.seed(329) index_train <- sample(nrow(penguins_orsf), 150) penguins_orsf_train <- penguins_orsf[index_train, ] penguins_orsf_test <- penguins_orsf[-index_train, ] fit_clsf <- orsf(data = penguins_orsf_train, formula = species ~ .) pred_spec <- list(flipper_length_mm = c(190, 210)) pd_oob <- orsf_pd_oob(fit_clsf, pred_spec = pred_spec) pd_oob #> Key: