Skip to content

Commit

Permalink
Add kfold
Browse files Browse the repository at this point in the history
  • Loading branch information
yoshidk6 committed Feb 17, 2025
1 parent 8849de5 commit 94f7bfb
Show file tree
Hide file tree
Showing 5 changed files with 41 additions and 3 deletions.
2 changes: 1 addition & 1 deletion _freeze/index/execute-results/html.json

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
{
"hash": "1ff2936cee198ac5f20ce8c1faa43eeb",
"hash": "f671c38ca978ddc112f1a55d39fabefb",
"result": {
"engine": "knitr",
"markdown": "# Model comparison between linear and E~max~\n\nThis page showcase how to compare model structures between linear and E~max~\nlogistic regression models\n\n## Setup and load\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(BayesERtools)\nlibrary(loo)\nlibrary(here)\nlibrary(gt)\n\ntheme_set(theme_bw(base_size = 12))\n```\n:::\n\n\n\n\n\n\n\n## Load data\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(d_sim_binom_cov)\n\nd_sim_binom_cov_2 <-\n d_sim_binom_cov |>\n mutate(\n AUCss_1000 = AUCss / 1000, BAGE_10 = BAGE / 10,\n BWT_10 = BWT / 10, BHBA1C_5 = BHBA1C / 5,\n Dose = glue::glue(\"{Dose_mg} mg\")\n )\n\n# Grade 2+ hypoglycemia\ndf_er_ae_hgly2 <- d_sim_binom_cov_2 |> filter(AETYPE == \"hgly2\")\n\nvar_resp <- \"AEFLAG\"\nvar_exposure <- \"AUCss_1000\"\n```\n:::\n\n\n\n\n## Fit model\n\n::: {.panel-tabset}\n\n### Linear logistic regression\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(1234)\nermod_bin_hgly2 <- dev_ermod_bin(\n data = df_er_ae_hgly2,\n var_resp = var_resp,\n var_exposure = var_exposure\n)\nermod_bin_hgly2\n```\n\n<pre class=\"r-output\"><code>\n<span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>Binary ER model</span> <span style='color: #00BBBB;'>─────────────────────────────────────────────────────────────</span>\n<span style='color: #00BBBB;'>ℹ</span> Use `plot_er()` to visualize ER curve\n\n── <span style='font-weight: bold;'>Developed model</span> ──\n\nstan_glm\n family: binomial [logit]\n formula: AEFLAG ~ AUCss_1000\n observations: 500\n predictors: 2\n------\n Median MAD_SD\n(Intercept) -2.04 0.23 \nAUCss_1000 0.41 0.08 \n------\n* For help interpreting the printed output see ?print.stanreg\n* For info on the priors used see ?prior_summary.stanreg\n</code></pre>\n\n```{.r .cell-code}\nplot_er_gof(ermod_bin_hgly2, var_group = \"Dose\")\n```\n\n::: {.cell-output-display}\n![](mod_structure_comparison_files/figure-html/fig-bin-lin-1.png){#fig-bin-lin width=672}\n:::\n:::\n\n\n\n\n### E~max~ logistic regression\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(1234)\nermod_bin_emax_hgly2 <- dev_ermod_bin_emax(\n data = df_er_ae_hgly2,\n var_resp = var_resp,\n var_exposure = var_exposure\n)\nermod_bin_emax_hgly2\n```\n\n<pre class=\"r-output\"><code>\n<span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>Binary Emax model</span> <span style='color: #00BBBB;'>───────────────────────────────────────────────────────────</span>\n<span style='color: #00BBBB;'>ℹ</span> Use `plot_er()` to visualize ER curve\n\n── <span style='font-weight: bold;'>Developed model</span> ──\n\n---- Binary Emax model fit with rstanemax ----\n mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat\nemax 4.05 0.02 0.91 2.36 3.41 4.02 4.67 5.88 1563.98 1\ne0 -2.39 0.01 0.44 -3.40 -2.62 -2.33 -2.09 -1.68 1033.08 1\nec50 4.77 0.06 2.17 1.44 3.19 4.47 6.06 9.70 1325.28 1\ngamma 1.00 NaN 0.00 1.00 1.00 1.00 1.00 1.00 NaN NaN\n* Use `extract_stanfit()` function to extract raw stanfit object\n* Use `extract_param()` function to extract posterior draws of key parameters\n* Use `plot()` function to visualize model fit\n* Use `extract_obs_mod_frame()` function to extract raw data \n in a processed format (useful for plotting)\n</code></pre>\n\n```{.r .cell-code}\nplot_er_gof(ermod_bin_emax_hgly2, var_group = \"Dose\")\n```\n\n::: {.cell-output-display}\n![](mod_structure_comparison_files/figure-html/fig-bin-emax-1.png){#fig-bin-emax width=672}\n:::\n:::\n\n\n\n\n:::\n\n## Model comparison\n\nYou can perform model comparison based on expected log pointwise predictive density (ELPD).\nELPD is the Bayesian leave-one-out estimate (see ?`loo-glossary`).\n\nHigher ELPD is better, therefore linear logistic regression model appears better than E~max~ model.\nHowever, `elpd_diff` is small and similar to `se_diff` ([see here](https://stats.stackexchange.com/questions/608881/how-to-interpret-elpd-diff-of-bayesian-loo-estimate-in-bayesian-logistic-regress)),\ntherefore we can consider the difference to be not meaningful.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nloo_bin_emax_hgly2 <- loo(ermod_bin_emax_hgly2)\nloo_bin_hgly2 <- loo(ermod_bin_hgly2)\n\nloo_compare(list(bin_emax_hgly2 = loo_bin_emax_hgly2, bin_hgly2 = loo_bin_hgly2))\n```\n\n<pre class=\"r-output\"><code> elpd_diff se_diff\nbin_hgly2 0.0 0.0 \nbin_emax_hgly2 -1.7 1.7 \n</code></pre>\n:::\n",
"markdown": "# Model comparison between linear and E~max~\n\nThis page showcase how to compare model structures between linear and E~max~\nlogistic regression models\n\n## Setup and load\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(BayesERtools)\nlibrary(loo)\nlibrary(here)\nlibrary(gt)\n\ntheme_set(theme_bw(base_size = 12))\n```\n:::\n\n\n\n\n\n\n\n\n## Load data\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(d_sim_binom_cov)\n\nd_sim_binom_cov_2 <-\n d_sim_binom_cov |>\n mutate(\n AUCss_1000 = AUCss / 1000, BAGE_10 = BAGE / 10,\n BWT_10 = BWT / 10, BHBA1C_5 = BHBA1C / 5,\n Dose = glue::glue(\"{Dose_mg} mg\")\n )\n\n# Grade 2+ hypoglycemia\ndf_er_ae_hgly2 <- d_sim_binom_cov_2 |> filter(AETYPE == \"hgly2\")\n\nvar_resp <- \"AEFLAG\"\nvar_exposure <- \"AUCss_1000\"\n```\n:::\n\n\n\n\n\n## Fit model\n\n::: {.panel-tabset}\n\n### Linear logistic regression\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(1234)\nermod_bin_hgly2 <- dev_ermod_bin(\n data = df_er_ae_hgly2,\n var_resp = var_resp,\n var_exposure = var_exposure\n)\nermod_bin_hgly2\n```\n\n<pre class=\"r-output\"><code>\n<span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>Binary ER model</span> <span style='color: #00BBBB;'>─────────────────────────────────────────────────────────────</span>\n<span style='color: #00BBBB;'>ℹ</span> Use `plot_er()` to visualize ER curve\n\n── <span style='font-weight: bold;'>Developed model</span> ──\n\nstan_glm\n family: binomial [logit]\n formula: AEFLAG ~ AUCss_1000\n observations: 500\n predictors: 2\n------\n Median MAD_SD\n(Intercept) -2.04 0.23 \nAUCss_1000 0.41 0.08 \n------\n* For help interpreting the printed output see ?print.stanreg\n* For info on the priors used see ?prior_summary.stanreg\n</code></pre>\n\n```{.r .cell-code}\nplot_er_gof(ermod_bin_hgly2, var_group = \"Dose\")\n```\n\n::: {.cell-output-display}\n![](mod_structure_comparison_files/figure-html/fig-bin-lin-1.png){#fig-bin-lin width=672}\n:::\n:::\n\n\n\n\n\n### E~max~ logistic regression\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(1234)\nermod_bin_emax_hgly2 <- dev_ermod_bin_emax(\n data = df_er_ae_hgly2,\n var_resp = var_resp,\n var_exposure = var_exposure\n)\nermod_bin_emax_hgly2\n```\n\n<pre class=\"r-output\"><code>\n<span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>Binary Emax model</span> <span style='color: #00BBBB;'>───────────────────────────────────────────────────────────</span>\n<span style='color: #00BBBB;'>ℹ</span> Use `plot_er()` to visualize ER curve\n\n── <span style='font-weight: bold;'>Developed model</span> ──\n\n---- Binary Emax model fit with rstanemax ----\n mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat\nemax 4.05 0.02 0.91 2.36 3.41 4.02 4.67 5.88 1563.98 1\ne0 -2.39 0.01 0.44 -3.40 -2.62 -2.33 -2.09 -1.68 1033.08 1\nec50 4.77 0.06 2.17 1.44 3.19 4.47 6.06 9.70 1325.28 1\ngamma 1.00 NaN 0.00 1.00 1.00 1.00 1.00 1.00 NaN NaN\n* Use `extract_stanfit()` function to extract raw stanfit object\n* Use `extract_param()` function to extract posterior draws of key parameters\n* Use `plot()` function to visualize model fit\n* Use `extract_obs_mod_frame()` function to extract raw data \n in a processed format (useful for plotting)\n</code></pre>\n\n```{.r .cell-code}\nplot_er_gof(ermod_bin_emax_hgly2, var_group = \"Dose\")\n```\n\n::: {.cell-output-display}\n![](mod_structure_comparison_files/figure-html/fig-bin-emax-1.png){#fig-bin-emax width=672}\n:::\n:::\n\n\n\n\n\n:::\n\n## Model comparison\n\nYou can perform model comparison based on expected log pointwise predictive density (ELPD).\nELPD is the Bayesian leave-one-out estimate (see ?`loo-glossary`).\n\nHigher ELPD is better, therefore linear logistic regression model appears better than E~max~ model.\nHowever, `elpd_diff` is small and similar to `se_diff` ([see here](https://stats.stackexchange.com/questions/608881/how-to-interpret-elpd-diff-of-bayesian-loo-estimate-in-bayesian-logistic-regress)),\ntherefore we can consider the difference to be not meaningful.\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nloo_bin_emax_hgly2 <- loo(ermod_bin_emax_hgly2)\nloo_bin_hgly2 <- loo(ermod_bin_hgly2)\n\nloo_compare(list(bin_emax_hgly2 = loo_bin_emax_hgly2, bin_hgly2 = loo_bin_hgly2))\n```\n\n<pre class=\"r-output\"><code> elpd_diff se_diff\nbin_hgly2 0.0 0.0 \nbin_emax_hgly2 -1.7 1.7 \n</code></pre>\n:::\n\n\n\n\n\nSometimes, `loo()` shows warnings on Pareto k estimates, which indicates problems\nin approximation of ELPD.\nStarting from `BayesERtools` 0.2.2, ELPD can also be evaluated with k-fold cross-validation.\nWhile it tends to be slower than loo (especially the E~max~ models), \nthis will not face the challenge on approximation as written above.\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(1234)\n\nkfold_bin_emax_hgly2 <- kfold(ermod_bin_emax_hgly2)\nkfold_bin_hgly2 <- kfold(ermod_bin_hgly2)\n\ncmp_bin_kfold <- loo_compare(list(bin_emax_hgly2 = kfold_bin_emax_hgly2, bin_hgly2 = kfold_bin_hgly2))\n```\n:::\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncmp_bin_kfold\n```\n\n<pre class=\"r-output\"><code> elpd_diff se_diff\nbin_hgly2 0.0 0.0 \nbin_emax_hgly2 -0.9 2.2 \n</code></pre>\n:::\n",
"supporting": [
"mod_structure_comparison_files"
],
Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
35 changes: 35 additions & 0 deletions notebook/binary/mod_structure_comparison.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -100,3 +100,38 @@ loo_bin_hgly2 <- loo(ermod_bin_hgly2)
loo_compare(list(bin_emax_hgly2 = loo_bin_emax_hgly2, bin_hgly2 = loo_bin_hgly2))
```

Sometimes, `loo()` shows warnings on Pareto k estimates, which indicates problems
in approximation of ELPD.
Starting from `BayesERtools` 0.2.2, ELPD can also be evaluated with k-fold cross-validation.
While it tends to be slower than loo (especially the E~max~ models),
this will not face the challenge on approximation as written above.

```{r}
#| eval: FALSE
set.seed(1234)
kfold_bin_emax_hgly2 <- kfold(ermod_bin_emax_hgly2)
kfold_bin_hgly2 <- kfold(ermod_bin_hgly2)
cmp_bin_kfold <- loo_compare(list(bin_emax_hgly2 = kfold_bin_emax_hgly2, bin_hgly2 = kfold_bin_hgly2))
```

```{r}
#| eval: FALSE
#| include: FALSE
# Save the output in vignettes/data to avoid running it during the build.
saveRDS(cmp_bin_kfold, file = here("output", "cmp_bin_kfold.rds"))
```

```{r}
#| include: FALSE
cmp_bin_kfold <- readRDS(here("output", "cmp_bin_kfold.rds"))
```

```{r}
cmp_bin_kfold
```

3 changes: 3 additions & 0 deletions output/cmp_bin_kfold.rds
Git LFS file not shown

0 comments on commit 94f7bfb

Please sign in to comment.