Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use new k threshold #235

Merged
merged 48 commits into from
Feb 15, 2024
Merged
Show file tree
Hide file tree
Changes from 41 commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
09acf21
use new k threshold
avehtari Dec 28, 2023
51249bf
fix some of the tests (still some failures)
jgabry Jan 22, 2024
9778955
more updates
avehtari Jan 24, 2024
a54d577
Merge branch 'new-pareto-k-threshold' of github.com:stan-dev/loo into…
avehtari Jan 24, 2024
26e8a7b
fix some tests
avehtari Jan 24, 2024
5abf6e6
Apply suggestions from Noa's scode review
avehtari Jan 24, 2024
61c0725
explain 2200
avehtari Jan 24, 2024
6565f45
fix threshold in print
avehtari Jan 26, 2024
12373da
simplify use of ps_khat_threshold
avehtari Jan 26, 2024
7ffd67c
fix threshold argument handling
avehtari Jan 26, 2024
eaef505
fix tests
avehtari Jan 26, 2024
955b4d5
Apply suggestions from code review by jgabry
avehtari Jan 26, 2024
b36bc13
reduce r_eff warnings
avehtari Jan 28, 2024
a378f7f
doc and refs updates
avehtari Jan 28, 2024
57e9061
fix tests
avehtari Jan 28, 2024
77793cd
Merge branch 'new-pareto-k-threshold' of github.com:stan-dev/loo into…
avehtari Jan 28, 2024
0e03606
fixes suggested by Jonah
avehtari Jan 28, 2024
a329a13
regenerate doc
jgabry Jan 29, 2024
3366ad5
Apply suggestions from code review by n-kall
avehtari Jan 31, 2024
4013c15
another doc threshold update
avehtari Jan 31, 2024
2fb87a3
Merge branch 'new-pareto-k-threshold' of github.com:stan-dev/loo into…
avehtari Jan 31, 2024
e6fdfda
replace r_eff warning with informative message in print
avehtari Jan 31, 2024
eb68807
regenerate doc
jgabry Jan 31, 2024
8c0d449
a few minor doc edits
jgabry Jan 31, 2024
2af2fbf
update Rd files
jgabry Jan 31, 2024
35ca220
Merge branch 'master' into new-pareto-k-threshold
jgabry Jan 31, 2024
cf50f9f
save RDS files in a way compatible with older versions of R
jgabry Jan 31, 2024
be9dee1
update glossary and FAQ
avehtari Feb 1, 2024
448eb3e
fix r_eff default and doc in tis
avehtari Feb 1, 2024
52ce79e
print r_eff summary as part of mcse summary
avehtari Feb 1, 2024
bb7d3e2
Monte Carlo SE -> MCSE
avehtari Feb 1, 2024
4b6fdea
regenerate doc
jgabry Feb 1, 2024
900adb1
Merge branch 'new-pareto-k-threshold' of github.com:stan-dev/loo into…
avehtari Feb 1, 2024
be22253
Merge branch 'master' into new-pareto-k-threshold
jgabry Feb 2, 2024
90ab04a
fix print_reff_summary to work well with old objects
avehtari Feb 2, 2024
cfe1f74
4/9 vignettes fixed
avehtari Feb 2, 2024
122aa10
fixedd loo2-mixis vignette
avehtari Feb 2, 2024
df325d6
fixed loo2-non-factorized vignette
avehtari Feb 2, 2024
43632a8
fixed loo2-weights vignette
avehtari Feb 2, 2024
8cd84b0
fixed loo2-with-rstan vignette
avehtari Feb 2, 2024
133ea10
fixed loo2-lfo vignette
avehtari Feb 2, 2024
1a818f3
Remove forgotten testing line in vignettes/loo2-lfo.Rmd
avehtari Feb 5, 2024
2afe583
Merge branch 'master' into new-pareto-k-threshold
jgabry Feb 7, 2024
8d73b37
typo fix
avehtari Feb 7, 2024
5f78648
Add Aki's news items to NEWS.md with a few minor edits
jgabry Feb 7, 2024
372cb92
diagnostics.R: a few minor doc edits
jgabry Feb 7, 2024
bd33010
Merge branch 'master' into new-pareto-k-threshold
jgabry Feb 12, 2024
3c87b7f
Merge branch 'master' into new-pareto-k-threshold
jgabry Feb 13, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions R/crps.R
Original file line number Diff line number Diff line change
Expand Up @@ -112,7 +112,7 @@ loo_crps.matrix <-
log_lik,
...,
permutations = 1,
r_eff = NULL,
r_eff = 1,
cores = getOption("mc.cores", 1)) {
validate_crps_input(x, x2, y, log_lik)
repeats <- replicate(permutations,
Expand Down Expand Up @@ -154,7 +154,7 @@ loo_scrps.matrix <-
log_lik,
...,
permutations = 1,
r_eff = NULL,
r_eff = 1,
cores = getOption("mc.cores", 1)) {
validate_crps_input(x, x2, y, log_lik)
repeats <- replicate(permutations,
Expand All @@ -175,7 +175,7 @@ EXX_compute <- function(x, x2) {
}


EXX_loo_compute <- function(x, x2, log_lik, r_eff = NULL, ...) {
EXX_loo_compute <- function(x, x2, log_lik, r_eff = 1, ...) {
S <- nrow(x)
shuffle <- sample (1:S)
x2 <- x2[shuffle,]
Expand Down
218 changes: 141 additions & 77 deletions R/diagnostics.R

Large diffs are not rendered by default.

7 changes: 3 additions & 4 deletions R/effective_sample_sizes.R
Original file line number Diff line number Diff line change
Expand Up @@ -175,7 +175,6 @@ psis_n_eff <- function(w, ...) {
psis_n_eff.default <- function(w, r_eff = NULL, ...) {
ss <- sum(w^2)
if (is.null(r_eff)) {
warning("PSIS n_eff not adjusted based on MCMC n_eff.", call. = FALSE)
return(1 / ss)
}
stopifnot(length(r_eff) == 1)
Expand All @@ -186,11 +185,11 @@ psis_n_eff.default <- function(w, r_eff = NULL, ...) {
psis_n_eff.matrix <- function(w, r_eff = NULL, ...) {
ss <- colSums(w^2)
if (is.null(r_eff)) {
warning("PSIS n_eff not adjusted based on MCMC n_eff.", call. = FALSE)
return(1 / ss)
}
if (length(r_eff) != length(ss))
stop("r_eff must have length ncol(w).", call. = FALSE)
if (length(r_eff) != length(ss) && length(r_eff) != 1) {
stop("r_eff must have length 1 or ncol(w).", call. = FALSE)
}
1 / ss * r_eff
}

Expand Down
2 changes: 1 addition & 1 deletion R/gpdfit.R
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,7 @@ adjust_k_wip <- function(k, n) {
}


#' Inverse CDF of generalized pareto distribution
#' Inverse CDF of generalized Pareto distribution
#' (assuming location parameter is 0)
#'
#' @noRd
Expand Down
13 changes: 7 additions & 6 deletions R/importance_sampling.R
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ importance_sampling <- function(log_ratios, method, ...) {
importance_sampling.array <-
function(log_ratios, method,
...,
r_eff = NULL,
r_eff = 1,
cores = getOption("mc.cores", 1)) {
cores <- loo_cores(cores)
stopifnot(length(dim(log_ratios)) == 3)
Expand All @@ -36,7 +36,7 @@ importance_sampling.array <-
importance_sampling.matrix <-
function(log_ratios, method,
...,
r_eff = NULL,
r_eff = 1,
cores = getOption("mc.cores", 1)) {
cores <- loo_cores(cores)
assert_importance_sampling_method_is_implemented(method)
Expand All @@ -49,7 +49,7 @@ importance_sampling.matrix <-
#' @inheritParams psis
#' @export
importance_sampling.default <-
function(log_ratios, method, ..., r_eff = NULL) {
function(log_ratios, method, ..., r_eff = 1) {
stopifnot(is.null(dim(log_ratios)) || length(dim(log_ratios)) == 1)
assert_importance_sampling_method_is_implemented(method)
dim(log_ratios) <- c(length(log_ratios), 1)
Expand Down Expand Up @@ -128,7 +128,7 @@ implemented_is_methods <- function() c("psis", "tis", "sis")
#' but unnormalized.
#' @param pareto_k Vector of GPD k estimates.
#' @param tail_len Vector of tail lengths used to fit GPD.
#' @param r_eff Vector of relative MCMC n_eff for `exp(log lik)`
#' @param r_eff Vector of relative MCMC ESS (n_eff) for `exp(log lik)`
#' @template is_method
#' @return A list of class `"psis"` with structure described in the main doc at
#' the top of this file.
Expand All @@ -153,7 +153,7 @@ importance_sampling_object <-
out <- structure(
list(
log_weights = unnormalized_log_weights,
diagnostics = list(pareto_k = pareto_k, n_eff = NULL)
diagnostics = list(pareto_k = pareto_k, n_eff = NULL, r_eff = r_eff)
),
# attributes
norm_const_log = norm_const_log,
Expand Down Expand Up @@ -184,6 +184,7 @@ do_importance_sampling <- function(log_ratios, r_eff, cores, method) {
assert_importance_sampling_method_is_implemented(method)
N <- ncol(log_ratios)
S <- nrow(log_ratios)
k_threshold <- ps_khat_threshold(S)
tail_len <- n_pareto(r_eff, S)

if (method == "psis") {
Expand Down Expand Up @@ -223,7 +224,7 @@ do_importance_sampling <- function(log_ratios, r_eff, cores, method) {

log_weights <- psis_apply(lw_list, "log_weights", fun_val = numeric(S))
pareto_k <- psis_apply(lw_list, "pareto_k")
throw_pareto_warnings(pareto_k)
throw_pareto_warnings(pareto_k, k_threshold)

importance_sampling_object(
unnormalized_log_weights = log_weights,
Expand Down
71 changes: 52 additions & 19 deletions R/loo-glossary.R
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
#' @name loo-glossary
#'
#' @template loo-and-psis-references
#' @template loo-uncertainty-reference
#' @template bayesvis-reference
#'
#' @description
Expand Down Expand Up @@ -38,7 +39,8 @@
#' estimate is an accurate estimate for the scale, it ignores the skewness. When
#' making model comparisons, the SE of the component-wise (pairwise) differences
#' should be used instead (see the `se_diff` section below and Eq 24 in
#' VGG2017).
#' VGG2017). Sivula et al. (2022) discuss the conditions when the normal
#' approximation used for SE and `se_diff` is good.
#'
#' @section Monte Carlo SE of elpd_loo:
#'
Expand All @@ -62,41 +64,73 @@
#'
#' @section Pareto k estimates:
#'
#' The Pareto `k` estimate is a diagnostic for Pareto smoothed importance
#' The Pareto \eqn{k} estimate is a diagnostic for Pareto smoothed importance
#' sampling (PSIS), which is used to compute components of `elpd_loo`. In
#' importance-sampling LOO (the full posterior distribution is used as the
#' proposal distribution). The Pareto k diagnostic estimates how far an
#' importance-sampling LOO the full posterior distribution is used as the
#' proposal distribution. The Pareto k diagnostic estimates how far an
#' individual leave-one-out distribution is from the full distribution. If
#' leaving out an observation changes the posterior too much then importance
#' sampling is not able to give reliable estimate. If `k<0.5`, then the
#' corresponding component of `elpd_loo` is estimated with high accuracy.
#' If `0.5<k<0.7` the accuracy is lower, but still ok. If `k>0.7`,
#' then importance sampling is not able to provide useful estimate for that
#' component/observation. Pareto k is also useful as a measure of influence of
#' an observation. Highly influential observations have high k values. Very high
#' k values often indicate model misspecification, outliers or mistakes in data
#' processing. See Section 6 of Gabry et al. (2019) for an example.
#' sampling is not able to give a reliable estimate. Pareto smoothing stabilizes
#' importance sampling and guarantees a finite variance estimate at the
#' cost of some bias.
#'
#' The diagnostic threshold for Pareto \eqn{k} depends on sample size
#' \eqn{S} (sample size dependent threshold was introduced by Vehtari
#' et al., 2022, and before that fixed thresholds of 0.5 and 0.7 were
#' recommended). For simplicity, `loo` package uses the nominal sample
#' size \eqn{S} when computing the sample size specific
#' threshold. This provides an optimistic threshold if the effective
#' sample size is less than 2200, but even then if ESS/S > 1/2 the difference
#' is usually negligible. Thinning of MCMC draws can be used to improve
#' the ratio ESS/S.
#'
#' * If \eqn{k < min(1 - 1 / log10(S), 0.7)}, where \eqn{S} is the
#' sample size, the PSIS estimate and the corresponding Monte
#' Carlo standard error estimate are reliable.
#'
#' * If \eqn{1 - 1 / log10(S) <= k < 0.7}, the PSIS estimate and the
#' corresponding Monte Carlo standard error estimate are not
#' reliable, but increasing the (effective) sample size \eqn{S} above
#' 2200 may help (this will increase the sample size specific
#' threshold \eqn{(1 - 1 / log10(2200) > 0.7} and then the bias specific
#' threshold 0.7 dominates).
#'
#' * If \eqn{0.7 <= k < 1}, the PSIS estimate and the corresponding Monte
#' Carlo standard error have large bias and are not reliable. Increasing
#' the sample size may reduce the variability in the \eqn{k} estimate, which
#' may also result in a lower \eqn{k} estimate.
#'
#' * If \eqn{k \geq 1}{k >= 1}, the target distribution is estimated to
jgabry marked this conversation as resolved.
Show resolved Hide resolved
#' have non-finite mean. The PSIS estimate and the corresponding Monte
#' Carlo standard error are not well defined. Increasing the sample size
#' may reduce the variability in \eqn{k} estimate, which may also result in
#' a lower \eqn{k} estimate.
#'
#' Pareto \eqn{k} is also useful as a measure of influence of an
#' observation. Highly influential observations have high \eqn{k}
#' values. Very high \eqn{k} values often indicate model
#' misspecification, outliers or mistakes in data processing. See
#' Section 6 of Gabry et al. (2019) for an example.
#'
#' \subsection{Interpreting `p_loo` when Pareto `k` is large}{
#' If `k > 0.7` then we can also look at the `p_loo` estimate for
#' some additional information about the problem:
#' If \eqn{k > 0.7} then we can also look at
#' the `p_loo` estimate for some additional information about the problem:
#'
#' \itemize{
#' \item If `p_loo << p` (the total number of parameters in the model),
#' * If `p_loo << p` (the total number of parameters in the model),
#' then the model is likely to be misspecified. Posterior predictive checks
#' (PPCs) are then likely to also detect the problem. Try using an overdispersed
#' model, or add more structural information (nonlinearity, mixture model,
#' etc.).
#'
#' \item If `p_loo < p` and the number of parameters `p` is relatively
#' * If `p_loo < p` and the number of parameters `p` is relatively
#' large compared to the number of observations (e.g., `p>N/5`), it is
#' likely that the model is so flexible or the population prior so weak that it’s
#' difficult to predict the left out observation (even for the true model).
#' This happens, for example, in the simulated 8 schools (in VGG2017), random
#' effect models with a few observations per random effect, and Gaussian
#' processes and spatial models with short correlation lengths.
#'
#' \item If `p_loo > p`, then the model is likely to be badly misspecified.
#' * If `p_loo > p`, then the model is likely to be badly misspecified.
#' If the number of parameters `p<<N`, then PPCs are also likely to detect the
#' problem. See the case study at
#' <https://avehtari.github.io/modelselection/roaches.html> for an example.
Expand All @@ -106,7 +140,6 @@
#' may have few observations and other groups many), it is possible that PPCs won't
#' detect the problem.
#' }
#' }
#'
#' @section elpd_diff:
#' `elpd_diff` is the difference in `elpd_loo` for two models. If more
Expand Down
4 changes: 2 additions & 2 deletions R/loo-package.R
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
#' *Stan Development Team*
#'
#' This package implements the methods described in Vehtari, Gelman, and
#' Gabry (2017), Vehtari, Simpson, Gelman, Yao, and Gabry (2019), and
#' Gabry (2017), Vehtari, Simpson, Gelman, Yao, and Gabry (2022), and
#' Yao et al. (2018). To get started see the **loo** package
#' [vignettes](https://mc-stan.org/loo/articles/index.html), the
#' [loo()] function for efficient approximate leave-one-out
Expand All @@ -33,7 +33,7 @@
#' fast and stable computations for approximate LOO-CV laid out in Vehtari,
#' Gelman, and Gabry (2017). From existing posterior simulation draws, we
#' compute LOO-CV using Pareto smoothed importance sampling (PSIS; Vehtari,
#' Simpson, Gelman, Yao, and Gabry, 2019), a new procedure for stabilizing
#' Simpson, Gelman, Yao, and Gabry, 2022), a new procedure for stabilizing
#' and diagnosing importance weights. As a byproduct of our calculations,
#' we also obtain approximate standard errors for estimated predictive
#' errors and for comparing of predictive errors between two models.
Expand Down
Loading
Loading