use new k threshold #235

avehtari · 2023-12-28T12:30:56Z

Fixes #234

jgabry · 2024-01-16T18:30:29Z

I haven't had a chance to go through this in detail, but looks like it causes several tests to fail. Also I wonder if there will be any issue with reverse dependencies, but we can check that later.

jgabry · 2024-01-22T17:42:10Z

@avehtari CRAN wants a new release of loo before February 20th to fix some minor documentation issue that doesn't affect anything and is only flagged in the newest version of R. We can include this if we fix the tests and check reverse dependencies. I can check the reverse dependencies once everything is passing.

jgabry

These changes look good, except for a few issues I commented on.

R/diagnostics.R

avehtari · 2024-01-23T07:58:21Z

Yesterday, I started working on fixing the PR, and I'm sorry for being so sloppy the first time and wasting your time. As there were still some checks failing, I didn't push the changes that I had made so far. Thanks for the additional comments. l'll keep working on this today.

… new-pareto-k-threshold

avehtari · 2024-01-24T09:19:40Z

more updates on the code and docs, so a new review would be good
I used attribute to pass the threshold along the pareto k table for printing, is this ok?
moment matching test results have changed due to changed threshold, did not yet update
loo_subampling has some tests errors I haven't yet been able to solve, but I'll keep working on those, but I expect that the tests need to be fixed

n-kall

I went through the doc changes, and made a few minor suggestions

R/loo-glossary.R

R/diagnostics.R

R/loo-glossary.R

Co-authored-by: n-kall <[email protected]>

jgabry

Changes look good. I made a bunch of tiny suggestions.

R/psis.R

R/diagnostics.R

jgabry · 2024-01-24T20:19:35Z

R/diagnostics.R

-      graphics::points(x = if (use_n_eff) n_eff[k < 0.5] else k[k < 0.5],
-                       col = clrs[k < 0.5], pch = 3, cex = .6)
+      graphics::points(x = if (use_n_eff) n_eff[k < threshold] else k[k < threshold],
+                       col = clrs[k < threshold], pch = 3, cex = .6)


I haven't tested all the changes to the plotting code, but they look ok

Co-authored-by: Jonah Gabry <[email protected]>

avehtari · 2024-01-26T15:38:32Z

Tests are passing on my computer. I still need to write more documentation.

… new-pareto-k-threshold

codecov-commenter · 2024-01-28T14:48:54Z

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (dab6e58) 93.00% compared to head (258adc8) 92.41%.

❗ Current head 258adc8 differs from pull request most recent head 372cb92. Consider uploading reports for the commit 372cb92 to get more accurate results

Files	Patch %	Lines
R/diagnostics.R	91.89%	3 Missing ⚠️
R/loo.R	71.42%	2 Missing ⚠️
R/psis.R	71.42%	2 Missing ⚠️
R/effective_sample_sizes.R	50.00%	1 Missing ⚠️
R/print.R	96.55%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #235      +/-   ##
==========================================
- Coverage   93.00%   92.41%   -0.59%     
==========================================
  Files          30       30              
  Lines        2788     2823      +35     
==========================================
+ Hits         2593     2609      +16     
- Misses        195      214      +19

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

avehtari · 2024-01-28T16:19:53Z

Improved docs, fixed a few more things in code

jgabry · 2024-01-29T21:56:33Z

Great, looks like tests are passing on GitHub too, it's just R cmd check is failing due to a documentation issue: https://github.com/stan-dev/loo/actions/runs/7686962192/job/20946444630?pr=235#step:6:167. It might actually be that the doc just needs to be regenerated to match the code.

avehtari · 2024-02-02T20:14:20Z

All the vignettes have been now updated. I couldn't run loo2-non-factorized vignette as installation of spdep package failed (only data used from that package), but there shouldn't be any issues with that vignette.

avehtari · 2024-02-02T20:14:40Z

Should be complete now

n-kall

Looked through the vignette changes. Looks good, only found one thing

vignettes/loo2-lfo.Rmd

Co-authored-by: n-kall <[email protected]>

jgabry

There's one problem with the changes to the mixis vignette that will cause an error (see review comment) but otherwise these vignette updates look ok to me

vignettes/loo2-mixis.Rmd

avehtari · 2024-02-07T18:21:16Z

Here is a proposal for a longer news item

There are two major changes in the new loo release

Use of new sample size specific diagnostic threshold for Pareto k. Pre-2022 PSIS paper recommended diagnostic thresholds as k<0.5 "good", 0.5 <= k <0.7 "ok", 0.7 <= k < 1 "bad", k>=1 "very bad". 2022 PSIS paper recommends k < min(1 - 1/log10(S), 0.7) "good", min(1 - 1/log10(S), 0.7) <= k < 1 "bad", k > 1 "very bad", where S is the sample size. There is one diagnostic threshold less, and the most important threshold now depends on the sample size S. With sample sizes 100, 320, 1000, 2200, 10000 the sample size specific part 1 - 1/log10(S) correspond to thresholds 0.5, 0.6, 0.67, 0.7, 0.75. Even if the sample size grows, the bias in PSIS estimate dominates if 0.7 <= k < 1, and thus the diagnostic threshold for good is capped at 0.7 (if k>1, mean does not exist and bias is not valid measure). The new recommended thresholds are based on more careful bias-variance analysis of PSIS based on truncated Pareto sums theory. For those who use the Stan default 4000 posterior draws the 0.7 threshold will be the same, but there will be less warnings as there will be no diagnostic message for 0.5 < k 0.7. Those who use small sample sizes, may see diagnostic messages with threshold less than 0.7, and they can simply increase the sample size to about 2200 to get the threshold 0.7.
There are no more warnings if r_eff argument is not provided, and the default is now 1. The summary print output showing MCSE and ESSs shows diagnostic information on the range of r_eff. The change was made to reduce unnecessary warnings. Use of r_eff does not change the expected value of elpd_loo, p_loo, and Pareto k', and is needed only to estimate MCSE and ESSs. Thus it is better to show the diagnostic information about the used r_eff only when MCSE and ESS values are shown.

jgabry · 2024-02-07T20:33:12Z

Here is a proposal for a longer news item

These look good. I made a few tiny edits and added them to the NEWS file. I will update the rest of the NEWS file to reflect some other changes we made in other PRs before releasing.

jgabry

I can't think of anything else we need to do before merging this. @avehtari or @n-kall, do you have anything else or should I go ahead and merge? (I will still wait about a week to do the release.)

n-kall · 2024-02-07T21:24:03Z

do you have anything else or should I go ahead and merge?

Nothing that I can think of from my side. I think the next related thing will be to update the documentation for posterior::pareto_khat (and others) to match these updates, but I don't see any necessary changes needed in loo for that

jgabry · 2024-02-08T00:33:41Z

Nothing that I can think of from my side.

Ok cool. And thanks for helping to review this.

I think the next related thing will be to update the documentation for posterior::pareto_khat (and others) to match these updates, but I don't see any necessary changes needed in loo for that

Sounds good, thanks.

jgabry · 2024-02-13T16:38:32Z

@avehtari If I submit a bayesplot release with Teemu's fix for his recent PR, is that sufficient to fix the issue with the pit plots in the loo vignette or does the vignettes need additional editing after that? I think I will go ahead and submit that patch release of bayesplot today or tomorrow (it only contains that one bug fix).

avehtari · 2024-02-14T10:27:20Z

Teemu's bayesplot PR and fix for discrete data PIT affects ppc_pit_ecdf*, but not ppc_loo_pit_qq. Teemu's rsantools PR stan-dev/rstantools#121 fixes loo_pit and thus fixes ppc_loo_pit_qq used in loo2-example.Rmd

jgabry · 2024-02-14T15:57:28Z

Oh ok thanks, that makes sense.

jgabry · 2024-02-15T16:38:17Z

Merging now!

use new k threshold

09acf21

avehtari requested a review from jgabry December 28, 2023 12:31

jgabry reviewed Jan 22, 2024

View reviewed changes

R/diagnostics.R Outdated Show resolved Hide resolved

R/diagnostics.R Outdated Show resolved Hide resolved

R/diagnostics.R Outdated Show resolved Hide resolved

jgabry reviewed Jan 22, 2024

View reviewed changes

R/diagnostics.R Outdated Show resolved Hide resolved

fix some of the tests (still some failures)

51249bf

avehtari added 3 commits January 24, 2024 10:01

more updates

9778955

Merge branch 'new-pareto-k-threshold' of github.com:stan-dev/loo into…

a54d577

… new-pareto-k-threshold

fix some tests

26e8a7b

n-kall reviewed Jan 24, 2024

View reviewed changes

avehtari and others added 2 commits January 24, 2024 15:11

Apply suggestions from Noa's scode review

5abf6e6

Co-authored-by: n-kall <[email protected]>

explain 2200

61c0725

jgabry reviewed Jan 24, 2024

View reviewed changes

avehtari and others added 5 commits January 26, 2024 16:01

fix threshold in print

6565f45

simplify use of ps_khat_threshold

12373da

fix threshold argument handling

7ffd67c

fix tests

eaef505

Apply suggestions from code review by jgabry

955b4d5

Co-authored-by: Jonah Gabry <[email protected]>

avehtari added 4 commits January 28, 2024 15:49

reduce r_eff warnings

b36bc13

doc and refs updates

a378f7f

fix tests

57e9061

Merge branch 'new-pareto-k-threshold' of github.com:stan-dev/loo into…

77793cd

… new-pareto-k-threshold

fixes suggested by Jonah

0e03606

avehtari added 6 commits February 2, 2024 20:58

4/9 vignettes fixed

cfe1f74

fixedd loo2-mixis vignette

122aa10

fixed loo2-non-factorized vignette

df325d6

fixed loo2-weights vignette

43632a8

fixed loo2-with-rstan vignette

8cd84b0

fixed loo2-lfo vignette

133ea10

n-kall reviewed Feb 5, 2024

View reviewed changes

vignettes/loo2-lfo.Rmd Outdated Show resolved Hide resolved

avehtari and others added 2 commits February 5, 2024 11:48

Remove forgotten testing line in vignettes/loo2-lfo.Rmd

1a818f3

Co-authored-by: n-kall <[email protected]>

Merge branch 'master' into new-pareto-k-threshold

2afe583

jgabry requested changes Feb 7, 2024

View reviewed changes

vignettes/loo2-mixis.Rmd Outdated Show resolved Hide resolved

typo fix

8d73b37

Add Aki's news items to NEWS.md with a few minor edits

5f78648

diagnostics.R: a few minor doc edits

372cb92

jgabry approved these changes Feb 7, 2024

View reviewed changes

jgabry added 2 commits February 12, 2024 16:40

Merge branch 'master' into new-pareto-k-threshold

bd33010

Merge branch 'master' into new-pareto-k-threshold

3c87b7f

jgabry merged commit ce0f540 into master Feb 15, 2024
6 checks passed

jgabry deleted the new-pareto-k-threshold branch February 15, 2024 16:39

avehtari mentioned this pull request Feb 17, 2024

loo.function should not require a r_eff argument #106

Open

avehtari mentioned this pull request Mar 5, 2024

Change n_eff to ESS #192

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use new k threshold #235

use new k threshold #235

avehtari commented Dec 28, 2023 •

edited

Loading

jgabry commented Jan 16, 2024

jgabry commented Jan 22, 2024

jgabry left a comment •

edited

Loading

avehtari commented Jan 23, 2024

avehtari commented Jan 24, 2024

n-kall left a comment

jgabry left a comment

jgabry Jan 24, 2024 •

edited

Loading

avehtari commented Jan 26, 2024

codecov-commenter commented Jan 28, 2024 •

edited

Loading

avehtari commented Jan 28, 2024

jgabry commented Jan 29, 2024

avehtari commented Feb 2, 2024

avehtari commented Feb 2, 2024

n-kall left a comment

jgabry left a comment •

edited

Loading

avehtari commented Feb 7, 2024

jgabry commented Feb 7, 2024

jgabry left a comment

n-kall commented Feb 7, 2024

jgabry commented Feb 8, 2024

jgabry commented Feb 13, 2024

avehtari commented Feb 14, 2024

jgabry commented Feb 14, 2024

jgabry commented Feb 15, 2024

use new k threshold #235

use new k threshold #235

Conversation

avehtari commented Dec 28, 2023 • edited Loading

jgabry commented Jan 16, 2024

jgabry commented Jan 22, 2024

jgabry left a comment • edited Loading

Choose a reason for hiding this comment

avehtari commented Jan 23, 2024

avehtari commented Jan 24, 2024

n-kall left a comment

Choose a reason for hiding this comment

jgabry left a comment

Choose a reason for hiding this comment

jgabry Jan 24, 2024 • edited Loading

Choose a reason for hiding this comment

avehtari commented Jan 26, 2024

codecov-commenter commented Jan 28, 2024 • edited Loading

Codecov Report

avehtari commented Jan 28, 2024

jgabry commented Jan 29, 2024

avehtari commented Feb 2, 2024

avehtari commented Feb 2, 2024

n-kall left a comment

Choose a reason for hiding this comment

jgabry left a comment • edited Loading

Choose a reason for hiding this comment

avehtari commented Feb 7, 2024

jgabry commented Feb 7, 2024

jgabry left a comment

Choose a reason for hiding this comment

n-kall commented Feb 7, 2024

jgabry commented Feb 8, 2024

jgabry commented Feb 13, 2024

avehtari commented Feb 14, 2024

jgabry commented Feb 14, 2024

jgabry commented Feb 15, 2024

avehtari commented Dec 28, 2023 •

edited

Loading

jgabry left a comment •

edited

Loading

jgabry Jan 24, 2024 •

edited

Loading

codecov-commenter commented Jan 28, 2024 •

edited

Loading

jgabry left a comment •

edited

Loading