Import and extend PosteriorStats #431

sethaxen · 2023-08-20T14:55:26Z

This PR makes the following replacements everywhere:

hpd -> PosteriorStats.hdi (hpd retains its old behavior and is now deprecated)
summarize -> PosteriorStats.summarize
ChainDataFrame -> PosteriorStats.SummaryStats

The replacement of ChainDataFrame and the slight changes in the API and behavior of the methods make this a breaking change.

Implements #430

e.g.

julia> val = rand(500, 2, 3);

julia> chn = Chains(val);

julia> chn
Chains MCMC chain (500×2×3 Array{Float64, 3}):

Iterations        = 1:1:500
Number of chains  = 3
Samples per chain = 500
parameters        = param_1, param_2


Summary Statistics
          mean    std    hdi_3%  hdi_97%  mcse_mean  mcse_std  ess_tail  ess_bulk  rhat 
 param_1  0.50  0.283  0.00745     0.929     0.0073    0.0034      1480      1531  1.00
 param_2  0.49  0.285  0.000140    0.926     0.0074    0.0034      1499      1506  1.00

Quantiles
            2.5%  25.0%  50.0%  75.0%  97.5% 
 param_1  0.0252  0.254  0.510  0.745  0.969
 param_2  0.0180  0.242  0.483  0.727  0.969

julia> hdi(chn)
HDI
             lower  upper 
 param_1  0.00745   0.929
 param_2  0.000140  0.926

julia> describe(chn)
2-element Vector{SummaryStats}:
 Summary Statistics (2 rows, 10 cols)
 Quantiles (2 rows, 6 cols)

sethaxen · 2023-08-20T14:57:21Z

src/stats.jl

    # Compute the change rates.
    changerates, mvchangerate = changerate(chains)

    # Summarize the results in a named tuple.
-    nt = (; zip(names_of_params, changerates)..., multivariate = mvchangerate)


Lacking a parameter column meant the show method was broken. Since there is a change rate for every parameter, it makes more sense to follow gelmandiag_multivariate: return a SummaryStats for the marginal values and return the multivariate change rate separately.

Genuine question: what is the change-rate in this context?

From inspecting the code, it is the fraction of draws, for each parameter and chain, that differ from the previous draw. I suppose it's similar to an "acceptance rate."
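As a rough illustration of that reading, here is a minimal sketch in plain Julia. The function name `changerate_sketch` and the draws × parameters × chains array layout are assumptions made for the example; this is not the MCMCChains implementation, which also reports a multivariate rate.

```julia
# Hypothetical sketch: per-parameter, per-chain fraction of draws that differ
# from the immediately preceding draw.
# `x` is assumed to have size (ndraws, nparams, nchains).
function changerate_sketch(x::AbstractArray{<:Real,3})
    ndraws, nparams, nchains = size(x)
    ndraws >= 2 || throw(ArgumentError("need at least 2 draws"))
    rates = zeros(nparams, nchains)
    for c in 1:nchains, p in 1:nparams
        # Count transitions where the value actually changed.
        nchanged = count(i -> x[i, p, c] != x[i - 1, p, c], 2:ndraws)
        rates[p, c] = nchanged / (ndraws - 1)
    end
    return rates
end

# One chain, two parameters: the first changes twice in four transitions,
# the second never changes.
x = reshape([1.0, 1.0, 2.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0], 5, 2, 1)
rates = changerate_sketch(x)
```

For a sampler with an accept/reject step and continuous parameters, a repeated value usually means a rejected proposal, which is why the quantity resembles an acceptance rate.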

sethaxen · 2023-12-24T07:32:48Z

@devmotion @torfjelde This is ready for final review.

sethaxen · 2024-02-01T08:53:00Z

Currently PosteriorStats has no try/catch mechanism for when a given statistic fails. As a result, chains with fewer than 10 draws error when summary stats are computed (or displayed). For PosteriorStats that makes sense, but it's a major nuisance for MCMCChains. I actually think this has come up often enough that it makes sense for MCMCDiagnosticTools to raise an informative warning and return NaNs in such cases. Will open an issue there.
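A minimal sketch of the "warn and return NaN" behavior proposed here, in plain Julia. Both `safe_stat` and `needs_ten_draws` are hypothetical names invented for illustration; neither exists in PosteriorStats or MCMCDiagnosticTools.

```julia
# Hypothetical helper: compute `stat(draws)`; if it throws, warn and return NaN
# instead of propagating the error.
function safe_stat(stat, draws::AbstractVector{<:Real})
    try
        return float(stat(draws))
    catch err
        @warn "Statistic failed; returning NaN: $(sprint(showerror, err))"
        return NaN
    end
end

# Stand-in for a diagnostic that requires a minimum number of draws.
function needs_ten_draws(draws)
    length(draws) >= 10 || throw(ArgumentError("at least 10 draws are required"))
    return sum(draws) / length(draws)
end

ok = safe_stat(needs_ten_draws, collect(1.0:20.0))  # enough draws: a number
bad = safe_stat(needs_ten_draws, [1.0, 2.0])        # too few draws: NaN plus a warning
```

With a wrapper like this, displaying a very short chain would show NaN entries rather than throwing.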

devmotion · 2024-02-01T12:00:40Z

Oh, I missed #431 (comment), probably due to the Christmas break. I added the PR to my todo list 🙂

src/MCMCChains.jl

src/chains.jl

devmotion · 2024-02-03T01:16:43Z

src/plot.jl

@@ -64,7 +64,7 @@ const supportedplots = push!(collect(keys(translationdict)), :mixeddensity, :cor
        lags = 0:(maxlag === nothing ? round(Int, 10 * log10(length(range(c)))) : maxlag)
        # Chains are already appended in `c` if desired, hence we use `append_chains=false`
        ac = autocor(c; sections = nothing, lags = lags, append_chains=false)
-        ac_mat = convert(Array, ac)
+        ac_mat = stack(map(stack ∘ Base.Fix2(Iterators.drop, 1), ac))


This seems a bit inefficient (particularly the use of stack(map(stack, ...))). Maybe we should preallocate the desired array and copy directly? Possibly a utility function could be added to PosteriorStats?

It may not be the most efficient, but compared to the actual autocor call, it should not be the computational bottleneck, and not using stack requires substantially more code.

It might make sense to add to PosteriorStats a function that converts a SummaryStats to a matrix, but this would effectively do the same thing as the inner stack. What kind of utility function do you have in mind?

> What kind of utility function do you have in mind?

Basically a convert, Array/Matrix constructor, or other function that performs the same thing (if the first two alternatives are not desirable). So that downstream packages (such as MCMCChains) don't have to deal with internals of SummaryStats.

This line is not using any internals of SummaryStats, whose docstring explains it is an OrderedDict-like Table that can be iterated/indexed by its column names.

I think the methods we're missing in PosteriorStats are

    function Base.getindex(stats::SummaryStats, cols::Union{Colon,AbstractVector{Int},AbstractVector{Symbol}})
        cols isa Colon && return Tables.matrix(stats)
        return stack(Tables.getcolumn(stats, k) for k in cols)
    end
    Base.firstindex(s::SummaryStats) = 1
    Base.lastindex(s::SummaryStats) = length(s)

Then this would be

stack(ac_i[2:end] for ac_i in ac)

but this is no more efficient.

> This line is not using any internals of SummaryStats, whose docstring explains it is an OrderedDict-like Table that can be iterated/indexed by its column names.

I was referring to the 2:end which seems a bit special. I missed that the docstring mentions that the first column is reserved for parameter names.
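To make the trade-off in this thread concrete, the sketch below compares the `stack`-based one-liner with a preallocated copy. It uses NamedTuples of vectors as hypothetical stand-ins for the per-chain summaries (first column: parameter names, remaining columns: one per lag); real SummaryStats objects are not used, `numeric_columns_prealloc` is an invented name, and `stack` is assumed to be available from Base (Julia 1.9 or later).

```julia
# Hypothetical stand-ins for per-chain autocorrelation summaries: the first
# column holds parameter names, each remaining column holds one lag.
ac = [
    (parameter = [:a, :b], lag0 = [1.0, 1.0], lag1 = [0.5, 0.4], lag2 = [0.2, 0.1]),
    (parameter = [:a, :b], lag0 = [1.0, 1.0], lag1 = [0.6, 0.3], lag2 = [0.3, 0.0]),
]

# One-liner in the spirit of the diff: drop the name column, stack the numeric
# columns into a matrix per chain, then stack the matrices along a third axis.
ac_mat = stack(map(stack ∘ Base.Fix2(Iterators.drop, 1), ac))

# Preallocated alternative: more code, no intermediate matrices.
function numeric_columns_prealloc(ac)
    nparams = length(first(ac).parameter)
    nlags = length(first(ac)) - 1  # all columns except the name column
    out = Array{Float64}(undef, nparams, nlags, length(ac))
    for (k, stats) in enumerate(ac)
        for (j, col) in enumerate(Iterators.drop(stats, 1))
            out[:, j, k] .= col
        end
    end
    return out
end

ac_mat_prealloc = numeric_columns_prealloc(ac)
```

Both versions produce a parameters × lags × chains array; the difference is only in the temporary allocations, which is the point made above about the cost being small next to the `autocor` call itself.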

devmotion · 2024-02-03T01:18:46Z

src/plot.jl

    ordered = false
 )

-    chain_dic = Dict(zip(quantile(chains)[:,1], quantile(chains)[:,4]))
+    chain_dic = Dict(zip(quantile(chains)[2], quantile(chains)[5]))


Maybe a good opportunity to make the code a bit more efficient:

Suggested change

-    chain_dic = Dict(zip(quantile(chains)[2], quantile(chains)[5]))
+    quantile_chains = quantile(chains)
+    chain_dic = Dict(zip(quantile_chains[2], quantile_chains[5]))

(It would also be nice to use something more descriptive than 2 and 5.)

Hmm, yeah this and the code immediately below are a bit of a mess. Will rework to use the new methods.

Yeah, it could be simplified a lot. Also, for n parameters, it was computing the median n^2 times; this has now been fixed.
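The kind of fix described here can be sketched as follows: compute every median once, store the results by parameter name, and let later steps do lookups only. The data layout (`draws` as name => vector pairs) and the call counter are assumptions for illustration and do not reflect the actual plotting code.

```julia
using Statistics

# Hypothetical stand-in for a chain: parameter name => vector of draws.
draws = [:param_1 => [0.1, 0.5, 0.9], :param_2 => [0.2, 0.4, 0.8]]

median_calls = Ref(0)  # counts how often a median is actually computed
function counted_median(x)
    median_calls[] += 1
    return median(x)
end

# Compute each median exactly once and key the results by parameter name.
chain_dic = Dict(name => counted_median(x) for (name, x) in draws)

# Any later step, such as ordering parameters by median, only needs lookups.
ordered_names = sort(first.(draws); by = name -> chain_dic[name], rev = true)
```

Here `median_calls[]` ends at the number of parameters, rather than growing with every lookup.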

torfjelde

Added a few comments!

Am very excited about this though; bloody great stuff @sethaxen ❤️

src/MCMCChains.jl

src/discretediag.jl

torfjelde · 2024-02-10T14:00:54Z

src/stats.jl


-Return the highest posterior density interval representing `1-alpha` probability mass.
+Return the unimodal highest density interval (HDI) representing `prob` probability mass.


What's the meaning of "unimodal" here?

The default estimator used by hdi assumes that the distribution is unimodal. The alternative (not yet in PosteriorStats but doable with HighestDensityRegions.jl) for multimodal distributions first fits a KDE and then partitions into one or more intervals by density. For fast summary statistics, the unimodal version is more useful.
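For intuition, one common estimator under the unimodal assumption takes the narrowest window of sorted draws that contains the requested fraction of them. The sketch below shows that idea in plain Julia; `shortest_interval` is a hypothetical name, and this is not claimed to be the exact estimator PosteriorStats uses.

```julia
# Hypothetical sketch of a shortest-interval estimator: among all windows of
# sorted draws that contain ceil(prob * n) points, return the narrowest one.
function shortest_interval(x::AbstractVector{<:Real}, prob::Real)
    0 < prob < 1 || throw(ArgumentError("prob must be in (0, 1)"))
    xs = sort(x)
    n = length(xs)
    k = ceil(Int, prob * n)  # number of draws the interval must contain
    k >= 2 || throw(ArgumentError("too few draws for the requested prob"))
    widths = [xs[i + k - 1] - xs[i] for i in 1:(n - k + 1)]
    i = argmin(widths)       # first narrowest window
    return (lower = xs[i], upper = xs[i + k - 1])
end

# Six draws, prob = 0.5: the narrowest window holding three draws is [1.0, 1.5].
interval = shortest_interval([5.0, 1.5, 0.0, 1.25, 3.0, 1.0], 0.5)
```

Because it always returns a single interval, an estimator of this kind cannot represent a multimodal region, which is the limitation the KDE-based alternative addresses.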

torfjelde · 2024-02-10T14:05:54Z

src/stats.jl

    # Compute the change rates.
    changerates, mvchangerate = changerate(chains)

    # Summarize the results in a named tuple.
-    nt = (; zip(names_of_params, changerates)..., multivariate = mvchangerate)


Genuine question: what is the change-rate in this context?

src/stats.jl


sethaxen · 2024-02-13T09:47:24Z

This is ready for another review. Also, it would be nice to get input on arviz-devs/PosteriorStats.jl#25, since that's relevant for this PR.

yebai · 2024-08-28T15:21:19Z

@penelopeysm can you help review this PR, and get it merged?

devmotion · 2024-08-28T17:05:52Z

I thought arviz-devs/PosteriorStats.jl#25 holds back this PR?

sethaxen · 2024-08-28T17:58:19Z

@devmotion is right. I'd like to merge that breaking change, along with some breaking changes that make the SummaryStats object easier to interact with (and a little more like ChainDataFrame). Otherwise, when I make those changes to PosteriorStats, it would require another major version bump here and a minor one for Turing.

After a long break I'm back to working on arviz PRs so I should be able to wrap that up in the next couple of weeks.

sethaxen added 21 commits August 20, 2023 15:45

Add PosteriorStats as dependency

43a5630

Import and reexport PosteriorStats functions

2994a56

Forward to PosteriorStats.summarize

9ffb60b

Update docstring

a6d16e5

Update docstring

4e99fe8

Forward summarystats to summarize

38dbdac

Simplify mean implementation

395a3d8

Simplify quantile implementation

4fa204e

Replace hpd with hdi

5c0f35d

Deprecate hpd

3660d0e

Simplify autocor implementation

e3d2d16

Remove unused keyword etype

49af2d9

Explicitly build list of stats

bdde660

Simultaneously compute all quantiles

d851307

Print an extra newline

1717483

Use and export SummaryStats

bf06653

Use SummaryStats in place of ChainsDataFrame

147b56b

Update and repair changerate

706f29c

Remove ChainDataFrame

2c07794

Update docs

58ae52a

Increment major version

340a694

sethaxen commented Aug 20, 2023


sethaxen added 8 commits August 20, 2023 19:23

Increment MCMCChains compat for docs

a67ac11

Refer to processed chains

1af83c8

Fix doctest

4aed409

Add back append_chains keyword

eef9393

Compute all lags simultaneously

6a1b482

Vectorize before autocor

c7d2021

Correctly insert chain id into name

bfe8deb

Update diagnostic tests

646e108

sethaxen added 9 commits December 23, 2023 18:01

Make stack available for older Julia versions

02be9c7

Update SummaryStats constructor user

ea2548e

Use dict backing for cor summary

929c654

Remove unused kwargs

5f7aa7c

Refactor autocor to avoid large namedtuple

49fe1c8

Use stack for autocor to avoid large compile times

1cef3ba

Improve type inference of OrderedDict

17e08ed

Make doctest reproducible

1e3e96a

Add StableRNGs as test dependency

d001894

sethaxen mentioned this pull request Feb 1, 2024

Simple chains throw errors when displayed #447

Closed

devmotion reviewed Feb 3, 2024


Merge branch 'master' into posteriorstats

d6bb4ee

torfjelde reviewed Feb 10, 2024


sethaxen and others added 6 commits February 11, 2024 00:47

Apply suggestions from code review

b3a3143

Co-authored-by: David Widmann
Co-authored-by: Tor Erlend Fjelde

Merge branch 'posteriorstats' of https://github.com/TuringLang/MCMCChains.jl into posteriorstats

67f7eaa

Merge branch 'master' into posteriorstats

1fbe8ad

Avoid splatting

0fad7f9

Avoid recomputing all medians for every parameter

3579c55

Add test for show method

5332916

sethaxen mentioned this pull request Feb 11, 2024

Nicer representations of intervals arviz-devs/PosteriorStats.jl#25

Closed

yebai assigned penelopeysm Aug 28, 2024

sethaxen added 2 commits November 17, 2024 21:55

Merge branch 'master' into posteriorstats

d61c535

Update ess_rhat tests

d488220
