Fix handling of unconventional variable names #97

zeyus · 2024-08-11T15:08:21Z

If you have a variable named something["z"][()] the loo* functions will fail, because it tries to split, but will get the wrong part, changing this to rsplit and limiting the number of elemetns to 2 resolves this and will always take the rightmost [] which contains the index.

yebai · 2024-08-12T15:20:03Z

I am not familiar with regular expressions. @sunxd3 / @penelopeysm, can you take a look?

sunxd3 · 2024-08-12T17:39:58Z

@zeyus this looks good, thanks a lot for the contribution.

Just for the ease of future maintenance, maybe we can move the function ind_from_string out of ParetoSmooth.pointwise_log_likelihoods and write some doctest?

penelopeysm

It could just be my purist tendencies, but I have a personal preference for "doing the right thing": string processing does give the right answer, but it's not technically what we want to do. As the function name suggests, we're trying to extract the index from an expression and Julia gives us tools to do that:

function ind_from_string(x)
    expr = Meta.parse(x)
    Meta.isexpr(expr, :ref) && expr.args[2] isa Int || error("invalid expression for ind_from_string: '$(x)'")
    return expr.args[2]
end

julia> ind_from_string("var[1][2]")
2

julia> ind_from_string("var[1]")
1

I usually feel better writing code like this because I can feel more confident that it accomplishes what I want it to do, rather than having to reason about whether string splitting behaves correctly in all cases.

Just a suggestion and not at all meant as a criticism of your PR, after all the original code was string processing too :) won't be offended at all if we stick to the current implementation

devmotion · 2024-08-13T06:39:41Z

IIRC there are quite a few suboptimal implementations in ParetoSmooth (even a few more in the function alone that is touched by this PR), but I haven't really used the package a lot, so there hasn't been a strong incentive from my side to improve things. IMO it would be good to make the code more robust and better tested. Or maybe just join forces with https://github.com/arviz-devs/PSIS.jl which (AFAIK) has better code quality and is better maintained.

devmotion · 2024-08-13T06:47:03Z

Regarding this PR more concretely: Can't we avoid parsing strings completely and just work with VarNames (as shown in https://github.com/TuringLang/DynamicPPL.jl/blob/3b8dceab76ef37d57d0abdecb205a5c46550679f/src/loglikelihoods.jl#L201-L206)? And do we actually have to sort, given that (nowadays) DynamicPPL.pointwise_loglikelihoods returns an OrderedDict?

zeyus · 2024-08-19T12:54:44Z

Regarding this PR more concretely: Can't we avoid parsing strings completely and just work with VarNames (as shown in https://github.com/TuringLang/DynamicPPL.jl/blob/3b8dceab76ef37d57d0abdecb205a5c46550679f/src/loglikelihoods.jl#L201-L206)? And do we actually have to sort, given that (nowadays) DynamicPPL.pointwise_loglikelihoods returns an OrderedDict?

I'm happy to change to this if that's the better option. I just made the PR based on the minor change I made to get it working. I'm not entirely sure yet how much work it would be to convert it to using VarName as I'm not familiar with the interface.

sethaxen · 2024-08-19T14:06:22Z

IIRC there are quite a few suboptimal implementations in ParetoSmooth (even a few more in the function alone that is touched by this PR), but I haven't really used the package a lot, so there hasn't been a strong incentive from my side to improve things. IMO it would be good to make the code more robust and better tested. Or maybe just join forces with https://github.com/arviz-devs/PSIS.jl which (AFAIK) has better code quality and is better maintained.

Sorry, slightly off-topic response. There were early attempts to unify ParetoSmooth and PSIS (offline discussions and #41, #42, #43, #44, #51), but it didn't go anywhere because I'm quite pleased with the PSIS.jl API and code quality, and it would be more work to unify the packages than it was to write PSIS.jl to begin with.

There's also the difference in scope; PSIS.jl is only an ultra-lightweight implementation of the PSIS method designed to be used by downstream packages (e.g. any VI package could use this; Pathfinder.jl uses it as a diagnostic). LOO is implemented in PosteriorStats.jl. PSIS.jl and PosteriorStats.jl provide no integration with PPLs or objects that store MCMCChains and won't in the future, being deliberately PPL-agnostic. PPL maintainers are expected to add integrations to their own packages, so that they can ensure these integrations are kept up-to-date. (e.g. InferenceObjects integrates with PosteriorStats, and MCMCChains will as well once I finish TuringLang/MCMCChains.jl#430). For loo or compare with Model inputs, DynamicPPL would need to add a PosteriorStats integration.

yebai · 2024-08-19T15:04:18Z

@sethaxen, apart from the integration with PPLs, are there any other differences between ParetoSmooth and PSIS?

sethaxen · 2024-08-19T15:51:38Z

@sethaxen, apart from the integration with PPLs, are there any other differences between ParetoSmooth and PSIS?

It's been a long time since I made a comparison, but besides the differences mentioned in #97 (comment) (scope, package weight, API, code quality, test coverage), there are a few differences in functionality:

they use different ESS estimates. ParetoSmooth's is more pessimistic, while PSIS's uses the one from the paper, which is connected with MCSE like the usual ESS.
PSIS also includes a diagnostic plot recipe: https://julia.arviz.org/PSIS/stable/plotting/
PosteriorStats includes Pseudo-BMA, bootstrapped Pseudo-BMA, and stacking model weighting methods for model comparison (see https://julia.arviz.org/PosteriorStats/stable/api/#Model-comparison). Stacking is currently the recommended approach in the literature. IIRC ParetoSmooth only implements Pseudo-BMA.
PosteriorStats also includes loo_pit for posterior predictive checks: https://julia.arviz.org/PosteriorStats/stable/api/#PosteriorStats.loo_pit

Both packages are outdated in their use of the diagnostics. PSIS is shortly getting updates from the recent PSIS paper revision.

yebai · 2024-08-19T16:05:39Z

Maybe we can make ParetoSmooth dependent on PSIS?

yebai · 2024-08-19T18:47:31Z

I think we should merge this PR, and move the other discussions to an issue.

devmotion · 2024-08-19T18:53:46Z

But is it clear whether the sorting is actually needed? #97 (comment)

sunxd3 · 2024-09-27T11:40:17Z

@devmotion if I understand your point correctly, are you suggesting we should give some kind of ordering to VarNames?

Without a isless, OrderedDict should still work (it just use the insertion order as the order), but if there is a need to do sort!(d::OrderedDict; kwargs), then we definitely need to add a isless implementation.

Fix handling of unconventional variable names

56e8523

yebai requested a review from penelopeysm August 11, 2024 17:39

penelopeysm reviewed Aug 12, 2024

View reviewed changes

yebai approved these changes Aug 19, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix handling of unconventional variable names #97

Fix handling of unconventional variable names #97

zeyus commented Aug 11, 2024

yebai commented Aug 12, 2024

sunxd3 commented Aug 12, 2024

penelopeysm left a comment •

edited

Loading

devmotion commented Aug 13, 2024

devmotion commented Aug 13, 2024

zeyus commented Aug 19, 2024

sethaxen commented Aug 19, 2024

yebai commented Aug 19, 2024

sethaxen commented Aug 19, 2024

yebai commented Aug 19, 2024

yebai commented Aug 19, 2024

devmotion commented Aug 19, 2024

sunxd3 commented Sep 27, 2024

Fix handling of unconventional variable names #97

Are you sure you want to change the base?

Fix handling of unconventional variable names #97

Conversation

zeyus commented Aug 11, 2024

yebai commented Aug 12, 2024

sunxd3 commented Aug 12, 2024

penelopeysm left a comment • edited Loading

Choose a reason for hiding this comment

devmotion commented Aug 13, 2024

devmotion commented Aug 13, 2024

zeyus commented Aug 19, 2024

sethaxen commented Aug 19, 2024

yebai commented Aug 19, 2024

sethaxen commented Aug 19, 2024

yebai commented Aug 19, 2024

yebai commented Aug 19, 2024

devmotion commented Aug 19, 2024

sunxd3 commented Sep 27, 2024

penelopeysm left a comment •

edited

Loading