Question/feature: perform posterior inference #83
Comments
Yes, our main use for this library is posterior inference, and it currently works very well by training on samples from the joint distribution, as in your setup. I'm in the middle of travel, but tomorrow I will send a script on how to train on the training pairs you made.
Awesome. I would very much appreciate an example. Also, does it support multivariate distributions?
An example would be great. I've been looking into doing Variational Inference with Normalizing Flows but I have yet to find a complete Julia example.
Started a PR so that we can add some more examples, starting with the setup that @itsdfish gave. The basic script is here: https://github.com/slimgroup/InvertibleNetworks.jl/pull/84/files#diff-231dfb51cc0355965e85e0fedb61956dd6e52e86eb974de512ede26feb55d8b6
And after two epochs, the posterior samples look like this for mu=1, std=1. @itsdfish yes, it is quite good for multivariate distributions. I actually think that the networks we have shine best on posterior distributions over images because they are currently mostly conv-net based, although we are working on dense networks for 1D inputs in #77. I will be happy to help anyone get these working for their specific applications, and then we can mold them into nice examples.
Thank you for providing an example. This is very helpful. While looking through the code, I thought of a few high-level questions. I hope you don't mind me asking some basic questions (machine learning is outside of my expertise).
Thanks again!
Don't mind at all! Thank you for the interest and good questions:
1.) Yes, it is possible. In general you need a summary statistic that transforms arbitrarily sized observations into a single fixed size: either a learned summary statistic (summary network), like in BayesFlow, or a transformation related to the forward operator like in: The learned summary networks will be merged soon in #82.
2.) This isn't anything inherent. The 4D arrays are just a result of our lab applications being mostly structured images that have width and height dimensions, with 5D tensors for volume inputs. So to hack 1D inputs into the current code we put the data into the channel dimension and the rest are singletons. It looks like the community is also interested in data with one dimension, so we are adding dense layers and logic to handle one dimension in the PR of dense layers #77.
3.) Yes, currently it should work for any parameter and data dimension. I played around with a few without errors, but if you run into anything let me know and I might be able to point to a solution quicker.
4.) I can't think of anything that doesn't work off the top of my head. There are some technical details on the limits of normalizing flows for learning distributions; it has to do with there needing to exist a diffeomorphism between the data distribution and the base distribution, but in my experience this is mostly a technical detail and they have been used practically on essentially every type of data application. So AFAIK, the package should work for any marginal distribution and any conditional distribution. For the marginal you just need samples x ~ p(x). But the conditional distributions can be trained with different types of samples depending on the scenario. For the above example (amortized posterior) you need samples from the joint distribution. But you can also do non-amortized variational inference that only requires a single observation and access to the forward model and its adjoint (if that was part of the process used to make the observation). This method is used in tandem with traditional amortized variational inference in one of the papers in this package readme:
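To make point 2 concrete, here is a minimal sketch in plain Julia of the channel-dimension hack described above (all sizes and array names are illustrative, not taken from the package):

```julia
# Toy sizes for an amortized-posterior setup (illustrative only).
n_params = 2      # e.g. (mu, sigma)
n_obs    = 50     # observations per simulated data set
n_train  = 1000   # number of (parameter, data) training pairs

theta = randn(Float32, n_params, n_train)   # stand-in for samples from the prior
y     = randn(Float32, n_obs, n_train)      # stand-in for data simulated given theta

# 1-D inputs go into the channel dimension; width and height are singletons.
X_train = reshape(theta, 1, 1, n_params, n_train)   # (nx, ny, n_channel, n_batch)
Y_train = reshape(y,     1, 1, n_obs,    n_train)
```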
Awesome! Thanks for your answers. The approaches you described in your answer to my first question will be very useful to the Julia community because there are so many types of models for which Bayesian inference is challenging. I would like to take you up on your offer to develop some tutorial examples. If you would be willing to adapt your Gaussian example to use the summary statistic technique once #82 is merged, I can use it to develop tutorials for different types of models. Here are some initial thoughts:
When all is said and done, it might be worth wrapping these procedures into useful functions and putting them in a separate package in the slimgroup organization. If that turns out to be valuable, I can also assist with that.
@rafaelorozco, I would like to make a tutorial example for a model of discrete data. The constructor for
Update: It seems like the problem that
I came to answer the question, but you arrived at exactly the correct conclusion in the update! It is also cool that you came up with the hack that I typically use when doing invertible layers on arrays of length 1. Thank you so much for the interest and the cool ideas for examples. I'm working with @mloubout to merge the summary network PR soon. We have been using them successfully for a while; we just want to make the interface a little bit more user friendly. The network saving and loading is super important, so I will make that example right now for the PR.
Awesome! I look forward to digging in and making some tutorial examples once the summary network is complete.
One of my goals is to define a user-friendly interface for training and sampling. Maybe with a bit more thought, it can be flexible, general and extendable. Here is the first iteration. See
I spent some time looking at the code but had a hard time coming up with suggestions. Looks good! I think it is a matter of increasing the complexity of the examples and seeing where things break and need to be more generalizable. There was an error with BSON, so you need to be on that PR for it to work.
Awesome. Thanks for adding functionality for saving and loading trained networks. This will be very useful! Thanks for looking over the code. I did encounter an issue while trying to develop a multivariate Gaussian model. The constructor for
I do have another question. When extending the
Update: I forgot to add a link above to the training data: https://github.com/itsdfish/NormalizingFlowsTutorials.jl/blob/066c32a2030bc8c7a67f6d434a1b92a57d1b15f6/sandbox/mv_gaussian.jl#L46
Ah, great catch, thank you for bringing that up. I thought I had fixed it but I missed it in that layer. It is a simple problem related to the invertible coupling layer. It is an easy fix, but I forgot to also do it for the conditional layers. With regards to y_train, we will need to go to a proper 1-dim implementation that looks like (nx, n_chan, n_train), where nx is the main size of the variable (in your case n_obs) and n_chan would be used for multiple channels, or two variables in this case. Again, this is working for the non-conditional layers; I just need to port it to the conditional layers. Should be able to get it done by tomorrow. Thank you for the very useful input!
No problem. It's part of the process. Thanks for looking into that bug! I tried restructuring
In either case, I was wondering whether there is a general representation that can be used. For example, in the univariate Gaussian example you made,
To get (n_obs, n_dim, n_train) working, you might need to change some of the bookkeeping that happens in train! and how you make the network (hopefully just adding the kwarg ndim=1, since the default is ndim=2). I will take a look and maybe make a PR with the needed changes. The way we implemented it is with 5D, 4D and 3D tensors; 5D is for volumes (nx, ny, nz, n_channel, n_batch). So yes, we agree with your conclusion for 1D arrays. We just call n_dim n_channel, so (n_obs, n_dim, n_train) -> (nx, n_channel, n_train). We need the 5D and 4D tensors because for structured volume/image data we want the object to maintain its shape and pixel structure as it goes through the network.
I am slowly starting to understand. Keyword slowly. My understanding is that:
If these are true, then I think the primary
There would be two other methods: one for the 4D case and one for the 5D case. I think this could work, unless one of my assumptions is wrong, or something changes when the summary network is added. If what I have described seems viable, I think the missing piece in my understanding is how the dimensions in
You can, yes,
Thanks! That is a very elegant solution. This is my first time coming across
For coupling layers to work, in general you want x and y to agree on their main dimensionality. For example, if x and y are images, you want the images to be of the same size. With summary networks you can do away with this assumption because you can learn a transformation that brings them to the same size (a great advantage of summary networks that isn't talked about as much because it is not as flashy). With the hacky x_train (1,1,nparams,ntrain) and y_train (1,1,nobs,ntrain), things worked because the main dimensionality of the first two dimensions is the same (the number of channels can be different without a problem). So when moving to the correct implementation of arrays with 3D tensors, in this case I'm pretty sure we need a summary network that will bring n_obs to the same size as either n_params or 1 so that the dimensions work. So we are back to waiting on me to finish the summary network PR. I will work on it today and it should be ready pretty soon!
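A small illustration of this shape constraint in plain Julia (sizes are illustrative):

```julia
n_params, n_obs, n_train = 2, 50, 1000

# Hacky 4-D layout: the first two (spatial) dims of x and y agree (both singletons),
# so the conditional coupling layers work even though the channel counts differ.
x4 = randn(Float32, 1, 1, n_params, n_train)
y4 = randn(Float32, 1, 1, n_obs,    n_train)

# Proper 3-D layout for 1-D data: the main dimension now differs (n_params vs n_obs),
# so y would first need a summary network mapping it to length n_params (or 1).
x3 = randn(Float32, n_params, 1, n_train)
y3 = randn(Float32, n_obs,    1, n_train)
```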
Got it. Thank you for explaining how this works. I'm starting to wrap my head around it. I agree with you. It's better to wait until the summary network is finished since it will have different requirements. Awesome! I look forward to the PR whenever it's ready.
@rafaelorozco, I noticed that you merged the PR for the summary network a few weeks ago. I was wondering if you would be able to help me adapt the Gaussian example (or other example) to be compatible with the summary network? Thanks!
Hey! Sorry for the late reply, I have been busy with a conference. Yes, I would be happy to help get it set up. I tried implementing the invariant layer as a summary network as described in BayesFlow but was not getting good results. Maybe it needs to be a fancier layer, like the DeepSet that they have in their code. For the amortizednet you need to be on the newest master branch, but this is the updated code.
Note that during training we change the number of observations so that it can hopefully learn to generalize over that. |
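One way to realize this, sketched here in plain Julia (illustrative names only, not the code from the comment above), is to draw a different number of observations for each batch:

```julia
n_epochs, n_batches, batch_size, max_obs = 2, 10, 64, 100

for epoch in 1:n_epochs, batch in 1:n_batches
    n_obs = rand(1:max_obs)                    # a different data-set size each batch
    y     = randn(Float32, n_obs, batch_size)  # stand-in for data simulated from prior draws
    # the summary network would map y to a fixed-size condition here,
    # followed by one training step of the conditional flow
end
```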
@rafaelorozco, no problem at all. I completely understand and appreciate your time. Thanks for the example above. I should have time to look through it later today or tomorrow. By the way, I enjoyed reading your pre-print, Refining Amortized Posterior Approximations using Gradient-Based Summary Statistics. It looks like you are getting a nice improvement with your new method.
I can see the problem you mentioned. The posterior distribution does not change much from the prior distribution, even if the sample size is at the max (100). I didn't see much information about DeepSets or SetTransformer (the new approach used in BayesFlow) in Julia. I'm not sure if DeepSets and SetTransformers are variations on existing NN architectures in Julia. Do you have a sense for whether this is even possible?
Oh, thank you for reading the paper! We will put out a full-length journal paper related to that technique, which will bring together a lot of the things we have been working on using this package. I think it is definitely possible to implement the set-based NN layers. I looked again at the BayesFlow code and the papers they cite for DeepSet. They seem to be the same in principle as what I was attempting in that code. I suspect that it is a matter of playing with the hyperparameters and the architecture. I might ask some people in the lab to look into it and will report back if we have success.
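For anyone wanting to experiment, a DeepSets-style summary network is essentially a per-observation encoder followed by a permutation-invariant pooling step and a decoder. The sketch below uses Flux.jl as an assumption (the thread itself does not prescribe a framework), and all layer sizes and names are illustrative:

```julia
using Flux
using Statistics: mean

# Minimal DeepSets-style summary network: encode each observation, mean-pool, decode.
struct DeepSetSummary{P,R}
    phi::P   # per-observation encoder
    rho::R   # decoder applied after pooling
end
Flux.@functor DeepSetSummary

function (m::DeepSetSummary)(y)
    # y is (obs_dim, n_obs): encode each observation (column), then pool over observations.
    h      = m.phi(y)             # (hidden_dim, n_obs)
    pooled = mean(h; dims = 2)    # (hidden_dim, 1), invariant to observation order
    return m.rho(pooled)          # fixed-size summary regardless of n_obs
end

summary_net = DeepSetSummary(
    Chain(Dense(1 => 64, relu), Dense(64 => 64, relu)),
    Chain(Dense(64 => 64, relu), Dense(64 => 2)),
)

y_small = randn(Float32, 1, 10)   # 10 scalar observations
y_large = randn(Float32, 1, 100)  # 100 scalar observations
size(summary_net(y_small)) == size(summary_net(y_large))  # both give a 2-element summary
```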
Hello,
I am interested in performing posterior inference with various simulation models, such as agent-based models. I was wondering whether that is possible with your package. One example in Python is BayesFlow. If this is not currently possible, I think it would be a very useful feature to add.
As a simple example, suppose I have a Gaussian model and I want to update the prior on mu and sigma after observing 50 data points. Would that be possible? Here is partial code:
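A minimal sketch of the kind of setup described, assuming Distributions.jl (all names are illustrative; this is not the original snippet):

```julia
using Distributions, Random

Random.seed!(1)
n_obs   = 50      # observations per simulated data set
n_train = 1000    # training pairs from the joint distribution p(theta, y)

# Priors on the parameters
prior_mu    = Normal(0, 1)
prior_sigma = truncated(Normal(0, 1), 0, Inf)

# Draw (theta, y) pairs: theta ~ p(theta), then y ~ p(y | theta)
thetas = [(rand(prior_mu), rand(prior_sigma)) for _ in 1:n_train]
ys     = [rand(Normal(mu, sigma), n_obs) for (mu, sigma) in thetas]
```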