Exercise in data analysis for CosmoLab
Goals for this exercise:
- Practice programming & organizing code
- Practice collaborating using git & github
- Learn about (or gain more understanding of) probabilistic modeling
Each group should work together to do the following, for one (or both) of the two datasets. Some of these steps have specific associated functions that should be implemented, as indicated below; other steps are more open-ended.
- Load & visualize the data. (Figure it out!)
- Come up with a parameterized physical analytic model that could describe the data in the absence of uncertainty.
- Plot a few different versions of this physical model (for a few different fixed values of the parameters) on top of the data.
- Write down a probabilistic generative model to describe the data.
- Implement a likelihood function according to this generative model.
- Define the prior probability distribution function for your parameter space.
- Overplot your data with N versions of your physical model evaluated with N samples from your prior.
- Implement a posterior probability function.
- Find the values of your model parameters that maximize the posterior probability, and make a model/data plot using those parameters.
- Generate samples from your posterior using an MCMC package of your choice.
- Visualize the distribution and correlations of your posterior samples.
- Overplot your data with N versions of your physical model evaluated with N samples from your posterior.
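As a rough illustration of how the likelihood, prior, and posterior steps above fit together, here is a minimal sketch. Everything specific in it is an assumption made for the example and not part of the exercise: the straight-line `model`, independent Gaussian noise with known per-point uncertainties `sigma`, the flat prior limits in `BOUNDS`, and all function names other than those required further down. It uses only the standard library to stay self-contained; in practice you would probably vectorize this with NumPy and adapt it to your own physical model.

```python
import math
import random

# Hypothetical prior limits for the two parameters (slope m, intercept b).
BOUNDS = ((-10.0, 10.0), (-10.0, 10.0))


def model(x, m, b):
    """Hypothetical physical model: a straight line y = m*x + b."""
    return m * x + b


def log_prior(params, bounds=BOUNDS):
    """Log of a flat (uniform) prior: constant inside the box, -inf outside."""
    for value, (lo, hi) in zip(params, bounds):
        if not lo <= value <= hi:
            return -math.inf
    # Normalized uniform density: 1 / (volume of the box).
    return -sum(math.log(hi - lo) for lo, hi in bounds)


def log_likelihood(params, x, y, sigma):
    """Gaussian log-likelihood, assuming independent noise with known sigma."""
    m, b = params
    total = 0.0
    for xi, yi, si in zip(x, y, sigma):
        resid = yi - model(xi, m, b)
        total += -0.5 * (resid / si) ** 2 - math.log(si) - 0.5 * math.log(2 * math.pi)
    return total


def log_posterior(params, x, y, sigma):
    """Unnormalized log-posterior: log-prior plus log-likelihood."""
    lp = log_prior(params)
    if not math.isfinite(lp):
        return -math.inf  # skip the likelihood where the prior is zero
    return lp + log_likelihood(params, x, y, sigma)


def sample_prior(n, bounds=BOUNDS, rng=None):
    """Draw n parameter tuples from the flat prior."""
    rng = rng or random.Random()
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]
```

Working in log space avoids numerical underflow when many data points are multiplied together, and most optimizers and MCMC packages expect a log-probability function of this form. Each tuple returned by `sample_prior` can be passed to `model` to overplot a prior draw on the data.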
Some of the above steps should be encapsulated in the following functions, which you should implement in a submodule named after your group. Submit this submodule to the repository via a pull request, such that anyone can execute the following code from the top level of the repo:
import analysis_group_X as analysis
analysis.plot_data()
analysis.plot_model_exploration()
analysis.plot_model_prior_samples(N=25)
analysis.plot_model_max_posterior()
samples = analysis.sample_posterior()
analysis.plot_posterior_samples(samples)
analysis.plot_model_posterior_samples(samples, N=25)
The master.ipynb notebook contains such cells, and can be used to test and experiment. However, please only submit the code in your group's directory in your pull request, not any notebooks.
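One possible starting point for a group's submodule is a skeleton that defines the required function names with placeholder bodies. This is only a sketch: the function names and the `N=25` defaults are taken from the calls listed above, but the file layout (for example, placing this in `analysis_group_X/__init__.py`), the docstrings, and the idea that `sample_posterior` returns an array-like of samples are assumptions.

```python
"""Skeleton for a group submodule (hypothetical layout)."""


def plot_data():
    """Load the dataset and plot it."""
    raise NotImplementedError


def plot_model_exploration():
    """Plot the physical model for a few fixed parameter values over the data."""
    raise NotImplementedError


def plot_model_prior_samples(N=25):
    """Overplot the data with N models drawn from the prior."""
    raise NotImplementedError


def plot_model_max_posterior():
    """Find the maximum-posterior parameters and plot that model over the data."""
    raise NotImplementedError


def sample_posterior():
    """Run an MCMC sampler and return the posterior samples."""
    raise NotImplementedError


def plot_posterior_samples(samples):
    """Visualize the distribution and correlations of the posterior samples."""
    raise NotImplementedError


def plot_model_posterior_samples(samples, N=25):
    """Overplot the data with N models drawn from the posterior samples."""
    raise NotImplementedError
```

Replacing each `NotImplementedError` one step at a time keeps the module importable throughout, so the notebook cells can be run as each function is filled in.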