This repository has been archived by the owner on May 29, 2018. It is now read-only.

Comparing changes
base repository: TheoreticalEcosystemEcology/BayesSummer
base: master
head repository: EcoNumUdS/Bayes2017
compare: master
Able to merge. These branches can be automatically merged.
  • 3 commits
  • 3 files changed
  • 2 contributors

Commits on Jun 21, 2018

  1. 1a2c7b8
  2. removed redundant pdf
     mtalluto committed Jun 21, 2018 (b4f86ef)

Commits on Aug 21, 2018

  1. Merge pull request #2 from mtalluto/MissingDay5 (Missing day5)
     SteveViss authored Aug 21, 2018 (151ad99)
Showing with 333 additions and 0 deletions.
  1. +304 −0 Theory/Day5.Rmd
  2. BIN Theory/Image/fig_d5_param_estimates.pdf
  3. +29 −0 Theory/metrop-mcmc.r
304 changes: 304 additions & 0 deletions Theory/Day5.Rmd
@@ -0,0 +1,304 @@
---
title: "Model Comparison"
author: Matthew Talluto
institute: CNRS
date: August 18, 2017
documentclass: eecslides
babel-lang: english
output:
  beamer_presentation:
    highlight: tango
    includes:
      in_header: header.tex
# Rscript -e 'rmarkdown::render("Day5.Rmd", "all")'
# (am: 1h20 + 1h20 / pm 2h + 15min)
---

*To doubt everything or to believe everything are two equally convenient solutions; both spare us the need to think.*

\hspace{1cm}--Henri Poincaré






# Introduction to Model Comparison

Why compare models?

\pause

* All models are imperfect
* How good is our model \emph{given the modelling goals?}



# Comparing models

Before beginning, evaluate the goals of the comparison

* Predictive performance
* Hypothesis testing
* Reduction of overfitting

If you are asking yourself, "should I use AIC/BIC/DIC?"

Remember Betteridge's law...

\pause

> Any headline that ends in a question mark can be answered with the word "NO"



# Informal model comparison
![](Image/fig_d5_param_estimates.pdf)



# Comparison through evaluation

If the goal is predictive performance, evaluate directly.

* Cross-validation
* k-fold cross validation

Cost: can be computationally intensive (especially for Bayesian models). But you are already paying this cost (you ARE evaluating your models, right?). A minimal k-fold sketch follows on the next slide.

\pause

Requires selecting an evaluation score

* ROC/TSS (classification)
* RMSE (continuous)
* Goodness of fit
* ...
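
# Comparison through evaluation

A minimal sketch (not part of the original deck) of 5-fold cross-validation scored with RMSE; the simulated data and the plain `lm()` model are illustrative assumptions.

```{r}
# simulated data: a simple linear relationship plus noise (illustrative only)
set.seed(1)
df <- data.frame(x = runif(100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 0.5)

k <- 5
folds <- sample(rep(1:k, length.out = nrow(df)))    # random fold assignment
rmse <- sapply(1:k, function(i) {
  fit <- lm(y ~ x, data = df[folds != i, ])         # fit on k - 1 folds
  pred <- predict(fit, newdata = df[folds == i, ])  # predict the held-out fold
  sqrt(mean((df$y[folds == i] - pred)^2))           # RMSE on held-out data
})
mean(rmse)  # cross-validated RMSE
```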


# Bayesian predictive performance

Consider a regression model

\begin{align*}
pr(\theta | y, x) & \propto pr(y, x | \theta) pr(\theta) \\
y & \sim \mathcal{N}(\alpha + \beta x, \sigma)
\end{align*}

From a new value $\hat{x}$ we can compute a posterior prediction $\hat{y} = \alpha + \beta \hat{x}$




# Bayesian predictive performance

We can then compute the \emph{log posterior predictive density} (lppd):

\begin{align*}
lppd = \log pr(\hat{y} | \theta)
\end{align*}

\pause

Where is the prior?





# Bayesian predictive performance

We want to summarize lppd taking into account:

* an entire set of prediction points $\hat{x} = \{x_1, x_2, \dots x_n \}$
* the entire posterior distribution of $\theta$
* (or, realistically, a set of $S$ draws from the posterior distribution)

\pause

\begin{align*}
lppd = \sum_{i=1}^n \log \left ( \frac{1}{S} \sum_{s=1}^S pr(\hat{y}_i | \theta^s) \right )
\end{align*}

\pause

To compare two competing models $\theta_1$ and $\theta_2$, simply compute $lppd_{\theta_1}$ and $lppd_{\theta_2}$; the "better" model (for prediction) is the one with the larger lppd.
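
# Bayesian predictive performance

A minimal sketch of the lppd formula above; the "posterior draws" here are simulated stand-ins (not real MCMC output), and the new data points are illustrative assumptions.

```{r}
set.seed(2)
S <- 1000
# stand-in posterior draws for alpha, beta, sigma
draws <- data.frame(alpha = rnorm(S, 1, 0.1), beta = rnorm(S, 2, 0.1),
                    sigma = abs(rnorm(S, 0.5, 0.05)))

x_new <- c(0.2, 0.5, 0.9)                  # new predictor values
y_new <- 1 + 2 * x_new + rnorm(3, 0, 0.5)  # observed held-out responses

# lppd = sum over points of log( mean over draws of pr(y_i | theta^s) )
lppd <- sum(sapply(seq_along(x_new), function(i) {
  dens <- dnorm(y_new[i], draws$alpha + draws$beta * x_new[i], draws$sigma)
  log(mean(dens))
}))
lppd
```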



# Information criteria

What do we do when $\theta_1$ and $\theta_2$ are very different?

\pause

For the lpd computed on the calibration data, it can be shown that when $\theta_2$ is \emph{strictly nested} within $\theta_1$, $lpd_{\theta_1} \geq lpd_{\theta_2}$: the more flexible model always fits the training data at least as well.

\pause

Thus, we require a method for penalizing the larger (or more generally, more flexible) model to avoid simply overfitting, especially when validation data are unavailable.



# AIC

\begin{align*}
AIC = 2k - 2 \log pr(x | \hat{\theta})
\end{align*}

* $pr(x | \hat{\theta}) = \max_{\theta} pr(x | \theta)$ is the maximized likelihood and $k$ is the number of parameters.
* AIC increases as the fit gets worse or as the number of parameters grows
* $- 2 \log pr(x | \hat{\theta})$ is sometimes referred to as the *deviance*

\pause

What is the number of parameters in a hierarchical model?
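
# AIC

A minimal sketch of the AIC formula above, computed by hand for a simple `lm()` fit and checked against R's built-in `AIC()`; the simulated data are an illustrative assumption.

```{r}
set.seed(3)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.5)
fit <- lm(y ~ x)

k <- length(coef(fit)) + 1                         # intercept, slope, residual sd
aic_manual <- 2 * k - 2 * as.numeric(logLik(fit))  # 2k minus twice the max log likelihood
c(manual = aic_manual, builtin = AIC(fit))
```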


# DIC

\begin{align*}
D(\theta) & = -2 \log (pr(x | \theta))
\end{align*}

\pause

We still penalize the model based on complexity, but we must estimate how many *effective* parameters there are:

\begin{align*}
p_D & = \mathbb{E}[D(\theta)] - D( \mathbb{E}[\theta])
\end{align*}

\pause

\begin{align*}
DIC & = D( \mathbb{E}[\theta]) + 2p_D
\end{align*}
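
# DIC

A minimal sketch of $p_D$ and DIC as defined above, for a normal model with unknown mean and sd; the "posterior draws" are simulated stand-ins rather than real MCMC output.

```{r}
set.seed(4)
x <- rnorm(30, mean = 10, sd = 2)
# stand-in posterior draws for mu and sigma
draws <- data.frame(mu = rnorm(1000, mean(x), sd(x) / sqrt(30)),
                    sigma = abs(rnorm(1000, sd(x), 0.3)))

dev_fn <- function(mu, sigma) -2 * sum(dnorm(x, mu, sigma, log = TRUE))  # D(theta)
D_theta <- mapply(dev_fn, draws$mu, draws$sigma)
D_bar <- mean(D_theta)                              # E[D(theta)]
D_hat <- dev_fn(mean(draws$mu), mean(draws$sigma))  # D(E[theta])
p_D <- D_bar - D_hat
c(p_D = p_D, DIC = D_hat + 2 * p_D)
```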




# DIC
**Pros:**

* Easy to estimate
* Widely used and understood
* Effective for a variety of models regardless of nestedness or model size

**Cons**

* Not Bayesian
* Assumes the posterior of $\theta$ is approximately multivariate normal
* Modest computational cost




# Bayes factor

Consider two competing models $\theta_1$ and $\theta_2$

In classical likelihood statistics, we can compute the likelihood ratio, comparing each likelihood evaluated at its maximum likelihood estimate $\hat{\theta}$:

\begin{align*}
LR = \frac{pr(X | \hat{\theta}_1)}{pr(X | \hat{\theta}_2)}
\end{align*}

\pause

A fully Bayesian approach is to take into account the entire posterior distribution of both models:

\begin{align*}
K = \frac{pr(\theta_1 | X)}{pr(\theta_2 | X)}
\end{align*}
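
# Bayes factor

A minimal sketch of the classical likelihood ratio from the previous slide, using `logLik()` on two nested `lm()` fits; the simulated data and models are illustrative assumptions.

```{r}
set.seed(5)
x1 <- runif(60); x2 <- runif(60)
y <- 1 + 2 * x1 + rnorm(60, sd = 0.4)  # x2 has no true effect

m1 <- lm(y ~ x1 + x2)  # larger model
m2 <- lm(y ~ x1)       # nested model

# likelihood ratio: exponentiated difference of maximized log likelihoods
exp(as.numeric(logLik(m1)) - as.numeric(logLik(m2)))
```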





# Bayes factor

For a single posterior estimate of each model:

\begin{align*}
K & = \frac{pr(\theta_1 | X)}{pr(\theta_2 | X)} \\
& = \frac{pr(X | \theta_1 ) pr(\theta_1)}{pr(X | \theta_2) pr(\theta_2)}
\end{align*}

# Bayes factor

To account for the entire distribution:
\begin{align*}
K & = \frac{\int pr(X, \theta_1) d\theta_1}{\int pr(X, \theta_2) d\theta_2} \\
& = \frac{\int pr(X | \theta_1 ) pr(\theta_1)d\theta_1}{\int pr(X | \theta_2) pr(\theta_2)d\theta_2}
\end{align*}
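
# Bayes factor

A minimal sketch of $K$ via crude Monte Carlo estimates of the marginal likelihoods, averaging the likelihood over draws from each prior; the data, the priors, and the fixed sd of 2 are illustrative assumptions.

```{r}
set.seed(6)
x <- rnorm(20, mean = 10, sd = 2)

# log marginal likelihood: average pr(X | mu) over prior draws of mu (sd fixed at 2)
log_marg_lik <- function(prior_mean, prior_sd, n_draws = 20000) {
  mu <- rnorm(n_draws, prior_mean, prior_sd)
  loglik <- sapply(mu, function(m) sum(dnorm(x, m, 2, log = TRUE)))
  max(loglik) + log(mean(exp(loglik - max(loglik))))  # log-mean-exp for stability
}

logK <- log_marg_lik(10, 1) - log_marg_lik(0, 1)  # model 1 prior vs model 2 prior
exp(logK)  # Bayes factor K
```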






# And others
* Bayesian model averaging
* Reversible jump MCMC



# Software

```{r, fig.width=4, fig.height=6}
library(mcmc)
suppressMessages(library(bayesplot))

# log posterior for a normal model with unknown mean (mu) and sd (sig)
logposterior <- function(params, dat)
{
  if (params[2] <= 0)  # sd must be positive
    return(-Inf)
  mu <- params[1]
  sig <- params[2]
  lp <- sum(dnorm(dat, mu, sig, log = TRUE)) +  # likelihood
    dnorm(mu, 16, 0.4, log = TRUE) +            # prior on mu
    dgamma(sig, 1, 0.1, log = TRUE)             # prior on sig
  return(lp)
}
```

# Software

```{r, fig.width =4, fig.height=6}
X <- c(15, 19.59, 15.06, 15.71, 14.65, 21.4, 17.64, 18.31,
15.12, 14.40)
inits <- c(5, 2)
tuning <- c(1.5, 0.5)
model <- metrop(logposterior, initial = inits,
nbatch = 10000, dat = X, scale = tuning)
model$accept
colnames(model$batch) = c('mu', 'sigma')
colMeans(model$batch)
```


# Software

```{r, fig.width =6, fig.height=4, echo=FALSE}
mcmc_dens(model$batch)
```

# Software

```{r, fig.width =6, fig.height=4, echo=FALSE}
mcmc_trace(model$batch)
```


# Other software

* mcmc
* LaplacesDemon
* JAGS
* Stan
Binary file added Theory/Image/fig_d5_param_estimates.pdf
Binary file not shown.
29 changes: 29 additions & 0 deletions Theory/metrop-mcmc.r
@@ -0,0 +1,29 @@
library(mcmc)


# log posterior for a normal model with unknown mean (mu) and sd (sig)
logposterior <- function(params, dat)
{
  if (params[2] <= 0)  # sd must be positive
    return(-Inf)

  mu <- params[1]
  sig <- params[2]

  # likelihood plus priors on mu and sig, all on the log scale
  lp <- sum(dnorm(dat, mu, sig, log = TRUE)) + dnorm(mu, 16, 0.4, log = TRUE) + dgamma(sig, 1, 0.1, log = TRUE)
  return(lp)
}

X <- c(15, 19.59, 15.06, 15.71, 14.65, 21.4, 17.64, 18.31, 15.12, 14.40)
inits <- c(5, 2)
tuning <- c(1.5, 0.5)


# run the Metropolis sampler: 10000 batches, per-parameter proposal scale
model <- metrop(logposterior, initial = inits, nbatch = 10000, dat = X, scale = tuning)

model$accept                               # acceptance rate
colnames(model$batch) <- c('mu', 'sigma')
colMeans(model$batch)                      # posterior means

library(bayesplot)
mcmc_dens(model$batch)       # posterior densities
mcmc_intervals(model$batch)  # posterior credible intervals
mcmc_trace(model$batch)      # trace plots