-
Notifications
You must be signed in to change notification settings - Fork 8
/
Copy pathassignment-04.Rmd
461 lines (366 loc) · 15.7 KB
/
assignment-04.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
# Assignment 4
2021-09-23
**[Assignment 4](`r paste0(CM_URL, "assignment-04.pdf")`)**
## Setup
```{r setup, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(echo = TRUE, comment = "#>", dpi = 300)
for (f in list.files(here::here("src"), pattern = "R$", full.names = TRUE)) {
source(f)
}
library(glue)
library(tidyverse)
theme_set(theme_classic())
light_blue <- "#7AAED1"
dark_blue <- "#011F4B"
```
## Exercise 1. Bioassay model
**In this exercise, you will use a dose-response relation model that is used in Section 3.7.**
**The used likelihood is the same, but instead of uniform priors, we will use a bivariate normal distribution as the joint prior distribution of the parameters $\alpha$ and $\beta$.**
Below is a description of the bioassay from the reading instructions (with some minor changes to help the grammar):
> The example is from Racine *et al.* (1986) (see ref in the end of the BDA3).
> [This] Swiss company makes classification[s] of chemicals to different toxicity categories defined by [governmental] authorities (like [the] EU).
> Toxicity classification is based on [the] lethal dose 50% (LD50) which [indicates] what amount of [a] chemical [that] kills 50% of the subjects.
> [The] [s]maller the LD50, [the] more lethal the chemical is.
> The original paper mentions "1983 Swiss poison Regulation" which defines [the] following categories for chemicals orally given to rats (mg/ml):
| Class | LD50 |
|:-----:|:---------:|
| 1 | <5 |
| 2 | 5-50 |
| 3 | 50-500 |
| 4 | 500-2000 |
| 5 | 2000-5000 |
> To reduce the number of rats needed in the experiments, the company started to use Bayesian methods.
> The paper mentions that in those days, [the] use of just 20 rats to define the classification was very little.
> [The] book gives the LD50 in log(g/ml).
> When the result from demo3_6 is transformed to mg/ml, we see that the mean LD50 is about 900 and $p(500 < LD50 < 2000) \approx 0.99$.
> Thus, the tested chemical can be classified as category 4 toxic.
**a) In the prior distribution for $\alpha, \beta)$, the marginal distributions are $\alpha \sim N(0, 2^2)$ and $\beta \sim N(10, 10^2)$, and the correlation between them is $\text{corr}(\alpha, \beta) = 0.6$.**
**Report the mean (vector of two values) and covariance (two by two matrix) of the bivariate normal distribution.**
The definition of a bivariate normal distribution is
$$
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}
\sim N
\begin{bmatrix}
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} ,
\begin{pmatrix}
\sigma_1^2 & \rho \sigma_2 \sigma_1 \\
\rho \sigma_1 \sigma_2 & \sigma_2^2 \\
\end{pmatrix}
\end{bmatrix}
$$
Because $\alpha$ and $\beta$ are both normal distributions, but the mean vector is
$$
\begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix} =
\begin{pmatrix} 0 \\ 10 \end{pmatrix}
$$
and the covariance matrix is
$$
\begin{aligned}
\begin{pmatrix}
\sigma_\alpha^2 & \rho \sigma_\beta \sigma_\alpha \\
\rho \sigma_\alpha \sigma_\beta & \sigma_\beta^2 \\
\end{pmatrix} &=
\begin{pmatrix}
2^2 & 0.6 \times 10 \times 2 \\
0.6 \times 2 \times 10 & 10^2 \\
\end{pmatrix} \\
&= \begin{pmatrix}
4 & 12 \\
12 & 100 \\
\end{pmatrix}
\end{aligned}
$$
```{r}
get_prior_parameters <- function() {
mu_a <- 0
sigma_a <- 2
mu_b <- 10
sigma_b <- 10
rho <- 0.6
cov_value <- rho * sigma_a * sigma_b
mu <- c(mu_a, mu_b)
cov_mat <- matrix(
c(sigma_a^2, cov_value, cov_value, sigma_b^2),
nrow = 2,
byrow = FALSE
)
return(list(mu = mu, cov_mat = cov_mat))
}
sample_from_prior <- function(n) {
params <- get_prior_parameters()
bvn <- MASS::mvrnorm(n, mu = params$mu, Sigma = params$cov_mat)
return(bvn)
}
plot_df <- as.data.frame(sample_from_prior(1e5)) %>%
as_tibble() %>%
set_names("alpha", "beta")
plot_points_and_density <- function(df,
x,
y,
n_samples = 1000,
point_alpha = 0.6,
density_bins = 5) {
df %>%
ggplot(aes(x = {{ x }}, y = {{ y }})) +
geom_point(
data = sample_n(df, size = min(c(nrow(df), n_samples))),
alpha = point_alpha,
color = light_blue
) +
geom_density_2d(bins = density_bins, linetype = 2, color = dark_blue)
}
plot_points_and_density(plot_df, alpha, beta)
```
source: [PennState, Elberl College of Science: "4.2 - Bivariate Normal Distribution"](https://online.stat.psu.edu/stat505/lesson/4/4.2)
**b) You are given 4000 independent draws from the posterior distribution of the model.**
**Report the mean as well as 5% and 95% quantiles separately for both $\alpha$ and $\beta$.**
**Report also the Monte Carlo standard errors (MCSEs) for the mean and quantile estimates.**
**Explain in words what does Monte Carlo standard error mean and how you decided the number of digits to show.**
```{r}
bioassay_posterior <- read_data(
"bioassay_posterior.txt",
read_tsv,
col_names = FALSE
)
colnames(bioassay_posterior) <- c("alpha", "beta")
head(bioassay_posterior)
```
```{r}
p <- plot_points_and_density(
bioassay_posterior,
alpha,
beta,
n_samples = Inf,
point_alpha = 0.3
)
p
```
MCSE accuracy of averages: $s_\theta / \sqrt(S)$ where $s_\theta$ is the standard deviation of the draws and $S$ is the number of draws.
```{r}
apply(bioassay_posterior, 2, function(s) {
sd(s) / sqrt(length(s))
})
```
Calculate the means of the posterior distributions.
```{r}
print("Posterior means:")
apply(bioassay_posterior, 2, mean)
```
Calculate the MCSE of the quantiles.
```{r}
apply(bioassay_posterior, 2, function(x) {
unlist(c(
aaltobda::mcse_quantile(x, 0.05),
aaltobda::mcse_quantile(x, 0.95)
))
})
```
```{r}
print("Posterior 5% and 95% quantiles:")
apply(bioassay_posterior, 2, function(x) {
quantile(x, c(0.05, 0.95))
})
```
Accounting for the precisions of MCSE:
- $\mu_\alpha = 1.0$ with 5% and 95% CI: $(-0.5, 2.6)$
- $\mu_\beta = 10.6$ with 5% and 95% CI: $(4.0, 19)$
MCSE is the expected error from the stochastic process of Monte Carlo simulations.
Here, it is used to determine the number of digits to report where we report the digits where the MCSE is 0.
```{r}
p +
geom_vline(xintercept = 1.0, color = "red") +
geom_ribbon(aes(xmin = -0.5, xmax = 2.6), alpha = 0.1, fill = "red") +
geom_hline(yintercept = 10.6, color = "orange") +
geom_ribbon(aes(ymin = 4.0, ymax = 19), alpha = 0.1, fill = "orange") +
scale_x_continuous(expand = expansion(0)) +
scale_y_continuous(expand = expansion(0))
```
**c) Implement a function for computing the log importance ratios (log importance weights) when the importance sampling target distribution is the posterior distribution, and the proposal distribution is the prior distribution from question a).**
**Explain in words why it’s better to compute log ratios instead of ratios.**
It is better to compute log ratios to avoid overflow or underflow that can occur more easily when using the ratios.
As hinted in the assignment (see PDF), because we are using the prior distribution as the proposal distribution, the calculation of the importance weights is much easier.
The importance weight $w(\theta^s)$ is calculated as follows:
$$
w(\theta^s) = \frac{q(\theta^s|y)}{g(\theta^s)}
$$
With the proposal distribution $g$ set as the prior distribution, $g(\theta) = p(\theta)$, the equation simplifies to:
$$
\begin{aligned}
w(\theta^s) &= \frac{q(\theta^s|y)}{g(\theta^s)} \\
&= \frac{p(\theta^s) p(y|\theta^s)}{p(\theta^s)} \\
&= p(y|\theta^s)
\end{aligned}
$$
Thus, the importance weight is just the likelihood for the model.
In the question(again, see the PDF), we are asked to use `aaltobda::bioassaylp()` to calculate the logarithm of the likelihood.
```{r}
log_importance_weights <- function(a, b) {
bioassay_data <- read_bioassay_data(show_col_types = FALSE)
aaltobda::bioassaylp(
alpha = a,
beta = b,
x = bioassay_data$x,
y = bioassay_data$y,
n = bioassay_data$n
)
}
# Test data.
alpha <- c(1.896, -3.6, 0.374, 0.964, -3.123, -1.581)
beta <- c(24.76, 20.04, 6.15, 18.65, 8.16, 17.4)
test_results <- round(log_importance_weights(alpha, beta), 2)
stop_if_not_all_close_to(
test_results,
c(-8.95, -23.47, -6.02, -8.13, -16.61, -14.57)
)
```
**d) Implement a function for computing normalized importance ratios from the unnormalized log ratios in question c).**
**In other words, exponentiate the log ratios and scale them such that they sum to one.**
**Explain in words what is the effect of exponentiating and scaling so that sum is one.**
The normalized importance weights are just the exponentiated log importance weights divided by the total to sum to one.
This is conducted in `normalized_importance_weights()`, below.
```{r}
normalized_importance_weights <- function(alpha, beta) {
weights <- exp(log_importance_weights(alpha, beta))
return(weights / sum(weights))
}
# Test data.
test_results <- round(
normalized_importance_weights(alpha = alpha, beta = beta), 3
)
stop_if_not_all_close_to(
test_results,
c(0.045, 0.000, 0.852, 0.103, 0.000, 0.000)
)
```
The exponentiation is to put the importance weights into a linear-scale instead of a log-scale in which they are calculated.
They are normalized such that they sum to one so they can be used to scale the values of $\theta$ without changing the mean of $\theta$ for importance sampling.
Also, normalizing to sum to 1 makes them comparable regardless of the values of $\theta$.
**e) Sample 4000 draws of $\alpha$ and $\beta$ from the prior distribution from a).**
**Compute and plot a histogram of the 4000 normalized importance ratios.**
**Use the functions you implemented in c) and d).**
```{r}
prior_draws <- sample_from_prior(4000)
norm_importance_ratios <- normalized_importance_weights(
alpha = prior_draws[, 1],
beta = prior_draws[, 2]
)
plot_single_hist(
x = norm_importance_ratios, bins = 30, alpha = 0.3, color = "black"
) +
labs(x = "normalized importance ratios")
```
**f) Using the importance ratios, compute the importance sampling effective sample size $S_\text{eff}$ and report it.**
From equation 10.4 in *BDA3*:
$$
S_\text{eff} = \frac{1}{\sum^S (\tilde{w}(\theta^s))^2}
$$
where $\tilde{w}(\theta^s)$ are the *normalized* importance weights.
```{r}
S_eff <- function(alpha, beta) {
w_tilde <- normalized_importance_weights(alpha, beta)
s_eff <- 1 / sum(w_tilde^2)
return(s_eff)
}
# Test data.
test_results <- round(
S_eff(alpha = alpha, beta = beta), 3
)
stop_if_not_close_to(test_results, 1.354)
```
```{r}
bioassy_s_eff <- S_eff(bioassay_posterior$alpha, bioassay_posterior$beta)
glue("S_eff for posterior draws of bioassy model: {round(bioassy_s_eff, 3)}")
```
**g) Explain in your own words what the importance sampling effective sample size represents.**
**Also explain how the effective sample size is seen in the histogram of the weights that you plotted in e).**
The importance sampling effective sample size indicates how well the samples from the importance sampling cover the density of the target distribution $q(\theta|y)$.
Ideally, the weights are uniform, resulting in a $S_\text{eff}$ near the true number of draws (i.e. 4000 in this case).
If there are some samples with very high importance, then $S_\text{eff}$ gets smaller to indicate that the sampling did accurately cover the target distribution.
```{r}
prior_s_eff <- S_eff(prior_draws[, 1], prior_draws[, 2])
glue("S_eff for prior draws of bioassy model: {round(prior_s_eff, 3)}")
```
The $S_\text{eff}$ for the prior draws is calculated above.
We can see that even though there were 4000 draws, the effective sampling size is about `r round(prior_s_eff)`.
This means that the prior samples do no accurately sample from the target distribution.
We can see this in the histogram with most importance weights near 0 and a thick tail of higher values.
This is also shown in the plot below of the prior samples colored by their importance ratios.
There is a density of higher importance ratios in one region of the samples, likely indicating where there is a lot of posterior density for the model.
```{r}
p <- tibble(
alpha = prior_draws[, 1],
beta = prior_draws[, 2],
norm_imp_ratio = norm_importance_ratios
) %>%
ggplot(aes(x = alpha, y = beta, color = norm_imp_ratio)) +
geom_point(size = 1, alpha = 0.5) +
scale_color_viridis_b() +
labs(x = "alpha", y = "beta", color = "normalized\nimportance\nratio")
p
```
**h) Implement a function for computing the posterior mean using importance sampling, and compute the mean using your 4000 draws.**
**Explain in your own words the computation for importance sampling.**
**Report the means for $\alpha$ and $\beta$, and also the Monte Carlo standard errors (MCSEs) for the mean estimates.**
**Report the number of digits for the means based on the MCSEs.**
**Hint. Use the same equation for the MCSE of $\text{E}[\theta]$ as earlier $\sqrt{\text{Var}[\theta]/S}$, but now replace $S$ with $S_\text{eff}$.**
**To compute $\text{Var}[\theta]$ with importance sampling, use the identity $\text{Var}[\theta] = \text{E}[\theta^2] − \text{E}[\theta]^2$.**
The posterior mean (expectation) of $\theta = (\alpha, \beta)$ can be computed using importance sampling using the following equation (eq. 10.3 in *BDA3*):
$$
\text{E}(h(\theta|y)) = \frac{\frac{1}{S} \sum^S h(\theta^s) w(\theta^s)}{\frac{1}{S} \sum^S w(\theta^s)} \\
w(\theta^s) = \frac{q(\theta^s|y)}{g(\theta^s)}
$$
where $w(\theta^s)$ is the importance weight calculated as the quotient of the target distribution $q$ and the proposal distribution $g$.
The importance weights are calculated in `log_importance_weights()` in question c).
In this case, $h(\theta|y)$ is the posterior distribution of $\theta$ and $\text{E}(h(\theta|y))$ is the mean of the posterior distribution of $\theta$.
$h(\theta)$ is the prior distribution.
Thus, the algorithm is weight prior draws by their importance weight.
```{r}
posterior_mean <- function(alpha, beta) {
weights <- exp(log_importance_weights(alpha, beta))
theta <- tibble(alpha = alpha, beta = beta)
# Remove values that have NA weights (lazy solution).
if (any(is.na(weights))) {
num_missing <- sum(is.na(weights))
warning(glue("NA weights - removing {num_missing} sample(s)."))
theta <- theta[!is.na(weights), ]
weights <- weights[!is.na(weights)]
}
denominator <- sum(weights)
mu_theta_post <- apply(theta, 2, function(x) {
sum(x * weights) / sum(weights)
})
return(mu_theta_post)
}
test_results <- round(posterior_mean(alpha = alpha, beta = beta), 3)
stop_if_not_all_close_to(test_results, c(0.503, 8.275))
```
Importance sampling is really just a method for weighting possible values of $\theta$ by their importance ratio.
In this case, because we used the prior distribution at the proposal distribution, the importance ratio is the probability of $\theta$ under the unnormalized posterior distribution.
Thus, this procedure weights possible values of $\theta$ by how likely they are given the posterior distribution.
```{r}
prior_a <- prior_draws[, 1]
prior_b <- prior_draws[, 2]
post_means <- posterior_mean(prior_a, prior_b)
post_means
```
As stated in the hint with the question, we can calculate the MCSE of $\text{E}[\theta]$ as $\sqrt{\text{Var}[\theta]/S},$, using $S_\text{eff}$ for $S$, and $\text{Var}[\theta]$ as $\text{E}[\theta^2] − \text{E}[\theta]^2$
```{r}
e_theta_squared <- posterior_mean(prior_a^2, prior_b^2)
var_theta <- abs(e_theta_squared - (post_means^2))
s_eff <- S_eff(prior_a, prior_b)
mcse <- sqrt(var_theta / s_eff)
mcse
```
```{r}
a_post_mean <- round(post_means[[1]], 2)
b_post_mean <- round(post_means[[2]], 0)
glue("alpha post mean: {a_post_mean}, beta post mean: {b_post_mean}")
```
The posterior means for $\alpha$ and $\beta$ are indicated by the red dot over-layed on the distribution of importance weights from a previous question.
```{r}
p + geom_point(x = a_post_mean, y = b_post_mean, color = "red", shape = 4, size = 3)
```
---
```{r}
sessionInfo()
```