examples/restrictedVAR.Rmd

---
title: "Restricted Vector Autoregression"
author: "Benjamin Christoffersen"
date: "`r Sys.Date()`"
output: html_document
bibliography: bibliography.bib
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE, 
  fig.height = 4, fig.width = 7, dpi = 128,
  cache.path = "cache/restrictedVAR-cache/", fig.path = "fig/restrictedVAR-fig/", 
  error = FALSE)
options(digits = 4, scipen = 10, width = 70)
```

```{r set_cols, echo = FALSE}
palette(
  c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", 
    "#D55E00", "#CC79A7"))
```


We will simulate and estimate a first order vector auto-regression model in 
this example using the particle filter and smoother. For details see 
[this vignette](../vignettes/Particle_filtering.pdf) which can also be found by 
calling `vignette("Particle_filtering", package = "dynamichazard")`. The models 
we are going to simulate from and estimate are of the form 

$$
\begin{split}
 y_{it} &\sim g(\cdot\vert\eta_{it}) &  \\
 	\eta_{it} &= x_i\alpha_{th(i)} + \beta_0 +  z_i\beta_{h(i)} +  o_{it} \\
 	\vec{\alpha}_t &= F\vec{\alpha}_{t - 1} + R\vec{\epsilon}_t &
 		\quad \vec{\epsilon}_t \sim N(\vec{0}, Q) \\
	& &	\quad \vec{\alpha}_0 \sim N(\vec{a}_0, Q_0)
\end{split}, \qquad
\begin{array}{l} i \in \{1, \dots, n\} \\ t = 1, \dots, d \end{array}
$$

where the $y_{it}$ is individual $i$'s indicator at time $t$ for whether he dies 
between time $(t - 1, t]$ and $h:\,\{1,\dots n\} \rightarrow \{1,\dots,4\}$ 
returns the group index individual $i$ belongs to. The indicators, $y_{it}$, 
are binomial distributed with the complementary log-log link function conditional on 
knowing the log time at risk, $o_{it}$, covariates, $x_t$ and $z_t$, and latent states, $\vec{\alpha}_1,\dots,\vec{\alpha}_d$. 
The total survival time of individual $i$ is $T_i$ which is 
piecewise constant exponentially distributed
conditional on knowing the latent states. Further, we set $z_i = x_i$ so the 
states have a non-zero mean. The true values are 

$$
F = \begin{pmatrix}
    \theta_1 & . & . & \theta_3 \\ 
    . & \theta_2 & . & . \\
    . & . & \theta_2 & . \\
    \theta_3 & . & . & \theta_1 \end{pmatrix}, \quad
Q = \begin{pmatrix}
      0.49 & . & 0.175 & . \\
      . & 0.49 & . & . \\
      0.175 & . & 0.25 & . \\ 
      . & . & . & 0.25 
\end{pmatrix}, \quad 
R = I_4
$$

$$
\vec{\beta} = (-6.5, -1, -0.5, 0.5, 1)^\top, \quad
\vec{\theta} =(0.66, 0.8, 0.2)^\top
$$

```{r assign_kit_info, echo = FALSE}
git_key <- system("git rev-parse --short HEAD", intern = TRUE)
git_bra <- system("git branch", intern = TRUE)
regexp <- "^(\\*\ )(.+)$"
git_bra <- git_bra[grepl(regexp, git_bra)]
git_bra <- gsub(regexp, "\\2", git_bra)
```

where $I_4$ is the four-dimensional identity matrix, $.$ is used instead of $0$ 
to put emphasis on the non-zero elements, and 
$\vec{a}_0$ and $Q_0$ are given by the invariant distribution. The unknown 
parameters to be estimated is everything but $Q_0$ and $R$ (since we fix $Q_0$ doing the estimation and we set $\vec{a}_0 = (0, 0,0,0)^\top$). This 
example is run on the git branch "`r git_bra`" with ID "`r git_key`". The code
can be found on 
[the Github site for the package](../examples).
All functions which assignments are not shown and are not in the 
`dynamichazard` package can be found on the Github site.

We are going to estimate the parameters in an unrestricted model where we 
estimate the $Q$ and $F$ and a restricted model where we estimate 
$\vec{\theta}$, parameters for standard deviations which we denote by 
$\vec{\psi}$, and parameters for the correlations which we denote by 
$\vec{\phi}$. In the restricted case, we let  

$$
\begin{align*}
\text{vec}(R^+F) &= G\vec{\theta} \\
Q &= VCV 
  = \begin{pmatrix}
      \sigma_1 & . & . & . \\
      . & \sigma_2 & . & . \\
      . & . & \sigma_3 & . \\ 
      . & . & . & \sigma_4 
    \end{pmatrix}
    \begin{pmatrix}
      1 & \rho_{21} & \rho_{31} & \rho_{41} \\
      \rho_{21} & 1 & \rho_{32} & \rho_{42} \\
      \rho_{31} & \rho_{32} & 1 & \rho_{43} \\ 
      \rho_{41} & \rho_{42} & \rho_{43} & 1 
    \end{pmatrix}
    \begin{pmatrix}
      \sigma_1 & . & . & . \\
      . & \sigma_2 & . & . \\
      . & . & \sigma_3 & . \\ 
      . & . & . & \sigma_4 
    \end{pmatrix} \\
\sigma_i &= \exp(s_i), \qquad
    (s_1,s_2,s_3,s_4)^\top = J (\psi_1, \psi_2)^\top \\ 
\rho_{ij} &= \frac{2}{1 + \exp(-q_{ij})} - 1, \qquad
(q_{21}, q_{31}, q_{41},q_{32}, q_{42},q_{43})^\top =K\phi_1
\end{align*}
$$

where $\text{vec}(\cdot)$ is the [vectorization function](https://en.wikipedia.org/wiki/Vectorization_(mathematics)) and $G$, $J$,
and $L$ are 

$$
\begin{split}
G &= \begin{pmatrix} 
    1 & . & . & . & . & . & . & . & . & . & . & . & . & . & . & 1 \\
    . & . & . & . & . & 1 & . & . & . & . & 1 & . & . & . & . & . \\
    . & . & . & 1 & . & . & . & . & . & . & . & . & 1 & . & . & .
\end{pmatrix}^\top \\
J &= \begin{pmatrix} 
    1 & 1 & . & .  \\
    . & . & 1 & 1 
\end{pmatrix}^\top \\
K &= \begin{pmatrix} \cdot & 1 & \cdot& \cdot& \cdot & \cdot \end{pmatrix}^\top
\end{split}
$$

## Simulation

```{r load_dynam}
library(dynamichazard)
```

We start by simulating the data. Feel free to skip this part as the specifications 
are given above. First we assign the parameters for the simulation

```{r set_params}
# assign G, theta, and F
G <- matrix(0., 4^2, 3)
idx <- 1 + (0:(4 - 1)) * 4 + 0:(4 - 1)
G[idx[-(2:3)], 1] <- 1
G[idx[  2:3 ], 2] <- 1
G[c(4, 13)   , 3] <- 1
theta <- c(.66, .8, .2)
(F. <- matrix(as.vector(G %*% theta), 4, 4))

# assign J, K, psi, phi, and Q
J <- matrix(0., 4, 2)
J[1:2, 1] <- J[3:4, 2] <- 1
psi <- log(sqrt(c(.49, .25)))
K <- matrix(0., 6, 1)
K[2, 1] <- 1
phi <- log(- (.5 + 1) / (.5 - 1))

V <- diag(exp(drop(J %*% psi)))
C <- diag(1, ncol(V))
C[lower.tri(C)] <- 2/(1 + exp(-drop(K %*% phi))) - 1
C[upper.tri(C)] <- t(C)[upper.tri(C)]
(Q <- V %*% C %*% V)

# assign Q_0 and beta
Q_0  <- get_Q_0(Q, F.)
beta <- c(-6.5, -1, -0.5, 0.5, 1)
```

`get_Q_0` is a function to get the covariance matrix for the invariant distribution.
Then we simulate and plot the latent states

```{r sim_plot_latent}
set.seed(54432125)
n_periods <- 200
alphas <- matrix(nrow = n_periods + 1, ncol = 4)
alphas[1, ] <- rnorm(4) %*% chol(Q_0)
for(i in 1:n_periods + 1)
  alphas[i, ] <- F. %*% alphas[i - 1, ] + drop(rnorm(4) %*% chol(Q))

alphas <- t(t(alphas) + beta[-1])

# plot of latent variables
matplot(alphas, type = "l", lty = 1)
for(i in 1:4)
  abline(h = beta[i + 1], lty = 2, col = i)
```

We simulate the observations as follows

```{r sim_obs}
n_obs <- 1000
df <- sapply(1:n_obs, function(i){
  # find the group
  grp <- (i - 1L) %/% (n_obs / 4L) + 1L
  
  # left-censoring
  tstart <- max(0L, sample.int((n_periods - 1L) * 2L, 1) - n_periods + 1L)

  # covariates
  x <- runif(1, 0, 2)

  # outcome (stop time and event indicator)
  y <- FALSE
  for(tstop in (tstart + 1L):n_periods){
    fail_time <- rexp(1) / exp(beta[1] + x * alphas[tstop + 1L, grp])
    if(fail_time <= 1){
      y <- TRUE
      tstop <- tstop - 1L + fail_time
      break
    }
  }

  c(tstart = tstart, tstop = tstop, x = x, y = y, grp = grp)
})
df <- data.frame(t(df))
df$grp <- factor(df$grp)

# prepare data. Needed to avoid error in `ddFixed` 
df <- within(df, {
  x1 <- x * (grp == 1)
  x2 <- x * (grp == 2)
  x3 <- x * (grp == 3)
  x4 <- x * (grp == 4)
})
```

We left-censor the observations since we otherwise may end up with a low number
of observations towards the end. We show a few properties of the sample

```{r show_sims}
# how many die in each group
xtabs(~ grp + y, df)

# at what time do we have events?
tmp <- aggregate(y ~ grp + as.integer(df$tstop), df, sum, subset = df$y == 1)
tmp$time <- tmp$`as.integer(df$tstop)` + 1L
plot(c(1, n_periods), range(tmp$y, 0), type = "n", xlab = "time", 
     ylab = "# events")
for(i in 1:4){
  dat <- subset(tmp, grp == i)
  x <- dat$time + .2 * (i - 2.5)
  segments(x, rep(0, length(x)), x, dat$y, col = i)
}
```

### Model without latent variables
We can fit a model without the latent variables (i.e., a constant coefficient
model) as follows

```{r survreg_fit}
surv_fit <- survreg(Surv(tstop - tstart, y) ~ x1 + x2 + x3 + x4, df, 
                    dist = "exponential")
summary(surv_fit) # signs are flipped
logLik(surv_fit)
```

The signs are flipped as stated in `help("survreg")`. We can compare the 
log-likelihood with this models with the log-likelihood approximation we get 
from the particle filter in the next section.

## Particle filter and smoother
We use the generalized two-filter smoother from @fearnhead10 were we estimate 
the full $F$ and $Q$ matrix.

```{r confs, cache = 1}
n_threads <- 6 

# you can replace this with e.g.,  
# max(parallel::detectCores(logical = FALSE), 2)))

# control object
dd_ctrl <- PF_control(
  N_fw_n_bw = 200, N_smooth = 500, N_first = 1000, eps = 1e-4,
  method = "AUX_normal_approx_w_cloud_mean", 
  n_max = 500, smoother = "Fearnhead_O_N", averaging_start = 150L,
  nu = 6L, covar_fac = 1.1, n_threads = n_threads)
```

<!-- 
  knitr::opts_knit$set(output.dir = ".")
  knitr::load_cache("pf_first", "pf_Fear", path = "examples/cache/restrictedVAR-cache/") 
-->

```{r pf_first, cache = 1, dependson = c("assign_sim_params", "confs")}
set.seed(30520116)
system.time(pf_Fear <- PF_EM(
  Surv(tstart, tstop, y) ~ ddFixed_intercept() + x1 + x2 + x3 + x4 + 
    ddFixed(x1) + ddFixed(x2) + ddFixed(x3) + ddFixed(x4), 
  df, Q_0 = diag(1, 4), Q = diag(1, 4), Fmat = diag(.1, 4), 
  by = 1, type = "VAR", model = "exponential", max_T = n_periods,
  control = dd_ctrl))
```

`system.time` is used to show the computation time. The estimates are

```{r show_ests}
pf_Fear$Q
show_covar <- function(Q){
  cat("Standard deviations\n")
  print(sqrt(diag(Q)))
  cat("Lower correlation matrix\n")
  tmp <- cov2cor(Q)
  tmp[upper.tri(tmp, diag = TRUE)] <- NA_real_
  print(tmp[-1, -ncol(tmp)], na.print = "")
}
show_covar(pf_Fear$Q)
show_covar(Q) # the true values

pf_Fear$F
pf_Fear$fixed_effects # beta
```

We can plot the smoothed state estimates as follows

```{r plot_smooth, fig.width = 10, fig.height = 6.66}
tmp <- t(t(alphas) - beta[-1])[-1, ]
par(mfcol = c(2, 2), mar = c(5, 4, 1, 1))
for(i in 1:4){
  plot(pf_Fear, qlvls = c(0.025, 0.975), ylim = range(tmp),
       cov_index = i, col = i)
  points(1:nrow(tmp), tmp[, i], pch = 16, col = i)
}
```

The crosses are the point-wise confidence bounds, the lines are the smoothed 
means, and the full dots are the actual values. The approximate log-likelihood at 
each iteration of the EM algorithm are

```{r show_log_like}
logLik(pf_Fear)
plot(pf_Fear$log_likes, type = "l", ylab = "log-likelihood")
# last elements
plot(tail(pf_Fear$log_likes, 50), type = "l", ylab = "log-likelihood")
```

### Restricted model 
We fit the model where we only estimate $\vec{\theta}$, $\vec{\psi}$, and $\vec{\phi}$ as 
follows

<!-- 
  knitr::opts_knit$set(output.dir = ".")
  knitr::load_cache("pf_restrict", "pf_Fear_restric", path = "examples/cache/restrictedVAR-cache/") 
-->

```{r pf_restrict, cache = 1, dependson = c("assign_sim_params", "confs")}
set.seed(30520116)
pf_Fear_restric <- PF_EM(
  Surv(tstart, tstop, y) ~ ddFixed_intercept() + x1 + x2 + x3 + x4 + 
    ddFixed(x1) + ddFixed(x2) + ddFixed(x3) + ddFixed(x4), 
  df, Q_0 = diag(1, 4), G = G, J = J, theta = c(.1, .1, 0), K = K, 
  psi = c(0, 0), phi = 0,
  by = 1, type = "VAR", model = "exponential", max_T = n_periods,
  control = dd_ctrl)
```

Notice that we pass `G`, `J`, `K`, `theta`, `psi`, and `phi` instead of `Q` and `Fmat`. 
The estimates are

```{r pf_Fear_restric_ests}
pf_Fear_restric$Q
show_covar(pf_Fear_restric$Q)

pf_Fear_restric$F
pf_Fear_restric$fixed_effects # beta
```

The smoothed state estimates looks as follows

```{r pf_Fear_restric_plot_smooth, fig.width = 10, fig.height = 6.66}
tmp <- t(t(alphas) - beta[-1])[-1, ]
par(mfcol = c(2, 2), mar = c(5, 4, 1, 1))
for(i in 1:4){
  plot(pf_Fear_restric, qlvls = c(0.025, 0.975), ylim = range(tmp),
       cov_index = i, col = i)
  points(1:nrow(tmp), tmp[, i], pch = 16, col = i)
}
```

The approximate log-likelihood at each iteration of the EM algorithm are

```{r pf_Fear_restric_show_log_like}
logLik(pf_Fear_restric)
plot(pf_Fear_restric$log_likes, type = "l", ylab = "log-likelihood")
# last elements
plot(tail(pf_Fear_restric$log_likes, 50), type = "l", ylab = "log-likelihood")
```

We can get a better estimate of the log-likelihood at the final estimate for 
the two models by running the particle filter with more particles as follows

```{r better_est_log_like_before, echo = FALSE}
old_dig <- getOption("digits")
options(digits = 7)
```

```{r better_est_log_like, cache = 1, dependson = c("pf_restrict", "pf_first")}
logLik(PF_forward_filter(pf_Fear        , N_fw = 10000, N_first = 10000)) 
logLik(PF_forward_filter(pf_Fear_restric, N_fw = 10000, N_first = 10000))
```

```{r better_est_log_like_after, echo = FALSE}
options(digits = old_dig)
```

We expect the former to have a higher log-likelihood since we have 
$4^2 + 4(4 + 1) / 2 - 2 \cdot 3 = 20$ more parameters in the former model. 

### Misspecified model
Let suppose that we think an AR(1) process is appropriate for each state variable 
and the state variables are independent. That is, we want to estimate 

$$
F = \begin{pmatrix}
    \theta_1 & . & . & . \\ 
    . & \theta_2 & . & . \\
    . & . & \theta_3 & . \\
    . & . & . & \theta_4 \end{pmatrix}, \quad
Q = \begin{pmatrix}
    \exp(2\psi_1) & . & . & . \\
    . & \exp(2\psi_2) & . & . \\
    . & . & \exp(2\psi_3) & . \\ 
    . & . & . & \exp(2\psi_4) 
    \end{pmatrix}
$$
We can estimate such a model by making the following calls

<!-- 
  knitr::opts_knit$set(output.dir = ".")
  knitr::load_cache("est_miss", "pf_Fear_miss", path = "examples/cache/restrictedVAR-cache/") 
-->

```{r est_miss, cache = 1, dependson = c("assign_sim_params", "confs")}
# setup G and J
G_miss <- matrix(0., 4^2, 4)
for(i in 0:3)
  G_miss[1 + i * 4 + i, i + 1] <- 1
J_miss <- diag(4)
K_miss <- matrix(nrow = 4 * (4 - 1) / 2, ncol = 0)

# estimate
set.seed(30520116)
pf_Fear_miss <- PF_EM(
  Surv(tstart, tstop, y) ~ ddFixed_intercept() + x1 + x2 + x3 + x4 + 
    ddFixed(x1) + ddFixed(x2) + ddFixed(x3) + ddFixed(x4), 
  df, Q_0 = diag(1, 4), G = G_miss, J = J_miss, K = K_miss, 
  theta = rep(.1, 4), psi = rep(0, 4), phi = vector(),
  by = 1, type = "VAR", model = "exponential", max_T = n_periods,
  control = dd_ctrl)
```

The estimates from this model are

```{r pf_miss_ests}
pf_Fear_miss$Q
show_covar(pf_Fear_miss$Q)
pf_Fear_miss$F
pf_Fear_miss$fixed_effects
logLik(pf_Fear_miss)
```

An approximation of the log-likelihood is

```{r pf_miss_log_likeli_before, echo = FALSE}
old_dig <- getOption("digits")
options(digits = 7)
```

```{r pf_miss_log_likeli, cache = 1, dependson = "est_miss"}
logLik(PF_forward_filter(pf_Fear_miss, N_fw = 10000, N_first = 10000)) 
```

```{r pf_miss_log_likeli_after, echo = FALSE}
options(digits = old_dig)
```

# References