---
title: "First Order Vector Autoregression"
author: "Benjamin Christoffersen"
date: "`r Sys.Date()`"
output: html_document
bibliography: bibliography.bib
nocite: |
  @kingma14
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = TRUE,
fig.height = 4, fig.width = 7, dpi = 128,
cache.path = "cache/firstOrderVAR-cache/", fig.path = "fig/firstOrderVAR-fig/",
error = FALSE)
options(digits = 4, scipen = 10, width = 70)
```
## Introduction
In this example, we will simulate and estimate a first-order vector
autoregression model using the particle filter and smoother. For details, see
[this vignette](../vignettes/Particle_filtering.pdf) which can also be found by
calling `vignette("Particle_filtering", package = "dynamichazard")`. The models
we are going to simulate from and estimate are of the form
$$
\begin{split}
y_{it} &\sim g(\cdot\vert\eta_{it}) & \\
\vec{\eta}_t &= X_tR^+\vec{\alpha}_t + Z_t\vec{\beta} + \vec{o}_t \\
\vec{\alpha}_t &= F\vec{\alpha}_{t - 1} + R\vec{\epsilon}_t &
\quad \vec{\epsilon}_t \sim N(\vec{0}, Q) \\
& & \quad \vec{\alpha}_0 \sim N(\vec{a}_0, Q_0)
\end{split}, \qquad
\begin{array}{l} i = 1, \dots, n_t \\ t = 1, \dots, d \end{array}
$$
where $y_{it}$ is individual $i$'s indicator of death in the interval
$(t - 1, t]$. Conditional on the log times at risk,
$\vec{o}_1,\dots,\vec{o}_d$, the covariates, $X_t$ and $Z_t$, and the latent
states, $\vec{\alpha}_1,\dots,\vec{\alpha}_d$, the indicators, $y_{it}$, are
binomially distributed with the complementary log-log link function. That is,
$P(y_{it} = 1 \mid \eta_{it}) = 1 - \exp(-\exp(\eta_{it}))$. The total survival
time of individual $i$ is $T_i$, which is piecewise exponentially distributed
conditional on the latent states. Further, we set $Z_t = X_t$ so the
states have a non-zero mean. The true values are
$$
F = \begin{pmatrix}0.9 & 0 \\ 0 & 0.9 \end{pmatrix}, \quad
Q = \begin{pmatrix}0.33^2 & 0 \\ 0 & 0.33^2 \end{pmatrix}, \quad
R = I_2, \quad
\vec{\beta} = (-6.5, -2)^\top
$$
```{r assign_kit_info, echo = FALSE}
git_key <- system("git rev-parse --short HEAD", intern = TRUE)
git_bra <- system("git branch", intern = TRUE)
regexp <- "^(\\*\ )(.+)$"
git_bra <- git_bra[grepl(regexp, git_bra)]
git_bra <- gsub(regexp, "\\2", git_bra)
```
where $I_2$ is the two-dimensional identity matrix and
$\vec{a}_0$ and $Q_0$ are given by the invariant distribution. The unknown
parameters to be estimated are everything but $Q_0$ and $R$ (we fix $Q_0$
during the estimation and set $\vec{a}_0 = (0, 0)^\top$). This
example is run on the git branch "`r git_bra`" with ID "`r git_key`". The code
can be found on
[the github site for the package](https://github.com/boennecd/dynamichazard/tree/master/examples).
All functions whose assignments are not shown and which are not in the
`dynamichazard` package can be found on the github site.
## Simulation
We start by simulating the data. Feel free to skip this part as the specifications
are given above. First we assign the parameters for the simulation
```{r load_dynam}
library(dynamichazard)
```
```{r assign_sim_params, cache = 1}
n_obs <- 1000L
n_periods <- 200L
Fmat <- matrix(c(.9, 0, 0, .9), 2)
Rmat <- diag(1, 2)
Qmat <- diag(.33^2, 2)
Q_0 <- get_Q_0(Qmat, Fmat)
beta <- c(-6.5, -2)
```
`get_Q_0` is a function to get the covariance matrix for the invariant distribution.
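To be concrete, the invariant covariance matrix solves the discrete Lyapunov
equation $Q_0 = FQ_0F^\top + Q$, which yields
$\text{vec}(Q_0) = (I_4 - F \otimes F)^{-1}\text{vec}(Q)$. A minimal sketch of
this computation, assuming that this is what `get_Q_0` implements, is
```{r q0_check, eval = FALSE}
# solve vec(Q_0) = (I_4 - F kronecker F)^{-1} vec(Q)
Q_0_manual <- matrix(solve(diag(4L) - kronecker(Fmat, Fmat), c(Qmat)), 2L)
stopifnot(isTRUE(all.equal(Q_0_manual, Q_0, check.attributes = FALSE)))
```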
Then we simulate and plot the latent states
```{r sim_plot_latent}
set.seed(54432125)
betas <- matrix(nrow = n_periods + 1, ncol = 2)
betas[1, ] <- rnorm(2) %*% chol(Q_0)
for(i in 1:n_periods + 1) # note: this is 2:(n_periods + 1) due to precedence
betas[i, ] <- Fmat %*% betas[i - 1, ] + drop(rnorm(2) %*% chol(Qmat))
betas <- t(t(betas) + beta)
# plot of latent variables
cols <- c("black", "darkblue")
matplot(betas, type = "l", lty = 1, col = cols)
for(i in 1:2)
abline(h = beta[i], lty = 2, col = cols[i])
```
We simulate the observations as follows
```{r sim_obs}
df <- replicate(n_obs, {
# delayed entry (left-truncation)
tstart <- max(0L, sample.int((n_periods - 1L) * 2L, 1) - n_periods + 1L)
# covariates
x <- runif(1, -1, 1)
covars <- c(1, x)
# outcome (stop time and event indicator)
y <- FALSE
for(tstop in (tstart + 1L):n_periods){
fail_time <- rexp(1) / exp(covars %*% betas[tstop + 1L, ])
if(fail_time <= 1){
y <- TRUE
tstop <- tstop - 1L + fail_time
break
}
}
c(tstart = tstart, tstop = tstop, x = x, y = y)
})
df <- data.frame(t(df))
head(df, 10)
```
We sample delayed entry times (left-truncation) since we otherwise may end up
with a low number of observations at risk towards the end.
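To see the effect, we can count the number of observations at risk in each
interval. The sketch below is a quick check that is not part of the original
example; `at_risk` is a hypothetical name.
```{r at_risk_check, eval = FALSE}
# number of observations at risk in each interval (t - 1, t]
at_risk <- sapply(1:n_periods, function(t)
  sum(df$tstart <= t - 1 & df$tstop > t - 1))
plot(at_risk, type = "h", xlab = "Interval", ylab = "Number at risk")
```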
### Model without latent variables
We can fit a model without the latent variables (i.e., a constant coefficient
model) as follows
```{r survreg_fit}
surv_fit <- survreg(Surv(tstop - tstart, y) ~ x, df, dist = "exponential")
summary(surv_fit) # signs are flipped
logLik(surv_fit)
```
The signs are flipped as stated in `help("survreg")`. The estimates are,
though, close to $\vec{\beta}$ as expected. Further, we can compare the
log-likelihood of this model with the log-likelihood approximation we get from
the particle filter in the next section.
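As a quick sanity check, which is not part of the original example, we can flip
the signs and compare with the true $\vec{\beta}$
```{r flip_signs_check, eval = FALSE}
rbind(`minus estimates` = -coef(surv_fit), truth = beta)
```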
## Particle filter and smoother
We start off by using the generalized two-filter smoother from @briers09. We
estimate the parameters with an EM algorithm by calling `PF_EM` as
follows
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("brier_far_F", "pf_Brier", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r confs, cache = 1}
n_threads <- 6
# you can replace this with e.g.,
# n_threads <- max(parallel::detectCores(logical = FALSE), 2)
```
```{r brier_far_F, cache = 1, dependson = c("assign_sim_params", "confs")}
set.seed(30520116)
system.time(pf_Brier <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Q_0 = diag(1, 2), Q = diag(1, 2), Fmat = matrix(c(.1, 0, 0, .1), 2),
by = 1, type = "VAR", model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 750, N_smooth = 1, N_first = 2000, eps = .001,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 100, smoother = "Brier_O_N_square",
# we add some extra variation to the proposal distributions
nu = 8, covar_fac = 1.1, n_threads = n_threads)))
```
`system.time` is used to show the computation time. The estimated parameters are
```{r brier_ests_far_F}
pf_Brier$F
show_covar <- function(Q){
cat("Standard deviations\n")
print(sqrt(diag(Q)))
cat("Lower correlation matrix\n")
tmp <- cov2cor(Q)
print(tmp[-1, -ncol(tmp)])
}
show_covar(pf_Brier$Q)
pf_Brier$fixed_effects
```
The effective sample size in the smoothing step at each time point is
```{r brier_eff_far_F}
plot(pf_Brier$effective_sample_size$smoothed_clouds, type = "h",
ylim = range(0, pf_Brier$effective_sample_size$smoothed_clouds),
ylab = "Effective sample size")
```
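For reference, the effective sample size of a particle cloud with normalized
weights $w_1, \dots, w_N$ is

$$
\text{ESS} = \left(\sum_{i = 1}^N w_i^2\right)^{-1}
$$

which is at most $N$ (when all weights are equal) and close to one when a
single particle carries almost all of the weight. Thus, low values in the plot
above indicate time points where we sampled poorly.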
```{r plot_clouds, echo = FALSE}
plot_cloud <- function(
pf_fit, type = "smoothed_clouds", start_at_zero = FALSE){
par_old <- par(no.readonly = TRUE)
on.exit(par(par_old))
# plot random effects
jpeg(tempfile()) # avoid output from the plot
out <- plot(pf_fit, type = type, col = c("black", "blue"))
dev.off()
par(par_old)
# then we use it and add the fixed effects
y <- t(t(out$mean) + pf_fit$fixed_effects)
ylim <- range(betas, y)
x <- if(start_at_zero) 1:nrow(out$mean) - 1 else 1:nrow(out$mean)
matplot(x, y, type = "l",
lty = 1, col = c("black", "blue"), ylim = ylim)
# add actual points
matplot(0:n_periods, betas, type = "l", add = TRUE,
lty = 2, col = c("black", "blue"), ylim = ylim)
# show places where we had a hard time sampling
ess <- pf_fit$effective_sample_size[[type]]
idx <- which(ess <= 100)
for(i in idx - start_at_zero)
abline(v = i, lty = 2)
invisible(ess)
}
```
The smoothed estimates of the latent states look as follows (the vertical
dashed lines mark points where we did not sample well)
```{r brier_plot_far_F}
plot_cloud(pf_Brier)
```
The dashed non-vertical lines are the true curves and the solid lines are the
smoothed estimates. The approximate log-likelihoods from the particle filter at
each EM iteration are
```{r brier_log_like_far_F}
plot(pf_Brier$log_likes)
```
We can compare the output above with the smoother from @fearnhead10
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("fear_far_F", "pf_Fear", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r fear_far_F, cache = 1, dependson = c("assign_sim_params", "confs")}
set.seed(30520116)
system.time(pf_Fear <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Q_0 = diag(1, 2), Q = diag(1, 2), Fmat = matrix(c(.1, 0, 0, .1), 2),
by = 1, type = "VAR", model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 1000, N_smooth = 2000, N_first = 2000, eps = .001,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 100, smoother = "Fearnhead_O_N",
nu = 8, covar_fac = 1.1, n_threads = n_threads)))
```
The estimates are similar
```{r fear_ests_far_F}
pf_Fear$F
show_covar(pf_Fear$Q)
pf_Fear$fixed_effects
```
The effective sample sizes are better now though
```{r fear_eff_far_F}
plot(pf_Fear$effective_sample_size$smoothed_clouds, type = "h",
ylim = range(0, pf_Fear$effective_sample_size$smoothed_clouds),
ylab = "Effective sample size")
```
However, there are many options to reduce the computation time of the former
smoother, as mentioned in, e.g., @briers09, but these are not implemented. The
smoothed estimates are also rather close to what we saw before
```{r fear_plot_far_F}
plot_cloud(pf_Fear)
```
The log-likelihoods are almost the same
```{r fear_log_like_far_F}
plot(pf_Fear$log_likes)
```
The log-likelihood estimates were flat toward the end and not monotonically increasing
```{r fear_log_like_far_F_tail}
plot(tail(pf_Fear$log_likes, 20))
```
This will not happen in an EM algorithm but may happen in an MCEM algorithm due
to the Monte Carlo error. We may want to take a few more EM iterations with
more particles. We can do this as follows
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("more_parts", "pf_Fear_more", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r more_parts, cache = 1, dependson = "fear_far_F"}
set.seed(30520116)
pf_Fear_more <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Q_0 = diag(1, 2), Q = pf_Fear$Q, Fmat = pf_Fear$F,
fixed_effects = pf_Fear$fixed_effects, by = 1, type = "VAR",
model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 5000, N_smooth = 10000, N_first = 5000, eps = .001,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 10, smoother = "Fearnhead_O_N",
nu = 8, covar_fac = 1.1, n_threads = n_threads))
```
The log-likelihoods from these iterations are
```{r more_parts_pf_Fear_log_like}
plot(pf_Fear_more$log_likes)
```
and the final estimates are
```{r more_parts_pf_Fear_show_params}
pf_Fear_more$F
show_covar(pf_Fear_more$Q)
pf_Fear_more$fixed_effects
```
### Averaging
Averaging is sometimes argued for in MCEM algorithms. E.g., see @cappe05,
section 11.1.2.2. The idea is to take an average of the estimates or
sufficient statistics after some iteration number. As of this writing,
the present implementation does not allow one to change the number of particles
at each iteration as suggested in @cappe05.
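That is, with the estimate $\hat{\vec{\theta}}_k$ from EM iteration $k$ and a
burn-in of $k_0$ iterations (150 below through the `averaging_start` argument),
one variant of the idea reports the running average

$$
\bar{\vec{\theta}}_K = \frac{1}{K - k_0 + 1}\sum_{k = k_0}^K \hat{\vec{\theta}}_k
$$

as the final estimate.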
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("averaging", "pf_Fear_avg", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r averaging, cache = 1, dependson = c("assign_sim_params", "confs")}
set.seed(30520116)
system.time(pf_Fear_avg <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Q_0 = diag(1, 2), Q = diag(1, 2), Fmat = matrix(c(.1, 0, 0, .1), 2),
by = 1, type = "VAR", model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 200, N_smooth = 400, N_first = 1000, eps = 1e-5,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 1000L, smoother = "Fearnhead_O_N", averaging_start = 150L,
nu = 8, covar_fac = 1.1, n_threads = n_threads)))
```
```{r show_averaging_res}
pf_Fear_avg$F
show_covar(pf_Fear_avg$Q)
pf_Fear_avg$fixed_effects
plot(pf_Fear_avg$log_likes, type = "l")
```
### Stochastic gradient descent
We will run stochastic gradient descent as in @polyak92, which is described
in @cappe05 [page 414].
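To be concrete, the `sgd` function defined below performs gradient ascent on
the approximate log-likelihood using the Adam updates from @kingma14. With the
approximate score $\vec{s}_i$ at iteration $i$, learning rate $\gamma$ (`lr`),
and decay rates $\rho_1$ (`mp`) and $\rho_2$ (`vp`), the updates are

$$
\begin{split}
\vec{m}_i &= \rho_1\vec{m}_{i - 1} + (1 - \rho_1)\vec{s}_i, \qquad
\vec{v}_i = \rho_2\vec{v}_{i - 1} + (1 - \rho_2)\vec{s}_i^{\,2} \\
\vec{\theta}_i &= \vec{\theta}_{i - 1} +
\gamma\,\frac{\vec{m}_i / (1 - \rho_1^i)}{\sqrt{\vec{v}_i / (1 - \rho_2^i)} + 10^{-8}}
\end{split}
$$

where the squaring, square root, and division are elementwise. The learning
rate is halved until the implied $F$ is stationary and $Q$ is positive
definite.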
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("sgd", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r assg_pf_start}
#####
# fit to start from
set.seed(30520116)
pf_start <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Q_0 = diag(1, 2), Q = diag(1, 2), Fmat = matrix(c(.1, 0, 0, .1), 2),
by = 1, type = "VAR", model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 200, N_smooth = 400, N_first = 1000, eps = 1e-5,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 1L, smoother = "Fearnhead_O_N",
nu = 8, covar_fac = 1.1, n_threads = n_threads))
```
```{r sgd, cache=1}
# Function to perform stochastic gradient descent. Uses the Adam algorithm
# suggested by Kingma and Ba (2015).
#
# Args:
# object: an object of class PF_EM.
# n_runs: number of iterations.
# ns: number of particles at each iteration.
# debug: TRUE if information should be printed during estimation.
# use_O_n_sq: TRUE if O(N^2) method should be used.
# lr: learning rate.
# mp: decay rate for first moment.
# vp: decay rate for second moment.
#
# Returns:
# list with final estimates, log-likelihood approximations at each iteration,
# and estimates at each iteration.
sgd <- function(object, n_runs, ns, debug = FALSE, use_O_n_sq = FALSE,
lr = .001, mp = .9, vp = .999)
{
#####
# get object to perform computation and matrices and vectors for output
comp_obj <- PF_get_score_n_hess(object, use_O_n_sq = use_O_n_sq)
state <- matrix(
NA_real_, n_runs + 1L, length(object$F) + length(object$Q))
# set up matrix to map from the gradient to the state (the covariance part
# only has the lower triangular elements)
dF <- length(object$F)
dQ <- NCOL(object$Q)
nQ <- (dQ * (dQ + 1L)) / 2L
K <- matrix(0., ncol(state), length(object$F) + nQ)
K[ 1:dF , 1:dF] <- diag(dF)
# can use e.g., matrixcalc::duplication.matrix instead to get the duplication matrix
K[-(1:dF), -(1:dF)] <- dynamichazard:::.get_dup_mat(dQ)
dfix <- length(object$fixed_effects)
obs <- matrix(
NA_real_, n_runs + 1L, dfix)
state[1, ] <- c(object$F, object$Q)
obs [1, ] <- object$fixed_effects
lls <- rep(NA_real_, n_runs)
mv <- vv <- rep(0., ncol(K) + dfix)
for(i in 1:n_runs){
comp_obj$set_n_particles(N_fw = ns[i], N_first = ns[i])
fw <- comp_obj$run_particle_filter()
lls[i] <- logLik(fw)
score <- comp_obj$get_get_score_n_hess(only_score = TRUE)
sc <- score$score
mv <- mp * mv + (1 - mp) * sc
vv <- vp * vv + (1 - vp) * sc^2
muse <- mv / (1 - mp^i)
vuse <- vv / (1 - vp^i)
direc <- muse / (sqrt(vuse) + 1e-8)
# step-half until we have a valid set of parameters
n_halfs <- 0L
n_halfs_max <- 25L
lri <- lr
while((n_halfs <- n_halfs + 1L) <= n_halfs_max){
state_i <-
state[i + 1L, ] <- state[i, ] + lri * drop(K %*% direc[-(1:dfix)])
obs_i <-
obs [i + 1L, ] <- obs [i, ] + lri * direc[1:dfix]
lri <- lri / 2
# check that system is stationary
Ftmp <- matrix(state_i[1:dQ^2], dQ, dQ)
if(any(Mod(eigen(Ftmp)$values) >= 1))
next
# set parameters
o <- comp_obj$set_parameters(state = state_i, obs = obs_i)
# Q is positive definite
if(all(eigen(o$Q)$values > 0.))
break
}
if(n_halfs > n_halfs_max)
stop("step halvings failed")
if(debug){
msg <- paste0(sprintf("It %4d Log-likelihood %10.2f (max %10.2f)", i,
lls[i], max(lls, na.rm = TRUE)))
cat(msg, "\n", rep("-", nchar(msg)), "\n", sep = "")
cat(sprintf("Gradient norm %14.4f\n", norm(t(sc))))
print(o$Fmat)
print(o$Q)
print(o$fixed_effects)
cat("\n")
}
}
list(o = o, lls = lls, state = state, obs = obs)
}
# number of iterations
n_runs <- 400L
# number of particles to use
ns <- ceiling(exp(seq(log(100), log(1000), length.out = n_runs)))
out <- sgd(n_runs = n_runs, ns = ns, object = pf_start, lr = .005)
lls <- out$lls
o <- out$o
```
The results are illustrated below
```{r show_sgd_res}
# approximate log-likelihood at each iteration
par(mar = c(5, 4, 1, 1))
plot(lls, xlab = "Iteration", ylab = "Log-likelihood", type = "l")
print(max(lls), digits = 5)
# the estimates
o$Fmat
show_covar(o$Q)
o$fixed_effects
```
We can also use the method from @Poyiadjis11. The present implementation
scales poorly in the number of particles, but the method has a variance that
may grow linearly in the number of time periods instead of at least
quadratically. This is important for long time series.
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("O_N_sq_sgd", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r O_N_sq_sgd, cache = 1, dependson = "sgd"}
# number of iterations
n_runs <- 400L
# number of particles to use
ns <- ceiling(exp(seq(log(200), log(1000), length.out = n_runs)))
out <- sgd(n_runs = n_runs, ns = ns, use_O_n_sq = TRUE, object = pf_start,
lr = .005)
lls <- out$lls
o <- out$o
```
The results are illustrated below (the log-likelihood approximations from
the forward particle filter are poor due to the low number of particles).
```{r O_N_sq_show_sgd_res}
# approximate log-likelihood at each iteration
par(mar = c(5, 4, 1, 1))
plot(lls, xlab = "Iteration", ylab = "Log-likelihood", type = "l")
print(max(lls), digits = 5)
# the estimates
o$Fmat
show_covar(o$Q)
o$fixed_effects
```
We can approximate standard errors as follows
<button onclick="show_obs_info()" style="
color: #494949 !important;
background: #ffffff;
padding: 5px;
border: 4px solid #494949 !important;
border-radius: 6px;
display: inline-block;
margin: 5px 0;"
>Show/hide code and result</button>
<script>
function show_obs_info() {
var x = document.getElementById("obsInfo");
if (x.style.display === "none") {
x.style.display = "block";
} else {
x.style.display = "none";
}
}
</script>
<div id="obsInfo" style = "display: none;">
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("obs_info_sgd", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r obs_info_sgd, cache = 1, dependson = "O_N_sq_sgd"}
set.seed(24680530)
aprxs <- replicate(10, {
# make approximation
sink(tempfile())
comp_obj <- PF_get_score_n_hess(pf_start, use_O_n_sq = TRUE)
sink()
comp_obj$set_parameters(
state = c(o$Fmat, o$Q), obs = o$fixed_effects)
comp_obj$set_n_particles(N_fw = 1000L, N_first = 10000L)
comp_obj$run_particle_filter()
comp_out <- comp_obj$get_get_score_n_hess()
# return approximate observed information matrix, score, and standard errors
list(
obs_info = comp_out$obs_info,
score = comp_out$score,
ses = suppressWarnings(rbind(
full = sqrt(diag(solve(comp_out$obs_info))),
# assuming expected gradient is zero
`w/o gradient` =
sqrt(diag(solve(comp_out$obs_info - tcrossprod(comp_out$score)))))))
}, simplify = FALSE)
```
```{r show_obs_info_sgd}
# may be quite noisy...
aperm(simplify2array(lapply(aprxs, "[[", "ses")), c(1, 3, 2))
# the gradient estimates are not exactly zero in each run (it may seem as if
# the algorithm has not converged judging from the scores below and the `sgd`
# output)
t(sapply(aprxs, "[[", "score"))
# we can consider expanding averages instead
run_means <- apply(sapply(aprxs, "[[", "score"), 1L, function(x)
sapply(seq_along(x), function(i) mean(x[1:i])))
matplot(run_means, type = "l", xlab = "Elements in expanding mean",
ylab = "score", lty = 1, col = "black")
abline(h = 0, lty = 2)
tail(run_means, 1) # final values
run_means_info <- simplify2array(lapply(aprxs, "[[", "obs_info"))
run_means_info <- apply(run_means_info, 1:2, function(x)
sapply(seq_along(x), function(i) mean(x[1:i])))
run_means_info_se <- apply(run_means_info, 1L, function(x)
sqrt(diag(solve(x))))
matplot(t(run_means_info_se), type = "l", xlab = "Elements in expanding mean",
ylab = "Approximate standard errors", lty = 1, col = "black",
ylim = c(0, max(run_means_info_se, na.rm = TRUE)))
run_means_info_se[, ncol(run_means_info_se)] # final values
```
</div>
### Log-likelihood evaluation
We may question what the log-likelihood is at the true parameters. We can use
the `PF_forward_filter` function to perform approximate log-likelihood evaluation
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("pF_forward_filter_true", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r pF_forward_filter_true, cache = 1, dependson = c("assign_sim_params", "confs")}
fw_precise <- PF_forward_filter(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE),
N_fw = 100000, N_first = 100000, df, type = "VAR",
model = "exponential", max_T = n_periods, by = 1,
control = PF_control(
N_fw_n_bw = 1, N_smooth = 1, N_first = 1,
method = "AUX_normal_approx_w_cloud_mean",
smoother = "Fearnhead_O_N",
n_threads = n_threads, nu = 8, covar_fac = 1.1),
Fmat = Fmat, a_0 = c(0, 0), Q = Qmat, Q_0 = Q_0, R = diag(1, 2),
fixed_effects = beta)
logLik(fw_precise)
```
The log-likelihood from the final model we estimated is
```{r end_log_like_before}
print(tail(pf_Fear_more$log_likes, 1), digits = 7)
print(tail(pf_Brier$log_likes, 1), digits = 7)
```
or we can get a more precise estimate with the call below (though the
log-likelihood is then evaluated at the final parameter estimates and not at
the iterate one step prior as above)
```{r end_log_like_before_precise, cache = 1, dependson = "fear_far_F"}
print(
logLik(PF_forward_filter(pf_Fear_more, N_fw = 20000, N_first = 20000)),
digits = 7)
```
A question is what happens if we start at the true parameter values? We may
expect to take only a few EM iterations and end up at the MLE or another local
maximum not too far from the true parameters
<!--
knitr::opts_knit$set(output.dir = ".")
tmp <- knitr::load_cache("fear_true", "pf_Fear_close", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r fear_true, cache = 1, dependson = c("assign_sim_params", "confs")}
set.seed(30520116)
pf_Fear_close <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Fmat = Fmat, a_0 = c(0, 0), Q = Qmat, Q_0 = Q_0, fixed_effects = beta,
by = 1, type = "VAR", model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 500, N_smooth = 2000, N_first = 2000, eps = .001,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 100, smoother = "Fearnhead_O_N",
nu = 8, covar_fac = 1.1, n_threads = n_threads))
```
The final estimates are
```{r fear_true_log_like_ests}
pf_Fear_close$F
show_covar(pf_Fear_close$Q)
pf_Fear_close$fixed_effects
```
The log-likelihoods at the EM iterations look as follows
```{r fear_true_log_like}
plot(pf_Fear_close$log_likes, type = "l")
```
### Restricted model
The model can be written in terms of two parameters for the state part by
writing it as
$$
F = \begin{pmatrix}
\theta & 0 \\
0 & \theta
\end{pmatrix}, \quad
Q = \begin{pmatrix}
\exp(2\psi) & 0 \\
0 & \exp(2\psi)
\end{pmatrix}
$$
We can estimate the restricted model as follows (see the vignette mentioned in
the beginning for details about the arguments to `PF_EM`).
<!--
knitr::opts_knit$set(output.dir = ".")
knitr::load_cache("est_restrict", "pf_Fear_restrict", path = "examples/cache/firstOrderVAR-cache/")
-->
```{r est_restrict, cache = 1}
# G maps theta to vec(F); both diagonal entries of F equal theta
G <- matrix(0, 2^2, 1)
G[1, 1] <- G[4, 1] <- 1
# J maps psi to the log standard deviations; both are exp(psi)
J <- matrix(1, 2, 1)
# K is empty as there are no free correlation parameters
K <- matrix(nrow = 2 * (2 - 1) / 2, ncol = 0)
pf_Fear_restrict <- PF_EM(
Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
Q_0 = diag(1, 2), G = G, J = J, K = K, theta = .1, psi = 0, phi = numeric(),
by = 1, type = "VAR", model = "exponential", max_T = n_periods,
control = PF_control(
N_fw_n_bw = 500, N_smooth = 2000, N_first = 2000, eps = .001,
method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
n_max = 100, smoother = "Fearnhead_O_N",
nu = 8, covar_fac = 1.1, n_threads = n_threads))
```
The final estimates are
```{r est_restrict_show_ests}
pf_Fear_restrict$F
show_covar(pf_Fear_restrict$Q)
pf_Fear_restrict$fixed_effects
```
and a log-likelihood approximation is
```{r pf_Fear_restrict_ll, cache = 1, dependson = "est_restrict"}
print(
logLik(PF_forward_filter(pf_Fear_restrict, N_fw = 20000, N_first = 20000)),
digits = 7)
```
We have $2^2 + 2(2 + 1) / 2 - 2 = 5$ fewer parameters so the difference in
log-likelihood to the full model seems reasonable.
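Treating the log-likelihood approximations as exact, which they are not due to
the Monte Carlo error, a rough likelihood ratio test can be computed as in the
sketch below
```{r lr_sketch, eval = FALSE}
ll_full <- logLik(
  PF_forward_filter(pf_Fear_more, N_fw = 20000, N_first = 20000))
ll_restrict <- logLik(
  PF_forward_filter(pf_Fear_restrict, N_fw = 20000, N_first = 20000))
# chi-squared test with the five parameter difference computed above
pchisq(2 * (c(ll_full) - c(ll_restrict)), df = 5, lower.tail = FALSE)
```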
## References