---
title: "16 - Mixed models with `caracas`"
date: "`r date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
vignette: >
  %\VignetteIndexEntry{16 - Mixed models with `caracas`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  )
options("digits"=4)
```

# Introduction

```{r, message=FALSE, echo=FALSE}
library(caracas)
library(Matrix)
##packageVersion("caracas")
```

```{r, echo=FALSE}
if (!has_sympy()) {
  # SymPy not available, so the chunks shall not be evaluated
  knitr::opts_chunk$set(eval = FALSE)
}
```

This vignette is based on `caracas` version 
`r packageVersion("caracas")`. `caracas` is avavailable on CRAN at
[https://cran.r-project.org/package=caracas] and on github at 
[https://github.com/r-cas/caracas].


# Linear mixed model

A linear mixed model can be written in general form as

$$
y = Xb + Zu + e
$$


Here, $y$ is a vector of observables, $X$ is a model matrix and $b$ is
a corresponding vector of regression coefficients. Also $Z$ is a model
matrix and $u$ is a corresponding vector of random effects. Lastly,
$e$ is also a vector of random errors. Because there are two random
effects ($u$ and $e$) such models are often called mixed models or
variance component models. 

It is assumed that $u$ and $e$ are
independent and that $u \sim N(0, G)$ and $e \sim N(0, R)$. Typically
$G$ and $R$ depend on unknown parameters $\theta$, so we may write
$G(\theta)$ and $R(\theta)$ instead.
 
Consequently,

$$
E(y)=\mu=Xb, \quad Var(y) = V(\theta) =ZG(\theta)Z'+R(\theta).
$$

As such, linear mixed model, is just a general model for the normal distribution,

$$
 y \sim N(Xb, V(\theta))
$$

where $V=V\theta$ has the special structure given above. 
In the following

1. We illustrate fitting a mixed model based on a subset of the shoes
data available, e.g. in the MASS and doBy packages. 

1. We also compare
the results with the output from lmer() if the lme4 package is
installed and with gls() if the nlme package is installed.


## The likelihood

The log-likelihood is

$$
logL(b, \theta) = -\frac 1 2 (\log|V(\theta)| + (y-Xb)'V(\theta)^{-1}(y-Xb)).
$$

For any value of $\theta$, the MLE for $b$ is

$$
\hat b =\hat b(\theta) = (X'V(\theta)^{-1}X)^{-1}X'V(\theta)^{-1}y.
$$

Plugging this estimate into the log-likelihood gives the profile
log-likelihood which is a function of $\theta$ only:

$$
plogL(\theta) = - \frac 1 2 (\log|V(\theta)| + (y-X\hat b(\theta))'V(\theta)^{-1}(y-X\hat b(\theta))).
$$

This function must typically be maximized with numerical methods. 


## Shoes data

```{r}
shoes_long <- data.frame(
    stringsAsFactors = FALSE,
    type = c("A", "B", "A", "B", "A", "B", "A", "B"),
    wear = c(13.2, 14, 8.2, 8.8, 10.9, 11.2, 14.3, 14.2),
    boy = as.factor(c("1", "1", "2", "2", "3", "3", "4", "4")))

y <- shoes_long$wear
X <- model.matrix(~type, data=shoes_long) |> head(10)
Z <- model.matrix(~-1+boy, data=shoes_long) |> head(10)
y |> head()
X |> head()
Z |> head()
```


# Fitting model with `caracas`

We define the following symbols in caracas. All caracas symbols are
postfixed with an underscore.

```{r}
y_ <- as_sym(y)
X_ <- as_sym(X)
Z_ <- as_sym(Z)
b_ <- vector_sym(ncol(X_), "b")
def_sym(tau2, sigma2)
```

```{r}
## Covariance matrices for random effects
G_  <- diag_("tau^2", ncol(Z))
R_  <- diag_("sigma^2", nrow(Z))
```

So for this example, $\theta=(\tau, \sigma)$. The variance of $y$ is

```{r}
## Variance etc of y
ZGZt_ <- Z_ %*% G_ %*% t(Z_) 
V_  <- ZGZt_ + R_
detV_ <- determinant(V_, log=FALSE)
```

Various methods are available in `caracas` for inverting such a matrix


The inverse of a block diagonal matrix is block diagonal. The first
blocks of $V^{-1}$ are:

```{r}
Vi_  <- inv_woodbury(R_, Z_, G_) |> simplify() ## or
Vi_  <- inv(V_, method="block") |> simplify()
```


## Programmatic approach

A programmatic approach is to define a function returning the
log-likelihood and the profile log-likelihood as a caracas
symbol. 

```{r}
get_logL0 <- function(y, X, V, b=NULL) {
    if (is.null(b))
        b <- vector_sym(ncol(X_), "b")        
    Vi   <- inv(V, method="block")
    detV <- determinant(V, log=FALSE)    
    res  <- y - X %*% b
    Q    <- t(res) %*% Vi %*% res
    aux  <- list(b=b, Q=Q)    
    out  <- -0.5 * (log(detV) + Q)
    attr(out, "aux") <- aux    
    return(out)
}

get_logLp <- function(y, X, V) {
    Vi   <- inv(V, method="block")
    detV <- determinant(V, log=FALSE)        
    b    <- solve(t(X) %*% Vi %*% X, t(X) %*% Vi) %*% y
    res  <- y - X %*% b
    Q    <- t(res) %*% Vi %*% res
    aux  <- list(b=b, Q=Q)
    out  <- -0.5 * (log(detV) + Q)
    attr(out, "aux") <- aux
    return(out)
}
```

The following functions also return caracas symbols:

```{r}
get_b_est <- function(y, X, V) {
    Vi <- inv(V, method="block")
    return(solve(t(X) %*% Vi %*% X, t(X) %*% Vi) %*% y)
}

get_V <- function(Z, G, R) {
    return(Z %*% G %*% t(Z) + R)
}

get_hessian <- function(logL, b) {
    return(-der2(logL, b))
}
```

## Maximizing the profile likelihood

```{r}
V_     <- get_V(Z_, G_, R_)
b_     <- vector_sym(ncol(X_), "b")
b_est_ <- get_b_est(y_, X_, V_) |> simplify()
```


Notice that $\hat b$ is a function of $V$ and hence of $\theta$. If
$\hat b$ is substituted into the log-likelihood, we get the profile
log-likelihood. On the other hand, $b$ is a parameter of fixed effect
and the log-likelihood is a function of both $b$ and $\theta$.

Get logL as a symbol which depends on some variables
```{r}
logLp <- get_logLp(y_, X_, V_) 
all_vars(logLp)

logL0 <- get_logL0(y_, X_, V_)
all_vars(logL0)
```

```{r}
out <- optim_sym(c(.1, .1),
                 logLp,
                 method="L-BFGS-B", 
                 control=list(fnscale=-1), hessian=TRUE)
par <- out$par
par
subs(attr(logLp, "aux")$b, par)
b_est2_ <- subs(b_est_, par)
b_est2_
as_expr(b_est2_)
```

A technicality: A caracas symbol can be coerced to an R
function. Therefor $\hat b$ can be evaluated at the MLE as a function
of $\theta$.

```{r}
as(b_est_, "function")
as(b_est_, "function")(par)
```

## Asymptotic variance of the MLE

To obtain the asymptotic variance of $\hat b$, we need to compute the
Hessian of the log-likelihood at the MLE. This is done by
differentiating the log-likelihood with respect to $b$ and then
evaluating at the MLE. The Hessian is then inverted to obtain the
asymptotic variance of $\hat b$.

```{r}
logL_ <- get_logL0(y_, X_, V_) |> simplify()
J_    <- get_hessian(logL_, b_) 

J2_   <- subs(J_, par)
Vb_   <- solve(J2_)

as_expr(b_est2_)
as_expr(Vb_)
```


## Maximizing the full likelihood

An alternative is to maximize the full likelihood, we need to maximize
with respect to both the variance parameters and the fixed
effects. For this to work we often need to impose restrictions on the
parameter space (not necessary for this specific example, though)

```{r}
eps <- 1e-6
out <- optim_sym(c(0, 0, .1, .1), logL_, method="L-BFGS-B", 
                 lower=c(-Inf, -Inf, eps, eps), upper=c(Inf, Inf, Inf, Inf),
                 control=list(fnscale=-1), hessian=TRUE)
out$par
solve(-out$hessian)[1:2, 1:2] |> round(4)
```


# Comparison with `gls()` and `lmer()`

## Comparison with gls() - if available

```{r}
if (require(nlme)) {
    model <- gls(wear ~ type, correlation = corCompSymm(form = ~ 1 | boy), 
                 method="ML",
                 data = shoes_long)

    sigma2 <- model$sigma^2
    C <- corMatrix(model$modelStruct$corStruct)
    V <- sigma2 * as.matrix(bdiag(C))
    as(V, "sparseMatrix")
}
```

## Comparison with lmer() - if available

```{r}
if (require(lme4)) {
    lmm_fit  <- lmer(wear ~ type + (1|boy), data=shoes_long, REML=FALSE)
    print(VarCorr(lmm_fit))
    print(fixef(lmm_fit))
    print(vcov(lmm_fit))

    var.d <- crossprod(getME(lmm_fit,"Lambdat"))
    Zt <- getME(lmm_fit, "Zt")
    vr <- sigma(lmm_fit)^2
    var.b <- vr*(t(Zt) %*% var.d %*% Zt)
    sI <- vr * Diagonal(ncol(Zt))
    var.y <- var.b + sI
    var.y
}
```