Day 2 Mixed models I

January 29th, 2026

2.1 Recall the most common statistical model

\[\mathbf{y} \sim N(\boldsymbol\mu, \boldsymbol\Sigma),\]

where:

  • \(\mathbf{y} \equiv [y_1, y_2, \dots, y_n]'\) contains the response data,
  • \(\boldsymbol{\mu} \equiv [\mu_1, \mu_2, \dots, \mu_n]'\) contains the expected values of said data,
  • \(\boldsymbol\Sigma\) is the variance-covariance matrix.

The most typical model typically has:

  • \(\boldsymbol\mu = \mathbf{X}\boldsymbol{\beta}\) and
  • \(\boldsymbol\Sigma = \sigma^2\mathbf{I}\).

We can write the default model in most software written above as:

\[\mathbf{y} \sim N(\boldsymbol{\mu}, \Sigma),\\ \begin{bmatrix}y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \\ y_n \end{bmatrix} \sim N \left( \begin{bmatrix}\mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \\ \vdots \\ \mu_n \end{bmatrix}, \sigma^2 \begin{bmatrix} 1 & 0 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & 0 & \dots & 0 \\ 0 & 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & 0 & \dots & 1 \end{bmatrix} \right),\]

which is the same as

\[\begin{bmatrix}y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \\ y_n \end{bmatrix} \sim N \left( \begin{bmatrix}\mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \\ \vdots \\ \mu_n \end{bmatrix}, \begin{bmatrix} \sigma^2 & 0 & 0 & 0 & \dots & 0 \\ 0 & \sigma^2 & 0 & 0 & \dots & 0 \\ 0 & 0 & \sigma^2 & 0 & \dots & 0 \\ 0 & 0 & 0 & \sigma^2 & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & 0 & \dots & \sigma^2 \end{bmatrix} \right).\]

In summary, the assumptions are:

  • Linearity (or whatever the deterministic equation is)
  • Normality
  • Independence
  • Constant variance

2.1.1 Types of predictors

Quantitative predictors

Just like yesterday:

\[y_{i} \sim N(\mu_i, \sigma^2), \\ \mu_i = \beta_0 + x_{1i} \beta_1 + x_{2i} \beta_2,\] and say that \(y_{i}\) is the observed value for the \(i\)th observation, \(\beta_0\) is the intercept, \(x_{1i}\) if the \(i\)th observation of \(x_1\), \(\beta_1\) is the expected increase in \(y\) for each unit increase of \(x_1\), \(x_{2i}\) if the \(i\)th observation of \(x_2\), \(\beta_2\) is the expected increase in \(y\) for each unit increase of \(x_2\).

Qualitative predictors

Instead of a quantitative predictor, we could have a qualitative (or categorical) predictor. Let the qualitative predictor have two possible levels, A and B. We could use the same model as before,

\[y_{i} \sim N(\mu_i, \sigma^2), \\ \mu_i = \beta_0 + x_i \beta_1,\]

and say that \(y_{i}\) is the observed value for the \(i\)th observation, \(\beta_0\) is the expected value for A, \(x_i = 0\) if the \(i\)th observation belongs to A, and \(x_i = 1\) if the \(i\)th observation belongs to B. That way, \(\beta_1\) is the difference between A and B.

If the categorical predictor has more than two levels,

\[y_{i} \sim N(\mu_i, \sigma^2), \\ \mu_i = \beta_0 + x_{1i} \beta_1 + x_{2i} \beta_2+ ... + x_{ji} \beta_j,\]

\(y_{i}\) is still the observed value for the \(i\)th observation, \(\beta_0\) is the expected value for A (sometimes in designed experiments this level is a control),

\[x_{1i} = \begin{cases} 1, & \text{if } \text{Treatment is B} \\ 0, & \text{if } \text{else} \end{cases}, \\ x_{2i} = \begin{cases} 1, & \text{if } \text{Treatment is C} \\ 0, & \text{if } \text{else} \end{cases}, \\ \\ \dots, \\ \\ x_{ji} = \begin{cases} 1, & \text{if } \text{Treatment is J} \\ 1, & \text{if } \text{else} \end{cases}. \]

That way, all \(\beta\)s are the differences between the treatment and the control.

2.2 Variations to that very common statistical model

We can adapt the assumptions one by one:

  • Linearity – change the deterministic equation
  • Normality – assume a different distribution
  • Independence – implement a hierarchical/multi-level/mixed model that models the data that were generated together.
  • Constant variance – model the variance/assume a different distribution where the mean and the variance are not independent.

2.3 Relaxing the assumption of independence

  • What if we can model the variance-covariance matrix with something else that’s not \(\sigma^2 \mathbf{I}\)?

This is a visualization of the classical, default statistical model:

This is a visualization of what’s coming with mixed models:

Example 1:

Example 2:

2.4 Fixed effects and random effects

In what follows, we have a small elaboration of what it means to model that “structure in the data” (i.e., the groups of similarly generated points).

2.4.1 Going from fixed effects to fixed+random effects

The data below were generated by an experiment that tested 18 sorghum genotypes in a randomized complete block design.

  • Discuss the treatment structure
  • Discuss the design structure
dat_blocked <- agridat::omer.sorghum |> filter(env == "E3")

str(dat_blocked)
## 'data.frame':    72 obs. of  4 variables:
##  $ env  : Factor w/ 6 levels "E1","E2","E3",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ rep  : Factor w/ 4 levels "R1","R2","R3",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ gen  : Factor w/ 18 levels "G01","G02","G03",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ yield: num  449 458 545 547 784 ...

Now, remember that the blocks are supposed to indicate “groups of similar experimental units”. Said “groups of similar experimental units” means that the assumption of independence would be kind of a stretch. In reality, all observations from the same block (are supposed to) have something in common. It’s not reasonable to assume that the observations are independent, because observations from the same field have more in common than observations from different fields. They share more similar soil and, with that, a baseline fertility and yield.

Basically, we expect the genotypes relative performance to be similar across fields, but the baseline (a.k.a., the intercept) to be field-specific. Then, we could say
\[y_{ij} = \beta_{0j} + G_i + \varepsilon_{ij}, \\ \varepsilon_{ij} \sim N(0, \sigma^2),\]
where \(\beta_{0j}\) is a block-specific intercept, and \(G_i\) is the genotype effect.
Now, there are different ways to model that block-specific intercept.

This is a big forking path in statistical modeling. All-fixed models estimate the effects of everything. Mixed-effects models indicate what is similar to what via random effects.

Fixed effects

We could define an all-fixed model,

\[y_{ij} = \beta_{0j} + G_i + \varepsilon_{ij}, \\ \beta_{0j} = \beta_0 + u_j \\ \varepsilon_{ij} \sim N(0, \sigma^2),\]

where \(u_j\) is the effect of the \(j\)th block on the intercept (i.e., on the baseline). In this case, \(u_j\) is a fixed effect, which means it may be estimated via least squares estimation or maximum likelihood estimation. Under both least squares and maximum likelihood (assuming normal distribution), we may estimate the parameters by computing

\[\hat{\boldsymbol{\beta}}_{ML} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y},\]

which yields the minimum variance unbiased estimate of \(\boldsymbol{\beta}\).

We still have \[\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{\varepsilon}^2 & 0 & 0 & 0 & \dots & 0 \\ 0 & \sigma_{\varepsilon}^2 & 0 & 0 & \dots & 0 \\ 0 & 0 & \sigma_{\varepsilon}^2 & 0 & \dots & 0 \\ 0 & 0 & 0 & \sigma_{\varepsilon}^2 & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & 0 & \dots & \sigma_{\varepsilon}^2 \end{bmatrix}\].

m_fixed <- lm(yield ~ gen + rep, data = dat_blocked)

summary(m_fixed)
## 
## Call:
## lm(formula = yield ~ gen + rep, data = dat_blocked)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -349.66  -81.58   -1.08   78.47  318.59 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   672.03      86.46   7.773 3.29e-10 ***
## genG02        -66.52     113.20  -0.588 0.559393    
## genG03        223.07     113.20   1.971 0.054201 .  
## genG04        167.40     113.20   1.479 0.145336    
## genG05         61.05     113.20   0.539 0.591996    
## genG06       -267.20     113.20  -2.361 0.022111 *  
## genG07        355.20     113.20   3.138 0.002826 ** 
## genG08        159.53     113.20   1.409 0.164820    
## genG09        231.31     113.20   2.043 0.046198 *  
## genG10        156.66     113.20   1.384 0.172393    
## genG11        146.29     113.20   1.292 0.202071    
## genG12        -42.87     113.20  -0.379 0.706453    
## genG13        243.18     113.20   2.148 0.036467 *  
## genG14          1.06     113.20   0.009 0.992565    
## genG15         64.72     113.20   0.572 0.570007    
## genG16         22.81     113.20   0.201 0.841122    
## genG17       -296.66     113.20  -2.621 0.011534 *  
## genG18        448.28     113.20   3.960 0.000233 ***
## repR2        -150.83      53.36  -2.827 0.006704 ** 
## repR3         -85.66      53.36  -1.605 0.114592    
## repR4        -124.21      53.36  -2.328 0.023936 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 160.1 on 51 degrees of freedom
## Multiple R-squared:  0.6748, Adjusted R-squared:  0.5473 
## F-statistic: 5.291 on 20 and 51 DF,  p-value: 7.64e-07

Random effects

We could also assume that the effects of the \(j\)th block (i.e., \(u_j\)) arise from a random distribution. The most common assumption (and the default in most statistical software) is that

\[u_j \sim N(0, \sigma^2_b).\]

Now, we don’t estimate the effect, but the variance \(\sigma^2_b\). Note that there are \(J\) levels of the random effects, meaning that a random effect is always categorical.
Also, now

\[\hat{\boldsymbol{\beta}}_{REML} = (\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X})^{-1}\mathbf{X}^T \mathbf{V}^{-1} \mathbf{y},\]

where \(\mathbf{V} = Var(\mathbf{y})\) is the variance-covariance matrix of \(\mathbf{y}\), including residual variance and random-effects variance. Note that this formula yields the same point estimate for \(\boldsymbol{\beta}\), but with a different confidence interval.

Now, the variance-covariance matrix \(\boldsymbol{\Sigma}\) of the marginal distribution has changed.

library(lme4)
m_mixed <- lmer(yield ~ gen + (1|rep), data = dat_blocked)

summary(m_mixed)
## Linear mixed model fit by REML ['lmerMod']
## Formula: yield ~ gen + (1 | rep)
##    Data: dat_blocked
## 
## REML criterion at convergence: 729.7
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.99895 -0.58043 -0.01854  0.52108  1.92022 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  rep      (Intercept)  2906     53.91  
##  Residual             25627    160.09  
## Number of obs: 72, groups:  rep, 4
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   581.86      84.46   6.889
## genG02        -66.52     113.20  -0.588
## genG03        223.07     113.20   1.971
## genG04        167.40     113.20   1.479
## genG05         61.05     113.20   0.539
## genG06       -267.20     113.20  -2.361
## genG07        355.20     113.20   3.138
## genG08        159.53     113.20   1.409
## genG09        231.31     113.20   2.043
## genG10        156.66     113.20   1.384
## genG11        146.29     113.20   1.292
## genG12        -42.87     113.20  -0.379
## genG13        243.18     113.20   2.148
## genG14          1.06     113.20   0.009
## genG15         64.72     113.20   0.572
## genG16         22.81     113.20   0.201
## genG17       -296.66     113.20  -2.621
## genG18        448.28     113.20   3.960
## 
## Correlation matrix not shown by default, as p = 18 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

2.5 Generalities on mixed models

Mixed models combine fixed effects and random effects. Generally speaking, we can write out a mixed-effects model using the model equation form, as

\[\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon}, \\ \begin{bmatrix}\mathbf{u} \\ \boldsymbol{\varepsilon} \end{bmatrix} \sim \left( \begin{bmatrix}\boldsymbol{0} \\ \boldsymbol{0} \end{bmatrix}, \begin{bmatrix}\mathbf{G} & \boldsymbol{0} \\ \boldsymbol{0} & \mathbf{R} \end{bmatrix} \right),\]

where \(\mathbf{y}\) is the observed response, \(\mathbf{X}\) is the matrix with the explanatory variables, \(\mathbf{Z}\) is the design matrix, \(\boldsymbol{\beta}\) is the vector containing the fixed-effects, \(\mathbf{u}\) is the vector containing the random effects, \(\boldsymbol{\varepsilon}\) is the vector containing the residuals, \(\mathbf{G}\) is the variance-covariance matrix of the random effects, and \(\mathbf{R}\) is the variance-covariance matrix of the residuals. Note that \(\mathbf{X} \boldsymbol{\beta}\) is the fixed effects part of the model, and \(\mathbf{Z}\mathbf{u}\) is the random effects part of the model.

Using the probability distribution form, we can then say that \(E(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta}\) and \(Var(\mathbf{y}) = \mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R}\). Usually, we assume \[\mathbf{G} = \sigma^2_u \begin{bmatrix} 1 & 0 & 0 & \dots 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix}\] and \[\mathbf{R} = \sigma^2 \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix} \].

Then,

\[\mathbf{y} \sim N(\boldsymbol{\mu}, \Sigma), \\ \Sigma = \begin{bmatrix} \sigma^2 + \sigma^2_u & \sigma^2_u & 0 & 0 & 0 & 0 &\dots & 0\\ \sigma^2_u & \sigma^2 + \sigma^2_u & 0 & 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & \sigma^2 + \sigma^2_u & \sigma^2_u & 0 & 0 & \dots & 0 \\ 0 & 0 & \sigma^2_u & \sigma^2 + \sigma^2_u & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & 0 & \sigma^2 + \sigma^2_u & \sigma^2_u & \dots & 0 \\ 0 & 0 & 0 & 0 & \sigma^2_u & \sigma^2 + \sigma^2_u & \dots & \vdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & \dots & \sigma^2 + \sigma^2_u \end{bmatrix}.\]

Take your time to digest the variance-covariance matrix above. What type of data do you think generated it?

2.5.1 Random effects

  • By definition, random effects are regression coefficients that arise from a random distribution.
  • Typically, a random effect \(u \sim N(0, \sigma^2_u)\).
  • Note that this model for the parameter may result in shrinkage.
  • We estimate the variance \(\sigma^2_u\).
  • Calculating degrees of freedom can get much more complex than in all-fixed effects models (e.g., with unbalanced data, spatio-temporally correlated data, or non-normal data).
  • In the context of designed experiments, random effects are assumed to be independent to each other and independent to the residual.

2.5.2 Estimation of parameters

“Estimation” is a term held mostly exclusive to fixed effects and variance components. Restricted maximum likelihood estimation (REML) is the default in most mixed effects models because, for small data (aka most experimental data), maximum likelihood (ML) provides variance estimates that are downward biased. - In REML, the likelihood is maximized after accounting for the model’s fixed effects.

  • In ML, \(\ell_{ML}(\boldsymbol{\sigma; \boldsymbol{\beta}, \mathbf{y}}) = - (\frac{n}{2}) \log(2\pi)-(\frac{1}{2}) \log ( \vert \mathbf{V}(\boldsymbol\sigma) \vert ) - (\frac{1}{2}) (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^T[\mathbf{V}(\boldsymbol\sigma)]^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\)
  • In REML, \(\ell_{REML}(\boldsymbol{\sigma};\mathbf{y}) = - (\frac{n-p}{2}) \log (2\pi) - (\frac{1}{2}) \log ( \vert \mathbf{V}(\boldsymbol\sigma) \vert ) - (\frac{1}{2})log \left( \vert \mathbf{X}^T[\mathbf{V}(\boldsymbol\sigma)]^{-1}\mathbf{X} \vert \right) - (\frac{1}{2})\mathbf{r}[\mathbf{V}(\boldsymbol\sigma)]^{-1}\mathbf{r}\), where \(p = rank(\mathbf{X})\), \(\mathbf{r} = \mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}_{ML}\).
    • Start with initial values for \(\boldsymbol{\sigma}\), \(\tilde{\boldsymbol{\sigma}}\).
    • Compute \(\mathbf{G}(\tilde{\boldsymbol{\sigma}})\) and \(\mathbf{R}(\tilde{\boldsymbol{\sigma}})\).
    • Obtain \(\boldsymbol{\beta}\) and \(\mathbf{b}\).
    • Update \(\tilde{\boldsymbol{\sigma}}\).
    • Repeat until convergence.

2.5.3 Fixed effects versus random effects

What is behind a random effect:

  • $ N ( , (^T ^{-1} )^{-1} ) $
  • \(u_j \sim N(0, \sigma^2_u)\)
  • What process is being studied?
  • How were the levels selected? (randomly, carefully selected)
  • How many levels does the factor have, vs. how many did we observe?
  • BLUEs versus BLUPs.

Read more in in Gelman (2005, page 20), “Analysis of variance—why it is more important than ever” [link], and Gelman and Hill (2006), page 245.

2.6 Applied examples

Three sets of data describe the relationship between treatment and crop yield. However, the structures in the data (i.e., data architecture) are different.

2.6.1 Example A – independence holds

dat_independent <- read.csv("../data/cochrancox_kfert.csv")

m_independent <- lm(yield ~ factor(K2O_lbac), data = dat_independent)

2.6.2 Example B – simple groups of similar observations

library(lme4)
dat_blocked <- agridat::omer.sorghum |> filter(env == "E3")

m_blocked <- lmer(yield ~ gen + (1|rep), data = dat_blocked)

2.6.3 Example C – different groups of similar observations

dat_multilevel <- agridat::durban.splitplot

m_multilevel <- lmer(yield ~ gen*fung + (1|block/fung), data = dat_multilevel)

2.7 Coming up tomorrow:

  • Review on mixed models
  • Model diagnostics and model comparison
  • Kahoot