Day 4 Mixed models III – model checking, ANOVA, statistical inference

February 2nd, 2026

4.1 Review

Mixed-effects models combine fixed effects and random effects. Typically, we can define a Gaussian mixed-effects model as

\[\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon}, \\ \begin{bmatrix}\mathbf{u} \\ \boldsymbol{\varepsilon} \end{bmatrix} \sim \left( \begin{bmatrix}\boldsymbol{0} \\ \boldsymbol{0} \end{bmatrix}, \begin{bmatrix}\mathbf{G} & \boldsymbol{0} \\ \boldsymbol{0} & \mathbf{R} \end{bmatrix} \right),\]

where \(\mathbf{y}\) is the observed response, \(\mathbf{X}\) is the matrix with the explanatory variables, \(\mathbf{Z}\) is the design matrix, \(\boldsymbol{\beta}\) is the vector containing the fixed-effects parameters, \(\mathbf{u}\) is the vector containing the random effects parameters, \(\boldsymbol{\varepsilon}\) is the vector containing the residuals, \(\mathbf{G}\) is the variance-covariance matrix of the random effects, and \(\mathbf{R}\) is the variance-covariance matrix of the residuals. Typically, \(\mathbf{G} = \sigma^2_u \mathbf{I}\) and \(\mathbf{R} = \sigma^2 \mathbf{I}\).
If we do the math, we get that

\[E(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta},\] \[Var(\mathbf{y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R}.\]

Fixed effects versus random effects

Fixed vs Random Effects Table

	Fixed effects	Random effects
Where	Expected value	Variance-covariance matrix
Inference	Constant for all groups in the population of study	Differ from group to group
Usually used to model	Carefully selected treatments or genotypes	The study design (aka structure in the data, or what is similar to what)
Assumptions	\[\hat{\boldsymbol{\beta}} \sim N \left( \boldsymbol{\beta}, (\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X})^{-1} \right) \]	\[u_j \sim N(0, \sigma^2_u)\]
Method of estimation	Maximum likelihood, least squares	Restricted maximum likelihood (shrinkage)

4.2 Model checking and comparison

Important things to keep in mind:

Statistical models to analyze data generated by designed experiments.
Models created to explain vs. models created to predict. See Shmueli (2010).

4.3 Model checking

4.4 Simulation-based model-checking

Goal: to detect systematic differences between the model and observed data.
See Chapter 8 in Gelman and Hill.

library(lme4)
library(tidyverse)
library(latex2exp)
library(ggpubr)

dat <- read.csv("../data/N_fert.csv")

m1 <- lm(Yield_SY ~ Total_N, data= dat)
dat$residuals <- resid(m1)
dat$yhat <- predict(m1)

theme_set(theme_pubr())

y_sim <- simulate(m1, nsim =30, seed = 42) |>
  mutate(Total_N = dat$Total_N, 
         Yield_SY = dat$Yield_SY) |> 
  pivot_longer(cols = -c(Total_N, Yield_SY))

y_sim |> 
  ggplot(aes(value))+
  geom_histogram()+
  geom_histogram(aes(Yield_SY), fill = "gold", alpha = .3)+
  facet_wrap(~name)

## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

y_sim |> 
  ggplot(aes(Total_N, value))+
  geom_point()+
  geom_point(aes(Total_N, Yield_SY), shape = 21, fill = "gold", alpha = .3)+
  facet_wrap(~name)

Discuss the plot above: how are the simulated data compared to the observed data?

4.5 Some useful metrics to compare models

4.5.1 Root mean squared error

Root mean squared error \(RMSE = \sqrt{\frac{1}{n} \cdot \sum_{i=1}^n(y_i-\hat{y}_i)^2}\)
In-sample versus out-of-sample RMSE

4.5.2 The coefficient of determination R²

Usually interpreted as the proportion of the variation in \(y\) that is explained with the variation in \(x\).
Used as a metric for predictive ability and model fit.
Can increase when adding more predictors.

The R² of a given model (and observed data) is calculated as \[R^2 = \frac{MSS}{TSS}= 1 - \frac{RSS}{TSS} = 1- \frac{MSE}{MST},\] where \(RSS\) is the residual sum of squares and \(TSS\) is the total sum o squares, and \(MSE\) is the mean squared error and \(MST\) is the mean squared of the data (i.e., \(y\) versus \(\bar{y}\).

4.5.3 Adjusted R²

The adjusted R² also penalizes the addition of extra parameters

\[R^2_{adj} = R^2 - (1 - R^2) \frac{p-1}{n-p},\]

where \(R^2\) is the one defined above, \(p\) is the number of parameters and \(n\) is the total number of observations.

4.5.4 Some issues with R²

Bootstrapped R²
Anscombe’s quartet
Out-of-Sample R²: Estimation and Inference

4.5.5 Akaike Information Criterion (AIC)

Used as a metric for predictive ability and model fit.
Lower value is better.
Values are always compared to other models (i.e., there are no general rules about reasonable AIC values).

The AIC of a given model \(M\) and observed data \(\mathbf{y}\) is calculated as
\[AIC_M = 2p - 2\log(\hat{L}),\]
\(p\) is the number of parameters estimated in the model and \(\hat{L}\) is the maximized value of the likelihood function for the model (i.e., \(\hat{L}=p(\mathbf{y}|\hat{\boldsymbol\beta}, M)\)).

4.5.6 Bayesian Information Criterion (BIC)

The BIC of a given model (and observed data) is a variant of AIC and is calculated as

\[BIC = p\log(n) - 2\log(\hat{L}),\]

where \(p\) is the number of parameters estimated in the model, \(n\) is the number of observations, and \(\hat{L}\) is the maximized value of the likelihood function for the model (i.e., \(\hat{L}=p(\mathbf{y}|\hat{\boldsymbol\beta}, M)\)).

4.6 Analysis of variance – ANOVA

One of the oldest methods to analyze and interpret the data.

knitr::include_graphics("figures/Ronald_Fisher.jpg")

Figure 4.1: Sir R.A. Fisher, the father of the ANOVA.

Helpful for understanding a previously fit model:
- Divide the sources of variability in “batches”.
- Quantify (& test) said sources of variability to see which ones are more relevant to characterize the data.
Some frequent assumptions:
- Linearity
- Normality
- Constant variance
- Discuss independence

4.6.1 This is how you build an ANOVA table

The “What would Fisher do” (WWFD) approach

Name the sources of variability and classify them into treatment sources of variability and ‘topographical’ (or logistical) sources.
Assign degrees of freedom to each category.
Combine both Sources (treatment + topographical) into a single ANOVA table.

WWFD in the whiteboard

Table 4.1: Constructing the ANOVA skeleton

Table 4.1: Experiment or Topographical
Source	df
Block	b-1
	-
Fungicide(Block)	(f-1)*b
	-
	-
Gen(Fung x Block)	(g-1)fb
Total	N-1

Table 4.1: Treatment
Source	df
-	-
Fungicide	f-1
-
Genotype	g-1
Fung x Gen	(f-1)(g-1)
Parallels	N-(f*g)
Total	N-1

Table 4.1: Combined Table
Source	df
Block	b-1
Fungicide	t-1
Fungicide(Block)	(f-1)*b - (t-1)
Genotype	g-1
Fung x Gen	(f-1)(g-1)
Pens(Block x Trt) error	(g-1)* f * b - (g-1 + (f-1)(g-1))
Total	N-1

4.6.2 The elements of the ANOVA

ANOVA table for the cookie split-plot experiment.

Source	df	SS	MS	EMS
Block	\(b-1\)			\[\sigma^2_{\varepsilon}+g\sigma^2_w+tg\sigma^2_d\]
Fungicide	\(t-1\)	\(SS_{F}\)	\(\frac{SS_{F}}{b-1}\)	\[\sigma^2_{\varepsilon}+g\sigma^2_w+\phi^2(\alpha)\]
Error(whole plot)	\((b-1)(t-1)\)			\[\sigma^2_{\varepsilon}+g\sigma^2_w\]
Genotype	\(g-1\)	\(SS_{G}\)	\(\frac{SS_{G}}{g-1}\)	\[\sigma^2_{\varepsilon}+\phi^2(\gamma)\]
\(T \times G\)	\((t-1)(g-1)\)	\(SS_{F \times G}\)	\(\frac{SS_{F \times G}}{(t-1)(g-1)}\)	\[\sigma^2_{\varepsilon}+\phi^2(\alpha \gamma)\]
Error(split plot)	\(t(b-1)(g-1)\)	\(SSE\)	\(\frac{SSE}{t(b-1)(g-1)}\)	\[\sigma^2_{\varepsilon}\]

knitr::kable(t_comb, caption = "Combined Table")

Table 4.2: Combined Table
Source	df
Block	b-1
Fungicide	t-1
Fungicide(Block)	(f-1)*b - (t-1)
Genotype	g-1
Fung x Gen	(f-1)(g-1)
Pens(Block x Trt) error	(g-1)* f * b - (g-1 + (f-1)(g-1))
Total	N-1

4.7 Critiques of ANOVA over the years

Mostly for normal data (GLMM class tomorrow).
Low interpretability beyond statistical significance or lack thereof [greater Sums of squares are not necessarily those with higher estimated underlying variance components].
Complicated specifications for unbalanced data.

4.8 A different take on ANOVA using multilevel modeling

“ANOVA is still very relevant” – see Gelman (2005).

“The essence of analysis of variance is in the structuring of the coefficients into batches—hence the notation \(\beta^{(m)}_j\) – going beyond the usual linear model formulation that has a single indexing of coefficients \(\beta_j\)”

Still useful as exploratory data analysis

A different approach

Use a hierarchical formulation in which each batch of regression coefficients is modeled as a sample from a normal distribution with mean 0 and its own variance \(\sigma^2_m\):

\[\beta^{(m)}_j \sim N(0, \sigma^2_m)\] for \(j = 1,\dots,J_m\) for each batch \(m = 1,\dots,M\), .

starting point for assessing the relative importance of the effects \(\beta\) in linear models

Note:

The standard deviation \(\sigma^2_m\) (describing the superpopulation) characterizes the uncertainty for predicting a new coefficient from batch \(m\).
For that reason, Gelman (2005) proposes using \(s^2_m\) (describing the finite population),

4.8.1 An applied example

The data below were generated by an experiment studying the effect of different feed additives on a pig’s health in two different timepoints. The data were generated under a CRD.

Classic ANOVA

Under the classic CRD, we could analyze the data using a classic two-way ANOVA. Now, other than saying that all treatments have an impact on the response, the ANOVA doesn’t really tell us much.

dat <- read.csv("../data/blood_study_pigs.csv") |> 
  mutate(Day =as.factor(Day))

m_classic <- lm(Serum_haptoglobin_mg.dL ~ Trt * Day , data = dat)
car::Anova(m_classic, type = 2)

## Anova Table (Type II tests)
## 
## Response: Serum_haptoglobin_mg.dL
##           Sum Sq  Df F value    Pr(>F)    
## Trt        81720   5  80.049 < 2.2e-16 ***
## Day        69495   1 340.369 < 2.2e-16 ***
## Trt:Day    24352   5  23.854 < 2.2e-16 ***
## Residuals  36751 180                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A multilevel model-based ANOVA

We elaborate this approach to ANOVA under a Bayesian approach, in order to quantify the uncertainty to estimate the variance components.

# get Bayes-specific libraries
library(brms)
library(bayesplot)

# fit the Bayesian model
m_new <- brm(Serum_haptoglobin_mg.dL ~ (1|Trt) + (1|Day) + (1|Trt:Day),
             prior = 
               c(set_prior("student_t(3, 0, 50)", class = "sd", group = "Trt"),
             set_prior("student_t(3, 0, 50)", class = "sd", group = "Day"),
             set_prior("student_t(3, 0, 50)", class = "sd", group = "Trt:Day")),
             backend = "cmdstanr",
             iter = 2000,
             silent = 2,
             data = dat)

# check model diagnostics
bayesplot::mcmc_trace(m_new)

The code below gets the posteriors for the random effects and computes the \(s^2_m\).

# get s2 for Day
re_Day <- as_draws_df(m_new) |> 
  dplyr::select(contains("r_Day[")) |> 
  as.matrix()
s_Day <- numeric(nrow(re_Day))

for (i in 1:nrow(re_Day)){
  s_Day[i] <- sd(re_Day[i,])
}

# get s2 for Trt
re_Trt <- as_draws_df(m_new) |> 
  dplyr::select(contains("r_Trt[")) |> 
  as.matrix()
s_Trt <- numeric(nrow(re_Trt))

for (i in 1:nrow(re_Trt)){
  s_Trt[i] <- sd(re_Trt[i,])
}

# get s2 for Trt x Day 
re_TrtxDay <- as_draws_df(m_new) |> 
  dplyr::select(contains("r_Trt:Day")) |> 
  as.matrix()
s_TrtxDay <- numeric(nrow(re_TrtxDay))

for (i in 1:nrow(re_TrtxDay)){
  s_TrtxDay[i] <- sd(re_TrtxDay[i,])
}

# Summarize all s2 results into quantiles
finite_sds <- 
  data.frame(Trt = s_Trt, Day = s_Day, TrtxDay = s_TrtxDay) |>
    rownames_to_column("iter") |> 
    pivot_longer(cols = Trt:TrtxDay, names_to = "Source") |> 
    group_by(Source) |> 
    summarise(s_m_mean = mean(value), 
              q025 = quantile(value, probs = c(0.025)), 
              q25 = quantile(value, probs = c(0.25)), 
              q75 = quantile(value, probs = c(0.75)), 
              q975 = quantile(value, probs = c(0.975)))

ggplot(finite_sds, 
       aes(x = s_m_mean, y = factor(Source, levels = c("TrtxDay", "Trt", "Day")))) +
  geom_point(size = 4) +
  geom_segment(aes(x = q025, 
                   xend = q975, 
                   y = Source, yend = Source), linetype = "solid") +
  geom_segment(aes(x = q25, 
                   xend = q75, 
                   y = Source, yend = Source), 
               size = 1.5,
               linetype = "solid") +
  labs(
    title = "Hierarchical ANOVA Display -- finite population",
    x = "Estimated SD of Effects",
    y = "Source of Variation"
  ) +
  theme_minimal()

m_new_results <- as.matrix(m_new)
colnames(m_new_results) <-
  str_replace(colnames(m_new_results), "__Intercept", "")
colnames(m_new_results) <-
  str_replace(colnames(m_new_results), "sd_", "")

color_scheme_set("purple")
mcmc_intervals(m_new_results, 
           pars = c("Day",
                    "Trt", 
                    "Trt:Day",
                    "sigma"))+
  labs(
    title = "Hierarchical ANOVA Display -- superpopulation",
    x = "Estimated SD of effects", 
    y = "Source of variability")

Discuss interpretation of a classic ANOVA versus this approach (simultaneous vs. not)
Opportunities & limitations

4.9 Coming up tomorrow:

GLMMs
Homework