Day 5 Generalized linear mixed models

February 3rd, 2026

5.1 GLMMs

Learning objectives:

  • What are GLMMs
  • What is a distributional assumption
  • Working with GLMMs

5.1.1 Review – building a statistical model

1 – Select a probability distribution for the data \(\mathbf{y}|\mathbf{u}\).

2 – Select a link function \(g(\cdot)\), \(\boldsymbol\eta = g(\mathbf{y}|\mathbf{u})\).

3 – Define the (linear) predictor \(\boldsymbol\eta = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u}\).

knitr::include_graphics("figures/distributions.png")
Probability distributions.

Figure 5.1: Probability distributions.

knitr::include_graphics("figures/stroup_distributions.jpg")
Properties of probability distributions.

Figure 5.2: Properties of probability distributions.

5.1.2 Generalities of GLMMs

The structure of GLMMs is very similar to the notation we were using for LMMs. We can generally describe them as:

\[y|\mathbf{u} \sim P(\mu, \; \phi),\] where:

  • \(y\) is the data,
  • \(\mathbf{u}\) are the random effects,
  • \(\mu\) is the mean,
  • \(\phi\) is the dispersion.
  • The linear predictor for \(\mu\) can be described as \(g(\boldsymbol{\mu}) = \boldsymbol{\eta} = \mathbf{X} \boldsymbol{\beta} + Zu}\), where \(g(\mu) = \eta\) is the link function applied to the expected value.

5.1.3 Implications for model fitting

  • Least Squares Estimator is no longer Maximum Likelihood Estimator
  • Variance is no longer \(\hat\sigma^2 = \frac{SSE}{df_e}\)
  • The whole concept of degrees of freedom is more diffuse
  • ANOVA shells are still useful to analyze designs are number of independent, true, replicates
  • The \(Var(\mathbf{y}|\mathbf{u})\) may be specified, but not specify the full likelihood (check out the properties above)
  • Quasi-likelihood for modeling overdispersion or repeated measures in GLMMs:
    • \(E(\mathbf{y} \vert \mathbf{b}) = \boldsymbol{\mu}\vert \mathbf{b}\)
    • \(Var(\mathbf{y} \vert \mathbf{b}) = \mathbf{V}_{\mu}^{1/2}\mathbf{A}\mathbf{V}_{\mu}^{1/2}\)

5.1.4 Model diagnostics

Number of Orobanche seeds tested/germinated for two genotypes and two treatments.

library(glmmTMB)
dat <- agridat::crowder.seeds

m_binom <- glmmTMB(cbind(germ, n-germ) ~ gen*extract + (1|plate),
                   REML = TRUE,
                   family = binomial, data = dat)
# If we followed conventional model checking, we couldn't find out about anything
dat$phat <- predict(m_binom, type = "response")
dat$resid <- dat$germ - (dat$phat * dat$n)

plot(dat$phat, dat$resid)

DHARMa::simulateResiduals(m_binom, plot = T)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.1374871 0.5518681 0.1637147 0.9230439 0.7149169 0.8244776 0.3351514 0.277819 0.879538 0.08237197 0.6173228 0.5894222 0.7512811 0.2211422 0.108248 0.6607457 0.05951533 0.5632577 0.4263507 0.897899 ...

5.2 Applied example – picking distributions

A clinical field trial comparing the performance of different vaccines on the growth and survival of Atlantic salmon under standard production conditions.

  • Treatment structure: two-way factorial (4 vaccines x 5 times)
  • Design structure: completely randomized design.
  • Experimental unit: fish
  • Response: weight.
url <- "https://raw.githubusercontent.com/stat870/fall2025/refs/heads/main/data/fish_vaccines.csv"
fish <- read.csv(url) 

# plot the data
fish %>% 
  ggplot(aes(day, wt))+
  geom_point(aes(fill = factor(vaccine), 
                 group = factor(vaccine)),
             position = position_dodge(width = 60),
             shape=21, size =3.5, alpha =.7)+
  labs(y = expression(Weight~(g~fish^{-1})), x = "Time (days)")+
  theme_classic()+
  labs(fill="Vaccine treatment")+
  theme(legend.position = "bottom")+
  scico::scale_fill_scico_d()

Considering that the variance increases with the mean, a fair model for fish weight could be:

\[y_{ijk}|u_{ijk} \sim P(\mu_{ijk}, \phi),\\ \log(\mu_{ijk}) = \eta_{ijk} = \eta_0 + V_i + T_j + VT_{ij} + u_{ijk},\] where:

  • \(y_{ijk}\) is the fish weight of the \(k\)th fish under the \(i\)th vaccine treatment at the \(j\)th timepoint, and (conditional on \(u_{ijk}\)) arises from a Gamma distribution with mean \(\mu_{ijk}\) and dispersion \(\phi\),
  • \(\eta_{ijk}\) is the linear predictor,
  • \(\eta_0\) is the overall mean mean of the linear predictor,
  • \(V_i\) is the effect of the \(i\)th vaccine treatment,
  • \(T_j\) is the effect of the \(j\)th timepoint,
  • \(VT_{ij}\) is the interaction between the \(i\)th vaccine treatment and the \(j\)th timepoint, and
  • \(u_{ijk}\) is the random effect for the fish weight of the \(k\)th fish under the \(i\)th vaccine treatment at the \(j\)th timepoint, that is accounting for the fact that repeated measures are not independent.
fish <- fish |> 
  mutate(across(c(vaccine, day), ~as.factor(.))) |> 
  drop_na()

m1_Normal <- glmmTMB(wt ~ day*vaccine, data = fish)

m1_Gamma <- glmmTMB(wt ~ day*vaccine, family = Gamma(link = "log"), data = fish)

m1_log <- lm(log(wt) ~ day*vaccine, data = fish)
fish$residual_N <- resid(m1_Normal, type = "response")
fish$yhat_N <- predict(m1_Normal, type = "response")

fish$residual_G <- resid(m1_Gamma, type = "response")
fish$yhat_G <- predict(m1_Gamma, type = "response")
simulate_N <- simulate(m1_Normal)
simulate_N$day <- fish$day

simulate_G <- simulate(m1_Gamma)
simulate_G$day <- fish$day

simulate_log <- simulate(m1_log)
simulate_log$day <- fish$day

fish |> 
  ggplot(aes(day, wt))+
  geom_point(aes(day, sim_1), shape = 21, data = simulate_N, size = 7)+
  geom_point(shape =21, fill ="gold", size = 3)+
  theme_pubr()+
  labs(title = "Observed weights in time (gold) and simulated weights based on the fitted Normal model (open)")

fish |> 
  ggplot(aes(day, wt))+
  geom_point(aes(day, sim_1), shape = 21, data = simulate_G, size = 7)+
  geom_point(shape =21, fill ="gold", size = 3)+
  theme_pubr()+
  labs(title = "Observed weights in time (gold) and simulated weights based on the fitted Gamma model (open)")

fish |> 
  ggplot(aes(day, wt))+
  geom_point(aes(day, exp(sim_1)), shape = 21, data = simulate_log, size = 7)+
  geom_point(shape =21, fill ="gold", size = 3)+
  theme_pubr()+
  labs(title = "Observed weights in time (gold) and simulated weights based on the fitted Gamma model (open)")

library(ggpubr)