Day 8 Statistical inference

February 6th, 2026

8.1 What is statistical inference?

  • “Use math to understand the world”
  • From raw data to assumptions to understanding (?)
  • Define target population

Figure 8.1: Mind map in a research project.

8.2 The components that go into interpreting results (in the context of LMMs)

8.2.1 Estimation

  • Uncertainty in estimation – where is it coming from?
  • Contrasts
  • Multiple comparisons

Let’s revisit the Nitrogen example, where we describe yield \(y_{ij}\) (\(i\)th observation in \(j\)th location) as

\[y_{ij} \sim N(\mu_{ij}, \sigma^2), \\ \mu_{ij} = \beta_{0j} + \beta_1 x_{ij} + \beta_2 x_{ij}^2. \]

library(tidyverse)
library(lme4)
library(ggpubr)
library(mosaic)
url <- "https://raw.githubusercontent.com/stat799/spring2026/refs/heads/main/data/N_fert.csv"
data_N <- read.csv(url) |> mutate(sy = as.factor(sy))

data_N |> 
  ggplot(aes(Total_N, Yield_SY))+
  geom_point(size=2.5)+
  labs(x = expression(N[fert]~(kg~ha^{-1})), 
       y = expression(Yield~(kg~ha^{-1})))+
  theme_pubr()

m1 <- lm(Yield_SY ~ Total_N + I(Total_N^2) + sy, data_N)
DHARMa::simulateResiduals(m1, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.168 0.54 0.004 0.904 0.764 0.428 0.02 0.18 0.596 0.868 0.744 0.744 0.352 0.748 0.116 0.676 0.08 0.256 0.46 0.768 ...

There are multiple aspects to consider when we want to interpret the results.

What was the main objective of the study?

Option A: Study the effect of N on yield

  • Consider the mean \(f(x) = \mu_{ij} = \beta_{0j} + \beta_1 x_{ij} + \beta_2 x_{ij}^2\).
  • The slope is then \(f'(x) = \beta_1 + 2 \beta_2 x_{ij}\)
  • The slope depends on the level of \(N_{fert}\) we are considering!
  • Likewise, consider \(se(\hat{\beta}_1 + 2 \hat{\beta}_2 x_{ij})\).
  • How can we estimate the variance of a combination of parameters?
  • The delta method sets the asymptotic variance of a function \(g(\cdot)\) of the parameters (\(g(\hat\beta)\)): \(Var(g(\hat\beta)) \approx \nabla g(\hat\beta)^T Cov(\hat\beta) \nabla g(\hat\beta),\) where \(\nabla g(\hat\beta)\) is the gradient (vector of first derivatives) of function \(g(\cdot)\) with respect to the parameters.
  • In this case \(se(\hat{\beta}_1 + 2 \hat{\beta}_2 x_{ij}) = \sqrt{\widehat{Var}(\hat\beta_1)+ 4 x_{ij}^2\widehat{Var}(\hat\beta_2) + 4x_{ij}\widehat{Cov}(\hat\beta_1, \hat\beta_2)}\)
(covariance <- vcov(m1))
##                (Intercept)       Total_N  I(Total_N^2)           sy2           sy3           sy4           sy5
## (Intercept)   53465.012101 -4.143965e+02  1.148989e+00 -2.762121e+04 -2.762121e+04 -2.787241e+04 -2.762121e+04
## Total_N        -414.396458  8.777726e+00 -2.908876e-02  5.914528e-14  5.353022e-14 -7.348210e+00  3.743372e-14
## I(Total_N^2)      1.148989 -2.908876e-02  1.044085e-04 -2.632058e-16 -2.310362e-16  4.571248e-02 -1.754706e-16
## sy2          -27621.209876  5.914528e-14 -2.632058e-16  5.524242e+04  2.762121e+04  2.762121e+04  2.762121e+04
## sy3          -27621.209876  5.353022e-14 -2.310362e-16  2.762121e+04  5.524242e+04  2.762121e+04  2.762121e+04
## sy4          -27872.409930 -7.348210e+00  4.571248e-02  2.762121e+04  2.762121e+04  5.714695e+04  2.762121e+04
## sy5          -27621.209876  3.743372e-14 -1.754706e-16  2.762121e+04  2.762121e+04  2.762121e+04  5.524242e+04
msm::deltamethod(g = ~ x2 + (2*x3*0), mean = coef(m1), cov = covariance)
## [1] 2.962723
msm::deltamethod(g = ~ x2 + (2*x3*100), mean = coef(m1), cov = covariance)
## [1] 1.148286
msm::deltamethod(g = ~ x2 + (2*x3*200), mean = coef(m1), cov = covariance)
## [1] 1.487304
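As a sanity check, the closed-form SE above can be computed directly from the coefficient covariance matrix. The sketch below uses simulated stand-in data (the quadratic coefficients and noise level are invented, not the course data), and compares the closed-form expression with the general gradient form of the delta method:

```r
# Manual delta-method SE for the slope f'(x) = beta1 + 2*beta2*x,
# checked against the gradient form Var(g) = grad' Cov(beta) grad.
# Simulated stand-in data; true coefficients are made up for illustration.
set.seed(1)
x <- rep(seq(0, 200, by = 40), each = 5)
y <- 2000 + 60 * x - 0.12 * x^2 + rnorm(length(x), sd = 300)
m <- lm(y ~ x + I(x^2))
V <- vcov(m)
x0 <- 100

# Closed-form: sqrt(Var(b1) + 4*x0^2*Var(b2) + 4*x0*Cov(b1, b2))
se_manual <- sqrt(V[2, 2] + 4 * x0^2 * V[3, 3] + 4 * x0 * V[2, 3])

# Gradient form: grad of (b0 + b1*x0 + b2*x0^2)' w.r.t. (b0, b1, b2)
grad <- c(0, 1, 2 * x0)
se_grad <- sqrt(drop(t(grad) %*% V %*% grad))

c(se_manual, se_grad)  # the two should agree
```

The gradient form is what `msm::deltamethod()` evaluates internally from the formula you pass in `g`.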

We can also do this with the emmeans package:

library(emmeans)
emtrends(m1, ~ Total_N, "Total_N", at = list(Total_N = c(0, 100, 200)))
##  Total_N Total_N.trend   SE  df lower.CL upper.CL
##        0         61.99 2.96 151    56.14     67.8
##      100         35.88 1.15 151    33.62     38.1
##      200          9.77 1.49 151     6.83     12.7
## 
## Results are averaged over the levels of: sy 
## Confidence level used: 0.95

Option B: Estimate the optimum N rate

  • If we set \(f'(x) =0\) and solve for \(x\), we can get the optimum rate.
  • \(ONR = -\frac{\beta_1}{2\beta_2}\)
  • Consider the uncertainty behind \(-\frac{\beta_1}{2\beta_2}\).
(onr_hat <- -coef(m1)[2]/(2*coef(m1)[3]))
##  Total_N 
## 237.5568

As before, we can get the SE using the delta method:

(onr_se_hat <- msm::deltamethod(g = ~ -x2/(2*x3), mean = coef(m1), cov = covariance))
## [1] 8.307668
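To see where this comes from, differentiate \(g(\beta) = -\frac{\beta_1}{2\beta_2}\) with respect to each parameter:

\[\frac{\partial g}{\partial \beta_1} = -\frac{1}{2\beta_2}, \qquad \frac{\partial g}{\partial \beta_2} = \frac{\beta_1}{2\beta_2^2},\]

so the delta method gives \(\widehat{Var}(\widehat{ONR}) \approx \nabla g(\hat\beta)^T \, \widehat{Cov}(\hat\beta) \, \nabla g(\hat\beta)\) with \(\nabla g(\hat\beta) = \left(-\frac{1}{2\hat\beta_2},\ \frac{\hat\beta_1}{2\hat\beta_2^2}\right)^T\).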

8.2.2 Prediction

  • Uncertainty in predictions
    • Conditional distribution
    • Marginal distribution

Recall

\[y_{ij} \sim N(\mu_{ij}, \sigma^2), \\ \mu_{ij} = \beta_{0j} + \beta_1 x_{ij} + \beta_2 x_{ij}^2. \]

  • Where is the uncertainty coming from?

  • How can the uncertainty change depending on how \(\beta_{0j}\) is specified?

  • Model evaluation criteria for prediction-oriented models
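One concrete way to see the two sources of uncertainty is to compare a confidence interval for the mean (parameter uncertainty only) with a prediction interval for a new observation (parameter uncertainty plus residual noise). A minimal sketch on simulated stand-in data, not the course data:

```r
# Confidence vs prediction intervals for a quadratic fit.
# Simulated data; coefficients and sd are invented for illustration.
set.seed(3)
x <- runif(40, 0, 200)
y <- 2000 + 60 * x - 0.12 * x^2 + rnorm(40, sd = 400)
m <- lm(y ~ x + I(x^2))
nd <- data.frame(x = 100)

# CI: uncertainty in the estimated mean mu(x0)
ci <- predict(m, nd, interval = "confidence")
# PI: adds the residual variance sigma^2 for a new observation
pi <- predict(m, nd, interval = "prediction")
ci; pi
```

The prediction interval is always wider than the confidence interval at the same \(x\), because it also covers the observation-level noise \(\sigma^2\).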

8.3 On the use of R2 for statistical inference

Why do so many people use the coefficient of determination, R2?

  • Metrics like RMSE are hard to compare across models (different units)
  • R2, however, is dimensionless and much easier to interpret
  • “The proportion of variability that is explained by the model”

R2 can be computed as

\[R^2 = \frac{SSM}{SST} = \frac{SS_{MODEL}}{SS_{INT-ONLY}} = 1 - \frac{SSE}{SST},\]

where \(R^2\) is the coefficient of determination, \(SSM\) is the model sum of squares, and \(SST\) is the total sum of squares (i.e., the residual sum of squares of an intercept-only model). In essence, \(R^2\) is one minus the ratio between the \(SSE\) of the model at hand and the \(SSE\) of an intercept-only model.
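These identities are easy to verify by hand. The sketch below uses simulated stand-in data (not the course data) and checks that \(1 - SSE/SST\) matches what `summary()` reports:

```r
# Verify R^2 = 1 - SSE/SST on simulated data.
set.seed(42)
x <- runif(50)
y <- 1 + 2 * x + rnorm(50)
m <- lm(y ~ x)

SSE <- sum(resid(m)^2)                    # residual SS of the model
SST <- sum(resid(lm(y ~ 1))^2)            # residual SS of intercept-only model
r2_hand <- 1 - SSE / SST

all.equal(r2_hand, summary(m)$r.squared)
```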

Uncertainty in R2

summary(m1)$r.squared
## [1] 0.8966592

8.3.1 Bootstrapped R2

boot_r2 <- do(1000) * {
  m_boot <- lm(Yield_SY ~ Total_N + I(Total_N^2), 
               data = resample(data_N, size = nrow(data_N), replace = TRUE))
  
  summary(m_boot)$r.squared
}

hist(boot_r2$result)
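The chunk above relies on `mosaic::do()` and `resample()`. The same idea in base R, on simulated stand-in data (not the course data), with a percentile interval read off the bootstrap distribution:

```r
# Percentile bootstrap for R^2 using only base R.
# Simulated stand-in data; the quadratic and noise sd are invented.
set.seed(7)
n <- 60
x <- runif(n, 0, 200)
y <- 2000 + 60 * x - 0.12 * x^2 + rnorm(n, sd = 400)
d <- data.frame(x, y)

boot_r2_base <- replicate(1000, {
  db <- d[sample(nrow(d), replace = TRUE), ]  # resample rows with replacement
  summary(lm(y ~ x + I(x^2), data = db))$r.squared
})

# 95% percentile interval for R^2
quantile(boot_r2_base, c(0.025, 0.975))
```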

8.3.2 Out-of-sample R2

R2_oos <- numeric()

for (i in unique(data_N$sy)) {
  # leave-one-sy-out 
  d_is <- data_N |> filter(sy != i)
  d_oos <- data_N |> filter(sy == i)
  
  m <- lm(Yield_SY ~ Total_N + I(Total_N^2), data = d_is)
  
  SST_oos <- sum((resid(lm(Yield_SY ~ 1, data = d_oos)))^2)
  SSE_oos <- sum((d_oos$Yield_SY - predict(m, newdata = d_oos))^2) 
  
  R2_oos[i] <- 1 - SSE_oos/SST_oos
  
}

R2_oos
##         1         4         2         3         5 
## 0.7341245 0.8376546 0.3494678 0.3219911 0.9103108
mean(R2_oos)
## [1] 0.6307098
sd(R2_oos)
## [1] 0.2766323
  • Compare the out-of-sample R2 with the in-sample R2 reported above

8.4 Next week

  • Monday 2:30 pm
  • HW1 due WEDNESDAY
  • Thursday 2:30 pm