The data
Marine benthic data from nine inter-tidal areas along the Dutch coast collected by the Dutch institute RIKZ in the summer of 2002. In each inter-tidal area (denoted by ‘beach’), five samples were taken, and the macro-fauna and abiotic variables were measured.
Sample : sample number
Richness : species richness
Exposure : index composed of the surf zone, slope, grain size, and depth of anaerobic layer
NAP : height of sampling station compared to mean tidal level
Beach : beach identifier
Sample Richness Exposure NAP Beach
1 1 11 10 0.045 1
2 2 10 10 -1.036 1
3 3 13 10 -1.336 1
4 4 11 10 0.616 1
5 5 10 10 -0.684 1
6 6 8 8 1.190 2
Data structure
Traditional linear model
\[Richness_{i} = a + b_1\ NAP_{i} + b_2\ Exposure_{i} + \epsilon_{i}\]
What are the problems?
Pseudo-replication * Exposure varies only across beaches not within * Sites within a beach are not independent
2 processes at 2 different levels
Within a beach effect: NAP
Between beaches effect: Exposure
Within a beach
\[Richness_{i1} = a_1 + b_{1_1}\ NAP_{i1} + \epsilon_{i1}\] \[Richness_{i2} = a_2 + b_{1_2}\ NAP_{i2} + \epsilon_{i2}\]
\[Richness_{i9} = a_9 + b_{1_9}\ NAP_{i9} + \epsilon_{i9}\]
9 equations and 27 parameters Not great
Overall, \[Richness_{ij} = a_j + b_{1_j}\ NAP_{ij} + \epsilon_{ij}\]
Between beaches
Cannot simply do \(Richness \sim Exposure\)
In \[
Richness_{ij} = a_j + b_{1_j}\ NAP_{ij} + \epsilon_{ij}
\]
What captures the effect of Exposure
\[
a_{j} = a + b_{exp}\ exposure_{j} + \epsilon_{j}
\]
Within and between beaches effects
within effects -> 27 parameters
Between effects -> 3 parameters
Total 30 parameters
😱 😭
(with 3 being stats-on stats)
Estimating both simultaneously
\[
Richness_{ij} = (a + a_j) + b_1 NAP_{ij} + b_2 Exposure_j + \epsilon_{ij}
\]
a is the general intercept
aj are the deviation from \(a\) for each beach
b1 and b2 are the intercept and slope for the average beach
\[
Richness_{ij} = \underbrace{a + b_1{NAP_{ij}} + b_2{Exposure_j}}_\text{fixed} + \underbrace{a_j + \epsilon_{ij}}_\text{random}
\]
\[a_j \sim N(0, {\sigma_{a}}^2)\] \[\epsilon_{ij} \sim N(0, {\sigma_{\epsilon}}^2)\]
😃 5 parameters estimated 😃
assuming same NAP effect on each beach
assuming same residual variance for each beach
Fixed and random effects
Fixed
a, b1 and b2 estimated with a given error
Random
aj not estimated directly but assumed to come from a Gaussian distribution with a mena of 0 and an estimated Variance
\[
a_j \sim N(0, {\sigma^2_{a_j}})
\]
Fixed part
\[y_{ij} = \underbrace{\color{red}{a + b_1{NAP_{ij}}} + b_2{Exposure_j}}_\text{fixed} + \underbrace{a_j + \epsilon_{ij}}_\text{random}\]
Random part
\[
Richness_{ij} = \underbrace{\color{red}{a + b_1{NAP_{ij}}} + b_2{Exposure_j}}_\text{fixed} + \color{purple}{\underbrace{a_j + \epsilon_{ij}}_\text{random}}
\]
R model output
Linear mixed model fit by REML ['lmerMod']
Formula: Richness ~ NAP + scale(Exposure, scale = F) + (1 | Beach)
REML criterion at convergence: 225.5
Scaled residuals:
Min 1Q Median 3Q Max
-1.3462 -0.5025 -0.2306 0.2150 4.2746
Random effects:
Groups Name Variance Std.Dev.
Beach (Intercept) 0.3378 0.5812
Residual 9.3735 3.0616
Number of obs: 45, groups: Beach, 9
Fixed effects:
Estimate Std. Error t value
(Intercept) 6.6256 0.5221 12.691
NAP -2.6941 0.4700 -5.732
scale(Exposure, scale = F) -3.0005 0.5417 -5.539
Correlation of Fixed Effects:
(Intr) NAP
NAP -0.313
scl(Ex,s=F) 0.015 -0.047
Estimated equation
\[
Richness_{ij} = \underbrace{6.63 -2.69{NAP_{ij}} -3{Exposure_j}}_\text{fixed} + \underbrace{a_j + \epsilon_{ij}}_\text{random}
\]
\[
a_j \sim N(0, 0.34)
\]
\[
\epsilon_{ij} \sim N(0, 9.37)
\]
BLUPs (\(a_j\) ) are shrunk
Estimates for each groups are constrained (shrunk) too avoid extreme values.
Modelling the variance
We estimated the variance explained by the model
\({\sigma_{\hat{y}}}^2\) .
We decomposed teh residual variance in 2 :
\({\sigma_a}^2\) : across beach variance
\({\sigma_{\epsilon}}^2\) Variation not associated to beaches (i.e. within-beach variance)
Estimating repeatability
\[
Richness_{ij} = \underbrace{a + b_1{NAP_{ij}} + b_2{Exposure_j}}_\text{fixed} + \underbrace{a_j + \epsilon_{ij}}_\text{random}
\]
\[
a_j \sim N(0, {\sigma_a}^2)
\]
\[\epsilon \sim N(0, {\sigma_{\epsilon}}^2)\]
\[ r = \frac{{\sigma_a}^2}{{\sigma_a}^2 + {\sigma_{\epsilon}}^2}\]
With our model
\[
Richness_{ij} = \underbrace{37.3 -2.69{NAP_{ij}} -3{Exposure_j}}_\text{fixed} + \underbrace{a_j + \epsilon_{ij}}_\text{random}
\]
\[a_j \sim N(0, 0.34)\]
\[\epsilon_{ij} \sim N(0, 9.37)\]
\[
r^2 = \frac{0.34}{0.34 + 9.37}
= \frac{0.34}{9.71}
= 0.03
\]
Repeatability is conditioned on fixed effects
Differences among groups can be explained by fixed effects and thus influence \({\sigma_a}^2\)
We can estimate repeatability with 2 models for example:
With exposure \[
Richness_{ij} = \underbrace{a + b_1{NAP_{ij}} + b_2{Exposure_j}}_\text{fixed} + \underbrace{a_j + \epsilon_{ij}}_\text{random}
\]
\[
Richness_{ij} = \underbrace{a + b_1{NAP_{ij}}}_\text{fixed} + \underbrace{a_j + \epsilon_{ij}}_\text{random}
\]
Repeatability With exposure effects
\[
Richness_{ij} = \underbrace{37.3 -2.69{NAP_{ij}} -3{Exposure_j}}_\text{fixed} + \underbrace{a_j + \epsilon_{ij}}_\text{random}
\]
\[
a_j \sim N(0, 0.338)
\]
\[
\epsilon_{ij} \sim N(0, 9.374)
\]
\[
r\ (with\ exposure) = \frac{0.338}{0.338 + 9.37} = 0.03
\]
Repeatability without exposure
\[
Richness_{ij} = \underbrace{6.58 -2.57{NAP_{ij}}}_\text{fixed} + \underbrace{a_j + \epsilon_{ij}}_\text{random}
\]
\[
a_j \sim N(0, 8.668)
\]
\[
\epsilon_{ij} \sim N(0, 9.362)
\]
\[
r\ (without\ exposure) = \frac{8.668}{8.668 + 9.36} = 0.48
\]