Introduction to Generalized Linear Mixed Models
Harnessing non-linearity and random effects
0.1 Questions after reading Bolker et al. 2009
Difference between fixed and random effects?
When to transform data?
- If you have a funky-looking distribution of continuous data, is it always OK to transform to achieve normality if you don't violate any test assumptions?
Walkthrough of Figure 1?
Get rid of non-significant fixed effects?
- If they are important for my hypothesis, should I always keep them?
- What if I have a fairly small dataset?
How to choose a link function? Why not use the default?
Can we go through the example in Box 1?
1 GLMM: What are they?
1.1 Gacha Life Mini Movie
A video game allowing you to dress up anime-style characters (the other meaning of the acronym GLMM)
1.2 Generalized linear mixed model
An extension of the generalized linear model and of the linear mixed model
A GLMM expresses the transformed conditional expectation of the dependent variable y as a linear combination of the regression variables X
The model has 3 components:
- a structural component, or additive expression: \(\beta_0 + \beta_1 X_1 + ... + \beta_k X_k\) (in a mixed model, this linear predictor also includes the random effects)
- a link function: \(g(\mu)\)
- a response distribution: Gaussian, binomial, Bernoulli, Poisson, negative binomial, zero-inflated …, zero-truncated …, …
\[ g(\mu_i) = \eta_i = \beta_0 + \beta_1 X_1 + ... + \beta_k X_k \]
and
\[ \mu_i = E(y_i \mid x_i) = g^{-1}(\eta_i) \]
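For example, with a Poisson response and a log link, \(g(\mu) = \log(\mu)\), so the mean is recovered as
\[ \mu_i = e^{\beta_0 + \beta_1 X_1 + ... + \beta_k X_k} \]
which guarantees \(\mu_i > 0\), as the mean of a count must be.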
1.3 How do you fit them?
In R:
- `glmer()` from the `lme4` 📦: same as `lmer()` but with a `family` argument
- `glmmPQL()` from the `MASS` 📦 (based on `lme()`)
- `glmmADMB()` from the `glmmADMB` 📦: works well and is flexible, but beware
- `glmmTMB()` from the `glmmTMB` 📦: works well and is flexible, but beware
- `asreml()` from the `asreml` 📦: great but not free
- `MCMCglmm()` from the `MCMCglmm` 📦: great but Bayesian
- Choose your Bayesian flavor 📦:
  - stan: `brms`, `rethinking`, `rstan`, …
  - BUGS: `runjags`, `rjags`, …
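As a minimal sketch with `glmer()` (assuming a hypothetical data frame `df` with a binary response `y`, a covariate `x`, and a grouping factor `group`):

```r
library(lme4)

# Binomial GLMM: fixed effect of x, random intercept per group
# (df, y, x, and group are hypothetical placeholder names)
m <- glmer(y ~ x + (1 | group), data = df,
           family = binomial(link = "logit"))
summary(m)
```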
1.4 Model assumptions
Easy answer: none, or really few.
More advanced answer: I am not sure, it is complicated.
Just check residuals as usual.
- Technically only 3 assumptions:
  - variance is a function of the mean, specific to the distribution used
  - observations are independent (conditional on the random effects)
  - linear relation on the latent scale
Generalized Linear Models do not care if the residual errors are normally distributed as long as the specified mean-variance relationship is satisfied by the data
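One practical way to check these assumptions for a GLMM is with simulated quantile residuals, for instance via the `DHARMa` 📦 (a sketch, reusing the hypothetical model `m` fitted above):

```r
library(DHARMa)

# Simulate scaled quantile residuals from the fitted model m
res <- simulateResiduals(fittedModel = m, n = 250)

plot(res)            # QQ plot + residuals vs. predicted values
testDispersion(res)  # test for over-/underdispersion
```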
1.5 Choosing a link function
A link function should map the mean of the distribution onto \((-\infty,\infty)\); equivalently, its inverse maps the structural component from \((-\infty,\infty)\) to the interval of the distribution (e.g., (0,1) for binomial).
So the number of possible link functions is extremely large.
The choice of link function is heavily influenced by field tradition.
For binomial models:
- logit assumes you are modelling the probability of an observation being one
- probit assumes the binary outcome arises from a hidden Gaussian variable crossing a threshold (i.e., a threshold model)
- logit & probit are really similar: both are symmetric, but probit tapers faster; logit coefficients are easier to interpret directly (as log odds ratios)
- cloglog is not symmetrical
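A quick sketch of how similar the two inverse links are, using base R only (`plogis()` is the inverse logit, `pnorm()` the inverse probit):

```r
# Inverse logit vs. inverse probit on the same linear-predictor scale
curve(plogis(x), from = -5, to = 5,
      xlab = "linear predictor", ylab = "P(y = 1)")
curve(pnorm(x), add = TRUE, lty = 2)  # probit tapers faster in the tails
legend("topleft", legend = c("logit", "probit"), lty = c(1, 2))
```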
1.6 Estimating repeatability?
Latent scale: business as usual?
Observed scale: ??????
- Using the `rptR` 📦 is the easiest, or the `QGglmm` 📦 (see the associated citations for references and explanations)
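A sketch with `rptR`, again using the hypothetical `y`, `x`, `group`, and `df` from above; for non-Gaussian data, `rpt()` reports repeatability on both the latent and the original (observed) scale:

```r
library(rptR)

# Repeatability of a binary trait across levels of group
r <- rpt(y ~ x + (1 | group), grname = "group", data = df,
         datatype = "Binary", nboot = 100, npermut = 0)
print(r)
```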
1.7 Marginalized vs Conditioned estimates
Difference between marginalized and conditioned coefficients? Conditional coefficients apply at a given level of the random effects (subject-specific), while marginalized coefficients are population-averaged; with a non-identity link the two are not equal.
The `GLMMadaptive` 📦 is the only way I know to easily get marginalized coefficients.
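A sketch with `GLMMadaptive` (same hypothetical data), where `marginal_coefs()` converts the conditional estimates into population-averaged ones:

```r
library(GLMMadaptive)

# Conditional (subject-specific) fit
m2 <- mixed_model(fixed = y ~ x, random = ~ 1 | group,
                  data = df, family = binomial())
fixef(m2)           # conditional coefficients

# Population-averaged (marginalized) coefficients
marginal_coefs(m2)
```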
1.8 Practical
Walkthrough of the example in Box 1