Why make it simple when you could go Bayesian
BIO 8940
2025-01-08
Bayes’ theorem is a result about the conditional probability of two events:
\[ P[A|B] = \frac{P[A\ \&\ B]}{P[B]} = \frac{P[B|A] P[A]}{P[B]} \]
The conditional probability of A given B is the conditional probability of B given A, scaled by the probability of A relative to that of B.
\[ \underbrace{P[A|B]}_\text{posterior probability} = \frac{\overbrace{P[B|A]}^\text{likelihood} \cdot \overbrace{P[A]}^\text{prior probability}}{\underbrace{P[B]}_\text{marginal probability}} \]
Bayes’ theorem can be seen as the conditional probability of a hypothesis given the data: \[ P[H_0 | \text{data}] = \frac{\overbrace{P[\text{data}|H_0]}^\text{likelihood} \cdot \overbrace{P[H_0]}^\text{prior}}{P[\text{data}]} \]
Or it can be seen as \[ \underbrace{P[H_0 | \text{data}]}_{\text{what we want to know}} = \frac{\overbrace{P[\text{data}|H_0]}^\text{what frequentists do} \cdot \overbrace{P[H_0]}^\text{what we have a hard time understanding}}{\underbrace{P[\text{data}]}_\text{what we happily ignore}} \]
Here we count the number of observations in each case:
Reality | Reject H0 | Accept H0 | Total |
---|---|---|---|
H0 is true | a (Type I error) | b | a + b |
H0 is false | c | d (Type II error) | c + d |
Total | a+c | b+d | N (number of obs) |
False positive: \(P[H_0 \text{ true} | \text{Reject }H_0] = \frac{a}{a+c}\)
False negative: \(P[H_0 \text{ false} | \text{Accept }H_0] = \frac{d}{b+d}\)
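Both proportions follow directly from the four cell counts. Below is a minimal sketch in Python; the counts `a`, `b`, `c` and `d` are hypothetical values chosen only for illustration and do not come from any real study.

```python
# Hypothetical cell counts (illustrative only).
# Rows are reality, columns are the decision taken.
a = 5    # H0 true,  rejected  (Type I error)
b = 95   # H0 true,  accepted
c = 40   # H0 false, rejected
d = 10   # H0 false, accepted  (Type II error)

# Among all rejections, the fraction for which H0 was actually true.
false_positive = a / (a + c)
# Among all acceptances, the fraction for which H0 was actually false.
false_negative = d / (b + d)

print(f"P[H0 true | reject H0]  = {false_positive:.3f}")
print(f"P[H0 false | accept H0] = {false_negative:.3f}")
```

Note that both quantities condition on the decision (a column of the table), not on reality (a row).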
The same table, with probabilities instead of numbers of observations:
Reality | Reject H0 \([\not H_0]\) | Accept H0 \([H_0]\) | Total |
---|---|---|---|
H0 true [H0+] | \(P[\not H_0 | H_0^+] P[H_0^+]\) | \(P[H_0 | H_0^+] P[H_0^+]\) | \(P[H_0^+]\) |
H0 false [H0-] | \(P[\not H_0 | H_0^-] P[H_0^-]\) | \(P[H_0 | H_0^-] P[H_0^-]\) | \(P[H_0^-]\) |
Total | \(P[\not H_0]\) | \(P[H_0]\) | 1 |
False positive \(P[H_0 \text{ true} | \text{Reject }H_0]\) \[ \begin{align} P[H_0^+ | \not H_0] &=\frac{P[\not H_0 | H_0^+ ] P[H_0^+ ]}{P[\not H_0]} \\ &= \frac{P[\not H_0 | H_0^+ ] P[H_0^+]} {P[\not H_0 | H_0^+] P[H_0^+ ] + P[ \not H_0 | H_0^- ] P[H_0^-]} \end{align} \]
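The expansion of the denominator can be checked numerically. The sketch below is one possible way to code it; the function name and the input values (a Type I error rate of 0.05, a power of 0.80, and H0 being true half of the time) are assumptions made for illustration only.

```python
def prob_h0_true_given_reject(p_reject_given_true, p_reject_given_false, p_h0_true):
    """P[H0 true | reject H0], with P[reject H0] expanded over both states of H0."""
    p_h0_false = 1.0 - p_h0_true
    numerator = p_reject_given_true * p_h0_true
    # Law of total probability: reject when H0 is true + reject when H0 is false.
    p_reject = numerator + p_reject_given_false * p_h0_false
    return numerator / p_reject

# Assumed inputs: Type I error rate 0.05, power 0.80, P[H0 true] = 0.5.
result = prob_h0_true_given_reject(0.05, 0.80, 0.5)
print(f"P[H0 true | reject H0] = {result:.3f}")
```

The answer depends on \(P[H_0^+]\), the prior probability that H0 is true, which the Type I error rate alone does not provide.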
Why does it matter? Suppose 1% of a population has COVID, and a screening test has 80% sensitivity (1 − Type II error rate) and 95% specificity (1 − Type I error rate).
Assuming N = 100 tests:
Reality | Test +ve | Test -ve | Total |
---|---|---|---|
Healthy | 4.95 | 94.05 | 99 |
Has COVID | 0.8 | 0.2 | 1 |
Total | 5.75 | 94.25 | 100 |
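The quantity a person with a positive test cares about is the probability of having COVID given that result. As a worked application of Bayes’ theorem to the numbers in the table (the shorthand \(+\) for a positive test is introduced here only for brevity):

\[ \begin{align} P[\text{COVID} | +] &= \frac{P[+ | \text{COVID}] P[\text{COVID}]}{P[+ | \text{COVID}] P[\text{COVID}] + P[+ | \text{healthy}] P[\text{healthy}]} \\ &= \frac{0.80 \times 0.01}{0.80 \times 0.01 + 0.05 \times 0.99} = \frac{0.008}{0.0575} \approx 0.14 \end{align} \]

This is the same as 0.8 of the 5.75 expected positive tests: only about 14% of positive results come from people who actually have COVID.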
Why does it matter? What if 20% of the population has COVID instead of 1%?
Reality | Test +ve | Test -ve | Total |
---|---|---|---|
Healthy | 4 | 76 | 80 |
Has COVID | 16 | 4 | 20 |
Total | 20 | 80 | 100 |
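The effect of prevalence can be reproduced by rebuilding the positive-test column for any prevalence. This is a minimal sketch; the function name `positive_predictive_value` and its default arguments are illustrative choices that mirror the 80% sensitivity and 95% specificity used in the tables.

```python
def positive_predictive_value(prevalence, sensitivity=0.80, specificity=0.95, n=100):
    """Rebuild the 'Test +ve' column for n tests and return P[has COVID | test +ve]."""
    sick = n * prevalence
    healthy = n - sick
    true_pos = sick * sensitivity             # has COVID, test +ve
    false_pos = healthy * (1 - specificity)   # healthy, test +ve
    return true_pos / (true_pos + false_pos)

for prevalence in (0.01, 0.20):
    ppv = positive_predictive_value(prevalence)
    print(f"prevalence {prevalence:.0%}: P[COVID | test +ve] = {ppv:.2f}")
```

With the same test, a positive result means about a 14% chance of COVID at 1% prevalence but an 80% chance at 20% prevalence (16 of the 20 positives in the table).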
Mixing up P[ A | B ] with P[ B | A ] is the Prosecutor’s Fallacy
A small probability of the evidence given innocence \(\neq\) a small probability of innocence given the evidence
True Story
Bayes’ Theorem is a rule about the ‘language’ of probabilities, that can be used in any analysis describing random variables, i.e. any data analysis.
Q. So why all the fuss?
A. Bayesian inference uses more than just Bayes’ Theorem
Bayesian inference uses the ‘language’ of probability to describe what is known about parameters.
Warning
Frequentist inference, e.g. using p-values & confidence intervals, does not quantify what is known about parameters. Many people initially think it does; an important job for instructors of intro Stat/Biostat courses is convincing those people that they are wrong.
A shooting cartoon
Adapted from Gonick & Smith, The Cartoon Guide to Statistics
To define anything frequentist, you have to imagine repeated experiments.
Let’s do some more ‘target practice’, for frequentist testing
How does Bayesian inference differ? Let’s take aim…
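We use a prior distribution \(P_{prior}(β)\), stating what is known about β before seeing the data, and a likelihood \(f(Y|β)\), describing the data given β,
to get a posterior distribution, denoted \(P_{post}(β|Y)\), stating what we know about β after combining the prior with the data.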
Bayes’ theorem, used for inference, tells us:
\[ \begin{align} P_{post}(β|Y) &∝ f(Y|β) × P_{prior}(β)\\ \text{Posterior} &∝ \text{Likelihood} × \text{Prior} \end{align} \]
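The proportionality can be turned into numbers by evaluating likelihood × prior on a grid of candidate values and normalizing. The sketch below is a grid approximation under assumptions that are not in the slides: β is taken to be a success probability, the data are a hypothetical 7 successes in 10 trials, and the prior has a Beta(3, 3) shape.

```python
from math import comb

# Hypothetical data: y successes out of n trials; beta is the success probability.
y, n = 7, 10

# Candidate values of beta strictly inside (0, 1).
grid = [i / 200 for i in range(1, 200)]

# Assumed prior shape, proportional to a Beta(3, 3) density (illustration only).
prior = [b**2 * (1 - b)**2 for b in grid]

# Binomial likelihood f(Y | beta) at each grid value.
likelihood = [comb(n, y) * b**y * (1 - b)**(n - y) for b in grid]

# Posterior is proportional to likelihood x prior; dividing by the sum normalizes it.
unnormalized = [l * p for l, p in zip(likelihood, prior)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

post_mean = sum(b * w for b, w in zip(grid, posterior))
print(f"posterior mean of beta = {post_mean:.3f}")
```

The normalizing sum plays the role of \(P[\text{data}]\): it does not depend on β, which is why the theorem is usually written with ∝.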
… and that’s it! (essentially!)
A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule
Priors come from all data external to the current study (i.e. everything else). ‘Boiling down’ what subject-matter experts know/think is known as eliciting a prior. It’s not easy, but here are some simple tips:
Use stickers or a survey in the hallway
Use stickers (Johnson et al 2010, J Clin Epi) for survival when taking warfarin
Normalize marks (Latthe et al 2005, J Obs Gync) for pain effect of LUNA vs placebo
If the experts disagree? Try it both ways
If the posteriors differ, what you believe based on the data depends on your prior knowledge
To convince other people, expect to have to convince skeptics – and note that convincing [rational] skeptics is what science is all about
When the data provide a lot more information than the prior
Priors here are dominated by the likelihood, and they give very similar posteriors – i.e. everyone agrees. (Phew!)
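A small numerical illustration of this, assuming a proportion with a conjugate Beta prior and binomial data, so that the posterior mean has the closed form (a + successes) / (a + b + trials). The two priors and the two datasets are hypothetical.

```python
def beta_posterior_mean(a, b, successes, trials):
    """Posterior mean of a proportion under a Beta(a, b) prior and binomial data."""
    return (a + successes) / (a + b + trials)

# Two hypothetical experts with opposite prior beliefs (prior means 0.2 and 0.8).
priors = {"skeptic": (2, 8), "optimist": (8, 2)}
# Hypothetical data as (successes, trials): same observed proportion, different sizes.
datasets = {"small study": (5, 10), "large study": (500, 1000)}

gaps = {}
for label, (y, n) in datasets.items():
    means = {who: beta_posterior_mean(a, b, y, n) for who, (a, b) in priors.items()}
    gaps[label] = abs(means["optimist"] - means["skeptic"])
    print(f"{label}: skeptic {means['skeptic']:.3f}, "
          f"optimist {means['optimist']:.3f}, gap {gaps[label]:.3f}")
```

With 10 observations the two experts still disagree noticeably; with 1000 observations their posterior means are nearly the same.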
Using very flat priors to represent ignorance
Flat priors do NOT actually represent ignorance!
The likelihood gives the classic 95% confidence interval, which can be a good approximation of the Bayesian 95% Highest Posterior Density interval
With large samples (and some regularity conditions)
(sane) frequentist confidence intervals and (sane) Bayesian credible intervals are essentially identical
it’s actually okay to give Bayesian interpretations to 95% CIs, i.e. to say we have \(\approx\) 95% posterior belief that the true β lies within that range
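The agreement can be checked in the simplest case: a normal mean with known standard deviation and a normal prior, where the posterior is available in closed form. The sketch below uses hypothetical summary statistics and an assumed weak prior; all names and values are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary of a large study (illustrative values only).
ybar, sigma, n = 1.2, 2.0, 400      # sample mean, known SD, sample size
m0, s0 = 0.0, 10.0                  # assumed weak normal prior on beta: N(m0, s0^2)

z = NormalDist().inv_cdf(0.975)     # standard normal 97.5% quantile
se = sigma / sqrt(n)

# Frequentist 95% confidence interval for the mean.
ci = (ybar - z * se, ybar + z * se)

# Conjugate normal-normal update: precisions (1 / variance) add.
post_prec = 1 / s0**2 + 1 / se**2
post_mean = (m0 / s0**2 + ybar / se**2) / post_prec
post_sd = sqrt(1 / post_prec)

# The posterior is normal, so the 95% HPD interval is symmetric about its mean.
cred = (post_mean - z * post_sd, post_mean + z * post_sd)

print("95% confidence interval:", tuple(round(x, 3) for x in ci))
print("95% credible interval:  ", tuple(round(x, 3) for x in cred))
```

The two intervals differ only slightly here because the data precision (1 / se²) is far larger than the prior precision (1 / s0²).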
Prior strongly supporting small effects, and with data from an imprecise study
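As a worked illustration of this situation, assume a normal prior on β with mean \(m_0 = 0\) and standard deviation \(s_0 = 0.1\) (strong support for small effects), and an imprecise study with estimate \(\hat{β} = 1.0\) and standard error \(se = 0.5\). These symbols and numbers are hypothetical and are not taken from the slides. Under a normal approximation, the posterior mean is the precision-weighted average of the prior mean and the estimate:

\[ \begin{align} E[β | Y] &= \frac{m_0 / s_0^2 + \hat{β} / se^2}{1 / s_0^2 + 1 / se^2} \\ &= \frac{0 / 0.1^2 + 1.0 / 0.5^2}{1 / 0.1^2 + 1 / 0.5^2} = \frac{4}{104} \approx 0.04 \end{align} \]

The posterior stays close to the prior: an imprecise study carries little weight against a prior that strongly supports small effects.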
Almost any analysis
Bayesian arguments are often seen in
Hierarchical modeling (Some expert calls the classic frequentist version a “statistical no-man’s land”)
Complex models: for messy data, measurement error, or multiple sources of data. Fitting them is possible under Bayesian approaches, but perhaps still not easy
I barely scratched the surface
Is useful in many settings, and you should know about it
Is often not very different in practice from frequentist statistics. It is often helpful to think about analyses from both Bayesian and non-Bayesian points of view
Is not reserved for hard-core mathematicians, or computer scientists, or philosophers. If you find it helpful, use it.