Correlation
- Berkson’s paradox
Biased samples can generate correlations that are not present in the population (in this case, when inclusion in the observed subpopulation depends on the very variables whose correlation is examined). If celebrities are always attractive, talented, or both, but never neither attractive nor talented, then these two traits may appear negatively correlated in the subpopulation of celebrities even though no such correlation exists in the general population.
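A quick simulation makes the selection effect visible. This is a minimal sketch in plain Python; the selection threshold, sample size, and trait distributions are all arbitrary assumptions:

```python
import random

random.seed(0)

# Draw two independent traits ("attractiveness", "talent") for a population.
population = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]

def corr(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    vx = sum((x - mx) ** 2 for x, _ in pairs) / n
    vy = sum((y - my) ** 2 for _, y in pairs) / n
    return cov / (vx * vy) ** 0.5

# "Celebrities": observed only if their combined score clears a threshold.
celebrities = [(x, y) for x, y in population if x + y > 1.5]

print(round(corr(population), 2))   # near 0 in the full population
print(round(corr(celebrities), 2))  # clearly negative among celebrities
```

Because selection truncates the low end of the joint distribution, knowing that a celebrity scores high on one trait makes a high score on the other trait less necessary for their selection, which produces the spurious negative correlation.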
- Simpson’s paradox
Be careful when interpreting a correlation across multiple groups. The correlation within groups and the correlation between groups can point in completely opposite directions. Example: the low birth-weight paradox.
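The reversal is easiest to see with success rates rather than correlations. The sketch below uses the often-cited kidney-stone treatment numbers; within each stone-size group treatment A has the higher success rate, yet aggregated over groups B looks better:

```python
# Success counts (successes, total) per treatment, split by stone size.
groups = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

# Within each group, treatment A wins ...
for name, g in groups.items():
    rate_a = g["A"][0] / g["A"][1]
    rate_b = g["B"][0] / g["B"][1]
    print(name, round(rate_a, 2), ">", round(rate_b, 2))

# ... yet pooled over groups, treatment B appears better,
# because A was applied mostly to the harder (large-stone) cases.
a_succ = sum(g["A"][0] for g in groups.values())
a_tot = sum(g["A"][1] for g in groups.values())
b_succ = sum(g["B"][0] for g in groups.values())
b_tot = sum(g["B"][1] for g in groups.values())
print(round(a_succ / a_tot, 2), "<", round(b_succ / b_tot, 2))
```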
Hypothesis Testing and Statistical Significance
- Absence of evidence does not equal evidence of absence
A non-significant statistical test must not be treated as support for the null hypothesis. Such a conclusion can only be drawn under additional consideration of the a priori statistical power of the test.
- Low-powered statistical tests produce unrealistically high effect sizes when reaching significance
When looking only at the subset of significant results in a distribution of effect sizes, low-powered tests cause the effect sizes in this subset to be unrealistically inflated. This is especially problematic given that significant results are more likely to be published. For further reading, see this blog post by Andrew Gelman.
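A simulation shows this significance filter at work. Assuming (hypothetically) a small true effect of 0.2 standard deviations and samples of n = 25, the average effect size among the replications that happen to reach significance is inflated well above the truth:

```python
import random
import statistics

random.seed(1)
true_effect = 0.2   # true effect in SD units (hypothetical)
n = 25              # small sample -> low power

sig_effects = []
for _ in range(2_000):
    sample = [random.gauss(true_effect, 1) for _ in range(n)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    # Two-sided one-sample t-test at alpha ~ .05 (critical t for df = 24).
    if abs(mean / se) > 2.064:
        sig_effects.append(mean)

# Significant replications overestimate the true effect by a factor of ~2-3.
print(round(statistics.fmean(sig_effects), 2), "vs. true effect", true_effect)
```

Only the runs whose sample mean clears the significance threshold survive the filter, and with low power that threshold sits far above the true effect.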
- The Union-Intersection principle of multivariate statistical testing
A non-significant multivariate test implies that no test in a set of corresponding univariate tests with appropriately adjusted alpha levels would reach statistical significance. The converse is not true: a significant multivariate test does not imply significance for any test in a set of corresponding univariate tests with adjusted alpha. Rather, a significant multivariate test only implies significance for at least one linear combination of the outcome variables, which need not coincide with an "actual" univariate test.
Regression
- Interpreting a beta predictor in multiple regression
Increasing X by one unit changes the predicted outcome variable by beta, given that all other predictors remain constant. This conditional interpretation is important, since beta can take a completely different value in a simple regression with just one predictor. Because meaningful interpretations are hard to arrive at under this conditional reading, I regard it as an additional argument for favouring parsimonious models when carrying out research.
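A sketch of how the same predictor's beta changes between simple and multiple regression. The data are simulated and the coefficients arbitrary: x2 has no direct effect on y, but is correlated with x1, which does:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two correlated predictors; only x1 has a direct effect on y.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)

# Simple regression of y on x2 alone: x2 "inherits" x1's effect.
X_simple = np.column_stack([np.ones(n), x2])
beta_simple = np.linalg.lstsq(X_simple, y, rcond=None)[0]

# Multiple regression holding x1 constant: x2's beta is near zero.
X_full = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

print(round(beta_simple[1], 2))  # clearly positive
print(round(beta_full[2], 2))    # close to 0
```

The coefficient on x2 is only interpretable relative to the other predictors in the model, which is exactly the conditional reading described above.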
- Interpreting a regression intercept
The value of the intercept represents the prediction of the outcome variable given that all predictors have the value 0. This generalizes to groups with differing fitted intercepts, for example in linear mixed models.
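A small illustration with simulated data: when a predictor never takes the value 0, the raw intercept is an extrapolation, while centering the predictor moves the intercept to the prediction at the predictor's mean, which is often easier to interpret:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(50, 10, size=5_000)  # a predictor far from 0 (e.g. age)
y = 3.0 + 0.5 * x + rng.normal(size=5_000)

# Raw intercept: prediction at x = 0, far outside the observed range.
X = np.column_stack([np.ones_like(x), x])
intercept, slope = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(intercept, 1))  # close to the simulated value of 3.0

# Centered predictor: the intercept becomes the prediction at mean(x),
# i.e. the mean of y.
Xc = np.column_stack([np.ones_like(x), x - x.mean()])
intercept_c, _ = np.linalg.lstsq(Xc, y, rcond=None)[0]
print(round(intercept_c, 1))
```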
Statistical Thinking and Decision-Making
- Base rate fallacy
This blog post on Andrew Gelman's blog nicely illustrates how not taking into account the base rate of events/classes/… results in wrong probability estimates. This is especially eye-opening when base rates are very unevenly distributed, for example when testing for a rare disease: depending on the diagnostic utility of a test, positive results may be false positives much more often than intuitively assumed. Related: the prosecutor's fallacy.
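A worked Bayes calculation with hypothetical numbers (99% sensitivity, 95% specificity, prevalence of 1 in 1,000) makes the point concrete:

```python
# Hypothetical screening test for a rare disease.
prevalence = 0.001   # 1 in 1,000 people have the disease
sensitivity = 0.99   # P(positive | disease)
specificity = 0.95   # P(negative | no disease)

# Total probability of a positive result: true positives + false positives.
p_positive = (sensitivity * prevalence
              + (1 - specificity) * (1 - prevalence))

# Bayes' theorem: probability of disease given a positive result.
p_disease_given_positive = sensitivity * prevalence / p_positive

print(round(p_disease_given_positive, 3))  # roughly 0.02
```

Despite the test being "99% accurate" on diseased cases, fewer than 1 in 50 positive results actually indicate the disease, because the false positives from the huge healthy majority dominate.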
- Gambler’s fallacy
After observing red 20 times in a row, the gambler put all his money on black. As each spin remained statistically independent of the last, the wheel landed on red again.
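The independence is easy to check by simulation. The sketch below assumes a European wheel (18 red pockets out of 37) and an arbitrary streak length of 5; the probability of red after a streak of reds matches the unconditional probability:

```python
import random

random.seed(3)
p_red = 18 / 37  # European roulette: 18 red pockets out of 37

# Simulate a long sequence of independent spins (True = red).
spins = [random.random() < p_red for _ in range(1_000_000)]

# Outcomes that immediately follow a streak of 5 reds.
after_streak = [spins[i] for i in range(5, len(spins))
                if all(spins[i - 5:i])]

print(round(sum(spins) / len(spins), 3))                # overall P(red)
print(round(sum(after_streak) / len(after_streak), 3))  # P(red | 5 reds)
```

The two printed frequencies agree up to sampling noise: the streak carries no information about the next spin.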
Any questions, comments or corrections? Do not hesitate to get in touch with me via Twitter or send me an e-mail.