
Understanding the difference between correlation and causation - of shark attacks and ice cream sales


Candela Iglesias Chiesa & Joanne Fielding

The graph below shows that there are more shark attacks when more ice cream is sold, so to stop the attacks, let's stop eating ice cream.

Sounds preposterous? It is. It is also a very useful example of when correlation is not causation.

With so much research and information being spread about #COVID19, we at @GHA thought it would be important to revisit the difference between #correlation and #causation. Incorrectly interpreting a correlation as a causal relationship is a common source of confusion and data misinterpretation.

As in the shark and ice-cream example, humans naturally tend to mistake correlation for causation. That is, we tend to think that when two variables (for example, ice-cream sales and shark attacks) change in relation to each other (e.g. shark attacks increase when ice-cream sales increase), it is because one is causing the other (ice-cream eating is somehow causing the shark attacks).

Correlation is about how strongly two variables are related and how they change together (e.g. when one increases, the other also increases, or when one increases, the other decreases). But correlation doesn't tell you anything about the WHY or HOW of the relationship. It just expresses that a relationship exists. It could even be due to pure chance, and in many cases it is. (If you want to see some funny spurious (i.e. due to chance) correlations, check out this website.)
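
To make "how strongly two variables are related" concrete, here is a minimal sketch of the Pearson correlation coefficient, one common way to measure correlation. The monthly figures are invented for illustration and are not real data; the function name pearson_r is our own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: ranges from -1 to +1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical monthly figures (made up for this example).
ice_cream_sales = [20, 25, 40, 60, 85, 100]
shark_attacks = [1, 1, 2, 3, 5, 6]

r = pearson_r(ice_cream_sales, shark_attacks)
print(f"r = {r:.2f}")
```

A value of r close to +1 means the two series rise together, a value close to -1 means one rises as the other falls, and a value near 0 means no linear relationship. Note that nothing in this calculation looks at WHY the numbers move together.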

Causation goes a step further: it says that a change in one value will cause a change in the value of the other (for example, a higher number of bathers results in increased shark attacks). This means one value directly makes the other happen.

To prove a causal relationship, we need very well designed studies (such as randomized controlled trials or RCTs), and we need to check the Bradford Hill criteria (for example, is it plausible that one variable causes the other, is there a biological gradient, are the results reproducible, etc.).

In the shark and ice-cream sales example, we are seeing a correlation, not a causal relationship (i.e. an increase in ice-cream sales is associated with, but does not cause, increased shark attacks). It is possible that both increase at the same time because of a third variable, namely, an increased number of bathers on the beaches due to summer weather.
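
The third-variable effect can be sketched with a small simulation. This is a toy model under our own assumptions, not real data: the number of bathers drives both ice-cream sales and shark attacks, and the two never influence each other directly. All coefficients and noise levels are invented, and the "shark attacks" value is a simplified continuous score rather than a realistic count.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_r(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Third variable: bathers on the beach each day (hypothetical range).
bathers = [rng.uniform(0, 1000) for _ in range(2000)]
# Both outcomes depend on bathers plus independent noise, not on each other.
ice_cream = [0.5 * b + rng.gauss(0, 50) for b in bathers]
sharks = [0.01 * b + rng.gauss(0, 1) for b in bathers]

raw = pearson_r(ice_cream, sharks)
adjusted = partial_r(ice_cream, sharks, bathers)
print(f"ice cream vs sharks:             {raw:.2f}")
print(f"same, accounting for bathers:    {adjusted:.2f}")
```

In this toy setup the raw correlation should come out strongly positive, while the correlation after accounting for the number of bathers should sit close to zero. Adjusting for a measured third variable in this way is only one simple check; it does not replace the well designed studies needed to establish causation.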

So next time you see an article out there claiming that some drug or herb protects against COVID19, pause to think about whether there is enough data to prove causality.


2020-05-12 08:01:56