Correlation does not imply causation

From Wikipedia, the free encyclopedia

"Correlation implies causation", also known as cum hoc ergo propter hoc (Latin for "with this, therefore because of this") and false cause, is a logical fallacy in which two events that occur together are prematurely claimed to stand in a cause-and-effect relationship.

General pattern

Generally, if A is observed to be correlated with B, this widespread logical fallacy leaps prematurely from the correlation to one of the following conclusions, even though the observed correlation in itself gives no information about which holds:

(1) A causes B;
(2) B causes A;
(3) both (1) and (2) are true;
(4) neither (1) nor (2) is true, that is, there is no causal link between the two despite the correlation (see Spurious relationship).

Which of these four is true is never settled solely by the result of a correlation test; unless the causation (or lack of it) is entirely obvious, it requires further investigation. It may, however, be claimed before such investigation that an observed correlation makes each of (1)-(3) more probable (though sometimes only negligibly so).

Examples

For example:

People who use cannabis (A) have a higher risk of developing schizophrenia (B).

This correlation is sometimes offered as proof that use of cannabis causes schizophrenia ((1), A causes B). That may well be correct, but it can equally be claimed that (a disposition to) schizophrenia causes use of cannabis ((2), B causes A), that both are true ((3)), or that the correlation is somehow coincidental ((4)). Which of these is correct can be decided through, for example, investigation of biochemical mechanisms, but a causal arrow does not follow directly from the correlation. Concluding any of (1)-(4) from the observed correlation alone is thus a logical fallacy. It may, however, be claimed that the correlation by itself makes each of the causal relationships (1)-(3) more probable.

Another real-life example:

English people with higher incomes enjoy better health than English people with low incomes.

From this correlation many people would instantly conclude that higher income leads to better health, but this is again a logical fallacy (even if the conclusion happens to be correct). People in better health may be physically and psychologically better disposed to obtain demanding, higher-income jobs, so the causal arrow may point the other way, or both ways, or the correlation may be coincidental. Again, the correlation itself gives no information about the causality between the correlated variables.

For one event to be the cause of another, it must happen first. In some cases the precipitating event may occur so shortly before the result, or overlap it in time, that the two are said to occur simultaneously. The precipitating event cannot, however, happen after the result: one cannot conclude, for example, that a current increase in population caused a baby boom many years earlier.

Another example:

Ice-cream sales are strongly (and robustly) correlated with crime rates.
Therefore, higher ice-cream sales cause crime.

The above argument commits the cum hoc ergo propter hoc fallacy because it prematurely concludes that ice-cream sales cause crime. A more plausible explanation is that high temperatures increase both ice-cream sales and crime rates, perhaps by making people irritable or restless, or by increasing the number of people outside at night.
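The common-cause pattern above can be demonstrated with a small simulation. This is a hypothetical sketch, not taken from any cited study: "temperature" drives both of two synthetic variables that never influence each other, yet the two end up strongly correlated.

```python
# Hypothetical simulation: a common cause ("temperature") drives two
# variables that have no causal link to each other, yet they correlate.
import random

random.seed(0)

n = 10_000
temperature = [random.gauss(20, 8) for _ in range(n)]

# Ice-cream sales and crime each depend on temperature plus independent
# noise; neither depends on the other.
ice_cream = [2.0 * t + random.gauss(0, 5) for t in temperature]
crime = [1.5 * t + random.gauss(0, 5) for t in temperature]

def corr(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

print(corr(ice_cream, crime))  # strongly positive despite no causal link
```

The variable names and coefficients are invented for illustration; the point is only that a strong correlation arises with no causal arrow between the two correlated quantities.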

A recent scientific example:

Young children who sleep with the light on are much more likely to develop myopia in later life.

This result, from a study at the University of Pennsylvania Medical Center, was published in the May 13, 1999, issue of Nature and received much coverage in the popular press at the time [1]. However, a later study at Ohio State University found no link between infants sleeping with the light on and developing myopia. It did find a strong link between parental myopia and the development of child myopia, and noted that myopic parents were more likely to leave a light on in their children's bedroom [2].

Determining causation

An important consideration is the presence or absence of a known mechanism which may explain how one event causes the other. Using the above example, if ice-cream had been found to contain a chemical substance that made people more aggressive, the causality would seem more plausible. A counter-example would be astrology, where there is no convincing known mechanism to describe why personality would be affected by the position of the stars. Of course, the absence of a known mechanism doesn't preclude the possibility of an unknown mechanism.

Another possibility in correlated factors is that the direction of the causation may be wrong as stated. For example:

Every time a high profile game is released, console sales go up.
Therefore, high profile games are timed to coincide with spikes in console sales.

In the above example, it may be that the actual pattern is that the spike in console sales is caused by the release of the high-profile game. See wrong direction.

The statement "correlation does not imply causation" notes that it is wrong to deduce causation solely from a statistical correlation. If you observe only A and B, a correlation between them does not let you infer that A causes B, or vice versa, much less deduce the connection. But if there is a common cause and you have data on it as well, you can often establish the correct causal structure. Likewise (and perhaps more usefully) if you have a common effect of two independent causes.
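The "common effect of two independent causes" case can be illustrated with a short simulation. This is an invented sketch (the variable names "talent", "luck", and "success" are hypothetical, not from the article): two independent causes are uncorrelated on their own, but become correlated once one selects on their common effect.

```python
# Illustrative sketch: two independent causes of a common effect are
# uncorrelated in the full population, but correlate once we condition on
# (select by) the effect.
import random

random.seed(2)

n = 20_000
talent = [random.gauss(0, 1) for _ in range(n)]
luck = [random.gauss(0, 1) for _ in range(n)]
# "success" is a common effect of the two independent causes.
success = [t + l for t, l in zip(talent, luck)]

def corr(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

print(corr(talent, luck))  # near zero: the causes are independent

# Keep only the "successful" cases (common effect above a cutoff).
selected = [(t, l) for t, l, s in zip(talent, luck, success) if s > 1.0]
t_sel = [t for t, _ in selected]
l_sel = [l for _, l in selected]
print(corr(t_sel, l_sel))  # clearly negative within the selected group
```

The induced negative correlation among the selected cases is exactly the kind of structure (a common effect) that, with the right data, lets one distinguish causal hypotheses that raw pairwise correlation cannot.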

But while the advice is often ignored, it can also be overstated, as if to say there were no way to infer causal structure from statistical data. Clearly, we should not prematurely conclude that, say, ice-cream causes criminal tendencies; rather, we expect the correlation to point us towards the real causal structure. The tendency is to conclude that robust correlations imply some sort of causation, whether a common cause or something more complicated involving multiple factors. Hans Reichenbach suggested the Principle of the Common Cause, which asserts, roughly, that robust correlations have causal explanations: if there is no causal path from A to B (or vice versa), then there must be a common cause, though possibly a remote one.

Reichenbach's principle is closely tied to the Causal Markov condition used in Bayesian networks. The theory underlying Bayesian networks sets out conditions under which causal structure can be inferred when one has not only correlations but also partial correlations. In that case, certain patterns become informative. For example, once temperature is taken into account, the correlation between ice-cream sales and crime rates vanishes, which is consistent with a common cause (but not diagnostic of that alone).
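The vanishing-partial-correlation test can be sketched numerically. The simulation below is hypothetical (synthetic data, invented coefficients): two variables generated from a common cause show a large raw correlation, but their partial correlation given that cause is near zero, computed with the standard first-order partial correlation formula.

```python
# Sketch: for data generated from a common cause, the partial correlation
# of the two effects, given that cause, is near zero even though the raw
# correlation is large.
import random

random.seed(1)

n = 10_000
temp = [random.gauss(20, 8) for _ in range(n)]
sales = [2.0 * t + random.gauss(0, 5) for t in temp]   # effect 1 of temp
crime = [1.5 * t + random.gauss(0, 5) for t in temp]   # effect 2 of temp

def corr(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def partial_corr(x, y, z):
    """Correlation of x and y after controlling for z (first-order formula)."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz**2) * (1 - ryz**2)) ** 0.5

print(corr(sales, crime))                # large raw correlation
print(partial_corr(sales, crime, temp))  # near zero given temperature
```

As the text notes, a vanishing partial correlation is consistent with a common cause but does not by itself prove that structure; it merely screens off one family of causal hypotheses.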

In statistics literature this issue is often discussed under the headings of spurious correlation and Simpson's paradox.

David Hume argued that causality cannot be perceived (and therefore cannot be known or proven); we can only perceive correlation. However, we can use the scientific method to rule out false causes.

Humorous examples

An entertaining demonstration of this fallacy once appeared in an episode of The Simpsons (Season 7, "Much Apu About Nothing"). The city had just spent millions of dollars creating a highly sophisticated "Bear Patrol" in response to the sighting of a single bear the week before.

Homer: Not a bear in sight. The "Bear Patrol" is working like a charm!
Lisa: That's specious reasoning, Dad.
Homer: [uncomprehendingly] Thanks, honey.
Lisa: By your logic, I could claim that this rock keeps tigers away.
Homer: Hmm. How does it work?
Lisa: It doesn't work. (pause) It's just a stupid rock!
Homer: Uh-huh.
Lisa: But I don't see any tigers around, do you?
Homer: (pause) Lisa, I want to buy your rock.

Another example is the Witch hunting scene from Monty Python and the Holy Grail:

Sir Bedevere: Tell me, what do you do with witches?
Mr. Newt: Burn them!
Sir Bedevere: And what do you burn apart from witches?
Peasant #1: More witches! [Peasant gets slapped]
Peasant #2: Wood!
Sir Bedevere: So, why do witches burn?
Peasant #3: .......... 'Cause they're made of... wood?
Sir Bedevere: Good! So how do we tell whether she is made of wood?
Peasant #1: Build a bridge out of her!
Sir Bedevere: Ahh, but can you not also make bridges out of stone?
Peasant #1: Oh ya.
Sir Bedevere: Tell me, Does wood sink in water?
Peasant #1: No, no, it floats. Throw her into the pond!
Sir Bedevere: No, no. What also floats in water?
Peasants yell various answers: (Bread!) (Apples!) (Very small rocks!) (Cider!) (Gravy!) (Cherries!) (Mud!) (Churches!) (Lead! Lead!)
King Arthur: A duck!
Sir Bedevere: Exactly! So, logically.....
Peasant: If she weighs the same as a duck, she's made of wood.
Sir Bedevere: And therefore?
Peasant: A Witch!

A further often-quoted example is the (unverified) claim of a strong correlation between teachers' pay and the volume of whiskey sales.

See also