September 8, 2008

Anscombe's quartet

Talking to some colleagues recently about correlations, I had a welcome opportunity to bring up a nice point made by the English statistician Frank Anscombe: Anscombe's quartet, from 1973.

Anscombe's quartet consists of four 2D data sets that share (to within rounding) the same means, variances, correlation and least-squares regression line:

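Mean(x) = 9
Variance(x) = 10 (population variance, dividing by n = 11; the sample variance is 11)
Mean(y) = 7.5
Variance(y) = 3.75 (population variance; the sample variance is about 4.1)
Correlation(x, y) = 0.816
Regression line: y = 0.5 x + 3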

Yet the four data sets look quite different when inspected visually:

anscombe.png
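For anyone who wants to check the numbers, here is a minimal sketch in Python. The post itself does not list the data, so the values below are Anscombe's published ones as I recall them; the names `QUARTET` and `summarize` are made up for this illustration. The variance is taken as the population variance (dividing by n), since that is what matches the figures quoted above.

```python
from statistics import mean, pvariance

# Anscombe's published data (assumption: typed in from the 1973 paper,
# not taken from this post). Sets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, population variances, correlation and the fitted line."""
    mx, my = mean(x), mean(y)
    vx, vy = pvariance(x), pvariance(y)  # divide by n, not n - 1
    # Population covariance of x and y.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    slope = cov / vx                     # least-squares slope
    intercept = my - slope * mx          # the line passes through the means
    corr = cov / (vx * vy) ** 0.5        # Pearson correlation
    return {
        "mean_x": mx, "var_x": vx,
        "mean_y": my, "var_y": vy,
        "corr": corr, "slope": slope, "intercept": intercept,
    }


for name, (x, y) in QUARTET.items():
    stats = summarize(x, y)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows should agree to about two decimal places, even though the scatter plots have nothing in common.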

The quartet demonstrates two simple but important points:

  • Visual data inspection is important
  • There is more to life than first and second moments
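The second point can be made concrete by going one moment higher. The sketch below computes the skewness (the standardized third moment) of the y values of each set. As before, the data are Anscombe's published values as I recall them rather than anything listed in this post, and `skewness` is a hypothetical helper using the plain population definition m3 / m2^1.5.

```python
from statistics import mean

# y values of Anscombe's quartet (assumption: typed in from the 1973 paper).
Y_VALUES = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}


def skewness(values):
    """Population skewness: third central moment over variance**1.5."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])  # second central moment
    m3 = mean([(v - m) ** 3 for v in values])  # third central moment
    return m3 / m2 ** 1.5


for name, y in Y_VALUES.items():
    print(name, round(skewness(y), 2))
```

Mean and variance of y agree across the four sets, but the skewness does not: set II has a long lower tail, while sets III and IV are each pulled upwards by a single large value.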

Well, I guess many people would say "I knew that", but simple yet important things are often forgotten.