
November 18, 2007

Why the number 23 is not special

In the movie "The Number 23", the main character, Walter Sparrow, goes crazy thinking about the number 23. The number seems to be everywhere and to have a special link to his life. Had Mr Sparrow been trained in machine learning or statistics, he would have recognized that he was over-interpreting, through either overfitting or data set selection. Here is how:

Overfitting
Overfitting means adjusting the model to the individual data points of a data set instead of to the hidden dynamics that generated them. As an example in the spirit of historical conspiracy theories, one can select a data set of three well-known tyrants of the 20th century:

  • Pol Pot
  • Hitler
  • Stalin

Is there a mystical link? Oh yes, indeed:

1) Simply count the number of letters in each name: each name has 6 letters, together forming 666, the biblical number of the beast.

2) Less obvious are the number relations: take the sum of the digits of each birth date and, if the digit "0" does not appear in the date, subtract the smallest non-prime number not present in it. The result for all three is 32, which in the movie is a sort of "inverse" of 23.

The above is of course nonsense. The birth dates have digit sums of 32, 32, and 36, and making them end up at the same number takes some effort. In fact, I had to carefully inspect all three dates to find something I could subtract from 36 (Stalin) but not from 32 (Pol Pot and Hitler). And it is this careful tweaking that is the mistake: if one can choose the system freely after seeing the three dates, any three dates can be made to give any number in the end, so there is no special coincidence or conspiracy.
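The tweaking can be replayed in a few lines of Python. This is only a sketch under stated assumptions: the post does not list the dates, so the DD/MM/YYYY strings below are my guess at the ones used (they reproduce the quoted sums of 32, 32, and 36), and "smallest non-prime number not present" is read as the smallest positive non-prime digit missing from the date.

```python
def digit_sum(date: str) -> int:
    """Sum of all decimal digits in a date string such as '20/04/1889'."""
    return sum(int(ch) for ch in date if ch.isdigit())


def smallest_missing_nonprime(date: str) -> int:
    """Smallest positive non-prime digit (1, 4, 6, 8, 9) absent from the date.

    Assumption: this is one reading of the rule in the text, not a quote.
    """
    digits = {int(ch) for ch in date if ch.isdigit()}
    for candidate in (1, 4, 6, 8, 9):
        if candidate not in digits:
            return candidate
    return 0  # every candidate is present, so nothing is subtracted


def tweaked_rule(date: str) -> int:
    """Digit sum, minus the ad hoc correction when no '0' appears in the date."""
    total = digit_sum(date)
    if "0" not in date:
        total -= smallest_missing_nonprime(date)
    return total


# Assumed birth dates (DD/MM/YYYY); not given in the post itself.
BIRTH_DATES = {
    "Pol Pot": "19/05/1925",
    "Hitler": "20/04/1889",
    "Stalin": "18/12/1878",
}

raw_sums = {name: digit_sum(d) for name, d in BIRTH_DATES.items()}
tweaked = {name: tweaked_rule(d) for name, d in BIRTH_DATES.items()}

print(raw_sums)  # plain digit sums: two dates agree, one does not
print(tweaked)   # after the hand-made rule all three agree
```

The rule has a conditional and a correction term that exist only to move one data point. With three data points and that much freedom in the model, agreement is guaranteed rather than surprising.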

Data set selection
The above happens all the time, but the case of data set selection is in fact more common. Data set selection means having a (more or less) fixed model but choosing the data points of the data set to fit the model. In everyday life this is what happens when people get an idea about something and then notice whenever reality fits the idea. The problem is that they do not pay the same amount of attention to situations where the idea does not apply.

In the case of death and tyrants, one can instead fix the model, such that the day of the month plus the number of the month equals 23, and then search world history for dramatic events. One then finds, e.g.:

  • December 11, 1994 Russian President Boris Yeltsin orders Russian forces into Chechnya
  • November 12, 1927 Trotsky expelled from Soviet CP; Stalin becomes undisputed dictator
  • October 13, 1864 Battle at Darbytown Road Virginia (337 casualties)
  • September 14, 1812 Napoleon enters Moscow
  • August 15, 1961 East German authorities begin building the Berlin Wall
  • July 16, 1918 In Yekaterinburg, Russia, Czar Nicholas II and his family are executed by the Bolsheviks
  • June 17, 1992 Massacre by Inkatha followers at Boipatong, South Africa, kills 42
  • May 18, 1940 German troops conquer Brussels
  • April 19, 1941 Bulgarian troops invade Macedonia
  • March 20, 1995 Nerve gas attack on Tokyo subway
  • February 21, 1916 Battle of Verdun in WW I begins (1 million casualties)
  • January 22, 1941 1st mass killing of Jews in Romania
Thus, we can always find something that matches a system, and if the data set (the examples) is selected to match the pattern of choice, the appearance of coincidence is misleading. In fact, there are websites that list the events that happened on any given day of the year.
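How much room the fixed model leaves for picking data can be counted directly. The following Python sketch lists every calendar day whose day-of-month plus month number equals 23; the year is an arbitrary non-leap year chosen for illustration, and the variable names are my own.

```python
import calendar

TARGET = 23
YEAR = 2007  # arbitrary non-leap year, used only to look up month lengths

# For each month, the only candidate day is TARGET - month;
# keep it if that day actually exists in the month.
matches = [
    (month, TARGET - month)
    for month in range(1, 13)
    if 1 <= TARGET - month <= calendar.monthrange(YEAR, month)[1]
]

days_in_year = 366 if calendar.isleap(YEAR) else 365
fraction = len(matches) / days_in_year

for month, day in matches:
    print(f"{calendar.month_name[month]} {day}")
print(f"{len(matches)} of {days_in_year} days match ({fraction:.1%})")
```

Every month contributes exactly one matching day, so about one day in thirty fits the pattern. With centuries of recorded wars and disasters to draw from, each of those twelve days is bound to have something dramatic on it; the days that do not fit are simply never looked at.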

Summary
In summary, it is equally deceptive to adjust the model/pattern too freely or to select data points that match the pattern. Both roads lead to wrong conclusions. One needs to balance the data set and the model in just the right way, and statistics and machine learning have two different but equally good ways to do that.
