NEWS FROM THE LAB - Tuesday, April 24, 2007

No answer for the question of the day

Posted by Mikko @ 17:49 GMT

We got quite a few responses to our question of the day, but no conclusive answer.

The mystery is why Google gives such contradictory information when you search for the keyword "13123390".

Google says there are only five hits, so how can it be displaying the first ten of them?


And there are five more pages of this… so obviously there are more than five hits.


We did get lots of good guesses on what might be going on, including:

"The string in question, 13123390, is the same in decoded and encoded form. When search engines and web-indexing apps run across this text, it knocks things out of whack due to the identical nature of the decoded/encoded string."

"Results that are 'similar' were removed from the list... Why didn't this happen immediately? Theory: in order NOT to process a complete list with a large set of results, Google performs 'look-aheads' to analyze the data. This look-ahead is performed based on the page you are on, and it only analyzes a couple of pages immediately following the initial page. Since you usually find what you are looking for in the first few pages, this means that Google doesn't have to perform a massive operation to eliminate duplicate/similar results."

"The distributed Google index keeps track of many things, one of which is the probabilistic frequency of search terms and words (or numbers) in the index. The search results page uses these figures to give rough estimates of the number of relevant results, while the actual results are gathered from the full index. Hence, for some terms the figures don't seem to match. Seemingly random numbers are good for demonstrating this. Personalized results and/or link-spam prevention algorithms may play their part as well. And of course, for some things, censorship."

"I'm going to take a wild guess and say that 4 is the average of 1, 3, 1, 2, 3, 3, 9, and 0."
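The arithmetic behind that last guess is easy to check. The short Python snippet below is our own quick check, not part of the reader's comment: it takes the eight digits of "13123390" and computes their sum, mean and median. The mean works out to 22 / 8 = 2.75 (and the median to 2.5), so the numbers don't quite add up to 4.

```python
import statistics

# The eight digits of the search keyword "13123390"
digits = [int(c) for c in "13123390"]

total = sum(digits)                  # 1+3+1+2+3+3+9+0
mean = statistics.mean(digits)       # total / 8
median = statistics.median(digits)   # average of the middle pair once sorted

print(f"sum={total} mean={mean} median={median}")
```

Running it prints `sum=22 mean=2.75 median=2.5`.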

Anyone else?