December 16, 2007

Millions and Millions of Search Results... Ignored

When you put a common term into a search engine, you're likely to get hundreds of thousands, if not millions, of search results, whether you use Google, Yahoo!, or any other leading engine.

In fact, some use the sheer number of search results to judge which search engine's crawler has done the most thorough job of indexing the Web - and it's assumed that the engine with the most results has the superior algorithm.

But did you know that regardless of how many results there might actually be for a query, both Google and Yahoo! will only let you see the first 1,000?

Sorry, can't go past page 100. You've reached the end.

This artificial limit is excused by saying it has been put in place to reduce demands on software and hardware resources, and that 1,000 results is good enough for most people. So you'll never, ever, get past page 100 on Google or Yahoo!, even if a search for "Google" on either engine shows more than 1 billion results.

But it's retrieving that data somewhere, right? If Google has a mountain of results available for a term, and only delivers the top 1,000, then some database somewhere knows what the results are for positions 1,001 to 9,999 and beyond, into the tens of millions. Yet users have no recourse if they want to peer into that index. There's no option to "Show all results" or "Display the top 10,000 results". Google and Yahoo! have arbitrarily decided that 1,000 is good enough for you, and that's that.

Do you feel lucky? Some have said Google overwhelmingly optimizes for the first page of results, and as the company writes, "We try to make your search experience so efficient that it's not necessary to scroll past the first ten listings."

But isn't it likely that there are projects out there where it would be helpful to analyze the top 5,000 results? Or 20,000? If you're an SEO firm, there are obvious benefits to this; and if you're doing any kind of artificial intelligence research, Google's index would be one of the best data pools out there.

So why are they doing this? It looks like even Google, which is assumed to have one of the most redundant, robust systems known to man, is trying to save money and resources. The company writes, in an explanation, "It would heavily tax our system to provide these results for everyone."

While that's understood, what data is propping up Yahoo!'s or Google's claims that they have the most thorough results? Could the last step of Google's algorithm simply be "multiply results x 2", solely to have the biggest number available? After all, if you can only see the first 1,000, why not report eleventy trillion? There's got to be a way to get to the rest of the data.

(Also see: Search Engine Roundtable, Instant Fundas and FirstStop WebSearch for more...)


  1. It might benefit *you* to see the rest of the data, but what benefit is there to Google?

    If you haven't clicked on an ad while looking at the first 100 results, you're not going to click on an ad. I suspect the only reason for letting you have 1,000 results is to keep you a loyal Googler.

    If you need extra access to data, I'm sure Google will sell it to you. Just as they will sell you your own Google appliance.

    If you've got a good enough story, Google might even *give* it to you; they seem to be charitable folk.

    But I can't see any reason for Google to *give* you 2,000 or 10,000 or 1,000,000 results.

  2. There are quite a few tricks to get more than 1,000 results. Not in a single search though...

    1. You can use multiple search engines. Surprisingly, for most queries the overlap is less than 25%. You can check our test-drive of the four major search engines (Google, Yahoo!, MSN, Ask).

    2. If you slightly modify your query, the search results will also be slightly different:

    Use synonyms. Then remove duplicates and you'll get more than 1,000 results;

    Compose more specific queries. You'll still get at most 1,000 results, but they'll be a more relevant 1,000. That is, add more relevant words, or exclude irrelevant words with "-". Then merge the search results and remove duplicates.

    This trick - multiple derived queries with different excluded words - will produce the most relevant list of more than 1,000 merged search results for your original query.

    Of course, you should consider some sort of automation when you merge long lists of search results and remove duplicates.

    Be creative
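
    The merge-and-dedup step described above can be sketched in a few lines of Python. Note that `fetch_results()` here is a hypothetical placeholder returning canned data - a real version would call a search engine API and return up to 1,000 ranked URLs per query:

    ```python
    # Illustrative sketch: merge ranked result lists from several derived
    # queries, dropping duplicate URLs while preserving the best rank at
    # which each URL first appeared.

    def fetch_results(query):
        # Hypothetical stand-in for a real search API call.
        canned = {
            'widgets': ['a.com', 'b.com', 'c.com'],
            'widgets -cheap': ['b.com', 'd.com'],
            'gadgets': ['c.com', 'e.com'],
        }
        return canned.get(query, [])

    def merge_queries(queries):
        """Merge ranked result lists, keeping only the first
        (best-ranked) occurrence of each URL."""
        seen = set()
        merged = []
        for query in queries:
            for url in fetch_results(query):
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
        return merged

    print(merge_queries(['widgets', 'widgets -cheap', 'gadgets']))
    # ['a.com', 'b.com', 'c.com', 'd.com', 'e.com']
    ```

    Run one pass per derived query (synonyms, "-" exclusions) and the merged list can easily exceed any single query's 1,000-result cap.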

  3. Louis, does that mean that Google, like a human brain, is bound by processing power? That is, some researchers say the human brain receives millions of bits (these may not be bits as computer bit-heads think of them) - that is, pieces of information, senses and so forth - yet is only able to respond to or address a much smaller number at any one point in time.

    So that correlates to the performance aspect of the brain; then there is the retention side. As brain food for thought, here's a couple of reads - take them for what they are:

    And a mention of a report I saw on CNN (FWIW) the other day about measuring brain waves to read voters' minds - maybe a new Google application potential...