Imagine you are creating your own phone book. You heard there was money to be made in the phone book industry. Your cousin started one a few years ago, and now your family keeps saying you should do the same in your town. So you hire some people to walk through the central business district and collect information about the businesses there. You instruct them to go into each business and ask for its contact information and the types of goods and services it provides.
The businesses are happy to give your agents this information, reckoning that once they are listed in the phone book, more customers are likely to find them. You take all the information your agents have collected and assemble it into a directory with categories for various types of businesses. You print and distribute the phone book for free and sell businesses ad space on certain pages.
The phone book is a big success. You have a prospering business and are generally well respected around town as someone who found an opportunity for mutual benefit within the community. The businesses listed in your directory have flourished, with their listings drawing more customers their way. Once a quarter or so, you send your agents around to collect the information needed to compile the next phone book and solicit new advertisements. Life is good.
Then you make a mistake. You buy a car nice enough to give away how profitable the phone book business really is. Suddenly, in a frenzy of free market competition, hundreds of new would-be phone book providers show up in your town. The next time you send your people out to collect information about local businesses, they can barely get in the door, because there are so many agents from other phone book providers trying to get that information.
Businesses are recoiling. The secretaries, accustomed to sharing this information freely but infrequently, are overwhelmed by the crowds of competing agents in their lobbies demanding the same details. Phone calls to the businesses go unanswered. The business community sees its operations start to suffer as secretaries spend their days handing out information to phone book agents. Some businesses hire more secretaries, but the new hires are quickly overwhelmed too. Phone book operators continue to multiply and compete with one another.
Finally, a wise business owner puts their foot down and hires a security guard to stand in front of the lobby. The guard is instructed to let in all customers but only a limited number of phone book agents. You call the business owner, remind them of all you did for them before the bad times hit, and try to make sure you stay on the nice list. The owner assures you that you were already at the top.
Other business owners see how well the guard works, and everyone starts hiring their own. You call up the owners of each and make sure your people are allowed into the lobbies to talk to the secretaries. Life continues on and business is good. Fin.
So, let’s make the subtext text and unpack this metaphor.
- You are Google.
- The businesses are website operators.
- The secretaries are web servers.
- The phone book operators are competing search engines.
- The phone book agents are the computers doing the crawling.
- The security guards are IP block lists and firewalls.
What this story shows is how the aggregate burden of being crawled eventually forces website operators to limit how much crawling they allow. Website operators weigh the cost of having their sites crawled against the benefit they expect to get from supplying their information to each crawler in question. When a group of entities each give out valuable information about themselves for free, despite the cost of supplying it, that group will eventually block all but a few collectors of that information, and in doing so create a natural limit on who can collect and possess it.
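To make the operator's trade-off concrete, here is a toy model in Python. It is only an illustrative sketch under stated assumptions: the `Crawler` type, the `choose_allowed` function, and every cost and value figure are hypothetical, and real operators enforce their choices with IP block lists and firewalls rather than a formula. The model admits a crawler only if the traffic its engine sends back outweighs the cost of serving it, then fills a fixed crawl capacity with the best value-to-cost ratios first.

```python
from dataclasses import dataclass


@dataclass
class Crawler:
    name: str
    crawl_cost: float      # hypothetical: cost to the site of serving this crawler's requests
    referral_value: float  # hypothetical: value of the visitors its search engine sends back


def choose_allowed(crawlers: list[Crawler], capacity: float) -> list[str]:
    """Return the names of crawlers a site operator would let in.

    A crawler is worth admitting only if the traffic it sends back is worth
    more than it costs to serve. Among those, the best value-to-cost ratios
    are admitted first until the site's crawl capacity is spent.
    """
    worthwhile = [c for c in crawlers if c.referral_value > c.crawl_cost]
    worthwhile.sort(key=lambda c: c.referral_value / c.crawl_cost, reverse=True)

    allowed: list[str] = []
    used = 0.0
    for c in worthwhile:
        if used + c.crawl_cost <= capacity:
            allowed.append(c.name)
            used += c.crawl_cost
    return allowed


# Made-up numbers: every crawler costs the same to serve,
# but the engines behind them send back very different amounts of traffic.
crawlers = [
    Crawler("incumbent", crawl_cost=10, referral_value=500),
    Crawler("runner_up", crawl_cost=10, referral_value=40),
    Crawler("newcomer", crawl_cost=10, referral_value=2),
]
print(choose_allowed(crawlers, capacity=25))  # ['incumbent', 'runner_up']
```

In this toy model the newcomer is refused not because it is more expensive to serve, but because it has no traffic yet to offer in return. Shrinking `capacity` to 15 leaves only the incumbent, which is the drift from oligopoly toward monopoly in miniature.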
A complication might arise with this metaphor if all the collectors of the information were able to send enough traffic to bring in revenue for the businesses, but so far, at least with search engines, that has not been the case. Note that in our example a natural oligopoly forms, because several search engines could, in theory, send enough traffic to justify the cost of being crawled by each of them. In practice, however, we have observed that this effect looks more like a natural monopoly, because website operators often give Google much more access than they give other search engines like Bing, in no small part because Google sends so much more traffic than every other search engine. If you are a new search engine just starting out, though, you won’t get even as much access as Bing does, and it is very difficult to reach Bing’s level of access, let alone approach Google’s.
This chain of events feels almost inevitable: the information website operators give out freely is valuable, too many parties want it, and the consequences that follow in those circumstances are predictable. As far as we know, a slowed-down, drawn-out version of the above story has been playing out for the past 25 years.