Google Search is a search engine. Search engines allow users to quickly look through a collection of records, documents, websites, or really any information stored in an electronic format. A search engine takes a question or term from a user and returns a list of results, ranked by relevance, that it thinks will either answer the user’s question or relate to what they are searching for. Google Search is remarkable because the collection its users query is a cache of almost all the websites on the Internet.
The work involved in operating Google Search happens in three distinct stages. First, during the crawling stage, Google collects information about every website on the Internet. Then, during the indexing stage, it collates and processes that information into an index for use during the next stage. Finally, during the querying stage, Google allows users to search through that index and find the documents they are looking for. Understanding each stage is important for understanding how search engines work, so let’s step through each of them in more detail.
The crawling stage is when Google collects copies of all the websites it can find. Across a multitude of computers, Google runs web browsers that visit each website in turn. As each computer visits a webpage, it records what content is displayed on the page, how fast the page loaded, and as much other data as possible. The more often a webpage changes, the more frequently the crawler visits it. News websites like CNN and The New York Times can be crawled as often as several times per minute. Less popular websites are crawled much less frequently, perhaps once a month or once a year.
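To make the idea of visiting frequently changing pages more often concrete, here is a minimal sketch in Python. It is not how Google schedules its crawler, which is not described here. The halve-or-double rule, the one-minute and thirty-day limits, and the names `fingerprint`, `next_interval`, and `simulate` are all assumptions chosen for illustration.

```python
import hashlib

# Hypothetical limits, chosen only for illustration.
MIN_INTERVAL_S = 60               # never revisit more often than once a minute
MAX_INTERVAL_S = 30 * 24 * 3600   # never wait longer than thirty days


def fingerprint(content: str) -> str:
    """Hash page content so two visits can be compared cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def next_interval(current_s: float, changed: bool) -> float:
    """Halve the wait if the page changed since the last visit, double it otherwise."""
    proposed = current_s / 2 if changed else current_s * 2
    return max(MIN_INTERVAL_S, min(MAX_INTERVAL_S, proposed))


def simulate(visits: list[str], start_s: float = 3600) -> list[float]:
    """Feed successive snapshots of one page; return the revisit interval after each visit."""
    interval = start_s
    last = None
    history = []
    for content in visits:
        current = fingerprint(content)
        if last is not None:
            # Only adjust once there is a previous snapshot to compare against.
            interval = next_interval(interval, changed=(current != last))
        last = current
        history.append(interval)
    return history
```

Under these assumptions, a page whose content differs on every visit drifts toward the one-minute floor, while a page that never changes drifts toward the thirty-day ceiling.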
Crawling is extremely important for the operation of search engines and, to jump ahead a little, regulating Google’s advantage in crawling is what this website focuses on. But don’t just take our word for it: listen to Sergey Brin talk about how important crawling is to operating a search engine.
Indexing is the stage at which Google takes all those websites that the crawlers hastily recorded during the crawling stage and organizes that information into the formats and databases needed for the querying process. Google has been collecting, storing, and indexing all of the information publicly available on the web for the past two decades and has the most complete and extensive archive of the entire Internet that has ever existed.
Querying is the stage when Google Search pulls a list of relevant documents found in the index in response to a user query. Crawling and indexing are just the prep work needed to be able to answer queries instantly as they come in. When you “google something,” this is the stage that you are thinking of, and as the average user of Google Search this is the only stage of this process that you will interact with.
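The relationship between indexing and querying can be illustrated with a toy example. The Python sketch below builds an inverted index (a map from each word to the pages that contain it) and then answers queries from that index alone. It is only an illustration: the word-count scoring is a stand-in rather than Google’s ranking method, and the function names and `example.com` pages are made up for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, Counter]:
    """Indexing: map each word to the pages containing it and how often it appears."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] += 1
    return index


def query(index: dict[str, Counter], text: str) -> list[str]:
    """Querying: score pages by summed counts of the query words, highest first."""
    scores = Counter()
    for word in tokenize(text):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Ties are broken alphabetically by URL so the order is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical crawled pages (URL -> page text).
pages = {
    "example.com/a": "Search engines crawl the web",
    "example.com/b": "Engines index pages; engines answer queries",
    "example.com/c": "Cooking recipes",
}
index = build_index(pages)
print(query(index, "engines"))  # ['example.com/b', 'example.com/a']
```

All of the expensive work happens in `build_index`, before any question arrives. Answering a query is then a handful of dictionary lookups, which is why the prep work makes instant answers possible.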
All of the above also describes the essential functions of other search engines like Bing, Yandex, Baidu, and Yahoo. That said, according to one estimate, Google Search has over 90% of the search engine market, while its closest competitor, Bing, has 2%. So why is Google so dominant? In a competitive market, we would expect new entrants to follow Google’s lead and start nipping away at Google’s profits. Before Chrome and Android were introduced in the late 2000s, there were very few anti-competitive barriers that Google could put in place to prevent other search engines from gaining market share. Indeed, Google Search was not always the dominant search engine on the market. Many engines existed before Google, and at one time the founders of Google attempted to sell the algorithm to AltaVista, a now-defunct search engine that was popular in the late 1990s.
So what happened? Why can’t other search engines compete today?