For most of us, Google is the Internet. Being the starting point for finding new web pages, it is probably the most important invention since the internet itself. As of now, it's not an exaggeration to say that Search Engines have become an integrated part of our lives. We use it for shopping, leisure and business, and also as a learning tool. If it wouldn't have been invented, new content would not have been reachable, resulting in a very dull internet experience.
How does a Search Engine work? The engine has three basic functions: crawling (searching for new content), indexing (tracking and storing the content found) and retrieval (delivering relevant content when a user queries the search engine). Here's a more detailed look at how these functions work:
The acquisition of data about a website is where it all begins. This step involves scanning the website and collecting details from each page: images, headings, titles, keywords and other linked pages. This is possible by making use of an automated bot, called a "spider", which visits page after page with remarkable speed, using page links to discover new addresses. In Google's early days, a spider could read several hundred pages per second, but nowadays, that number reached to three zeros, and it`s growing each day.
Therefore, when a spider visits a page, it collects every link, adding them to a long list of next places to visit. From there, it goes to the next page, collecting its links, and repeats the whole process. Also, spiders revisit past links, checking to see if any changes occurred.
This means that every site which is linked to an already indexed site will eventually be crawled. Some sites are visited more often, but if their page hierarchy (which is set up by the site's architecture) is too complex, a spider might give up on searching on that website.
This function is supposed to process and store in a database all the data resulted from a crawl. As an example, imagine a book collection. Crawling means reading each and every word in a book, while indexing is when you store the book in a library.
In fact, the resemblance is very close to reality. Google's data centers are huge rooms, filled with servers, which help the search engine to store all of its data.
Retrieval and Ranking
Retrieval is when a search engine processes the query, making use of complex algorithms, and returns the most relevant results. This step is what makes the difference between different search engines. This is the reason why the search results vary between Bing and Google.
Next, ranking algorithms are trying to determine the best possible results, checking your query against billions of pages, in order to determine which one is more relevant than others. Companies prefer to keep these ranking algorithms secret.
This is mostly due to the fact that they do not want every web page owner to cheat and climb to the top of the search results while having irrelevant content. Search engine exploitation was a common thing in the early days of Google, when the ranking algorithms were much simpler.
At first, search engines ranked sites based on how often a keyword appeared on a page. This led to the practice known as "keyword stuffing", which meant filling pages with keyword heavy nonsense. Then came the importance of links: search engines ranked sites based on how many incoming links were present, thinking the more connections were made, the more relevant were the results, based on the popularity of a site. This lead to link spamming all over the internet, wasting users' precious time.
Today, the secrets of ranking algorithms are deeply shrouded in mystery. Most search engines deliver results based on user experiences and good quality content.
What's the next step?
This answer comes in the form of understanding perfectly what a page is about, or in other words, semantics. Google SemanticExperiences is a new type of artificial intelligence, which promises to decode any language. Soon, we may be able to talk with the search engine as we would with a person, in order to find the most relevant answer to our query.
Until then, we should get used to a search algorithm that's becoming smarter with each passing day. Already with the addition of suggestions, more and more people are able to access the most relevant results, which translates into a better experience.