Google holds more than 60 percent of the world market for Internet search engines. More than 50 million search queries are processed daily, and over 8 billion web pages are indexed.
The system was developed in 1998 by Sergey Brin and Larry Page, graduates of Stanford University, who used PageRank technology to rank documents. A key point of this technology is determining the “authority” of a particular document based on information about the documents that link to it. In addition, Google determined the relevance of a document using not only the text of the document itself but also the text of the links pointing to it. This approach allowed it to provide noticeably more relevant results than other search engines, and Google quickly began to lead in surveys measuring user satisfaction with search results.
Google searches documents in over 35 languages. Today, many portals and specialized sites provide Google-based Internet search services, which makes successful positioning of sites on Google even more important. Google re-indexes its search database approximately every four weeks. During this process (unofficially called the Google dance), the database is updated with information collected by the robots, and the PageRank values of documents are recalculated. There is also a certain number of documents with fairly high PageRank values whose information in the search database is updated daily.
Three main processes produce the expected result: crawling, indexing, and serving search results.
Crawling is the process by which Googlebot discovers new and updated pages to add to the Google index. It uses a large number of computers to efficiently scan the content of many web pages. The program responsible for crawling is called the Google robot, or Googlebot (another common name is spider). The crawling algorithm determines which sites to crawl, how often to crawl them, and how many pages to fetch from each site. Google primarily starts with a list of web page URLs generated during previous crawls, augmented by data from sitemaps.
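One way to picture the scheduling side of this is a per-site policy table. The following is a minimal sketch only; the sites, crawl intervals, and page budgets are purely illustrative assumptions, not Google's actual configuration:

```python
import time
from dataclasses import dataclass


@dataclass
class CrawlPolicy:
    """Illustrative per-site crawl settings: how often to visit and how many pages to take."""
    crawl_interval_s: int      # minimum seconds between visits to this site
    page_budget: int           # maximum pages to fetch per visit
    last_crawled: float = 0.0  # timestamp of the previous crawl


def sites_due_for_crawl(policies: dict[str, CrawlPolicy], now: float | None = None) -> list[str]:
    """Return the sites whose crawl interval has elapsed."""
    now = now if now is not None else time.time()
    return [site for site, policy in policies.items()
            if now - policy.last_crawled >= policy.crawl_interval_s]


# Hypothetical configuration: a frequently updated site vs. a mostly static one.
policies = {
    "https://news.example.com": CrawlPolicy(crawl_interval_s=3600, page_budget=500),
    "https://docs.example.org": CrawlPolicy(crawl_interval_s=7 * 24 * 3600, page_budget=50),
}
print(sites_due_for_crawl(policies))
```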
A sitemap is an XML file containing information for search engines that helps them determine the location of a site's pages, when they were last updated, how often they change, and their importance relative to other pages on the site.
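As an illustration, a minimal sitemap could be generated like this. The URLs and dates are placeholders, while the urlset, loc, lastmod, changefreq, and priority elements come from the public sitemaps.org protocol:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap XML document from (url, lastmod, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod        # date of last update
        ET.SubElement(url, "changefreq").text = changefreq  # expected update frequency
        ET.SubElement(url, "priority").text = priority      # importance relative to other pages
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)


# Placeholder entries for a hypothetical site.
print(build_sitemap([
    ("https://www.example.com/", "2024-01-15", "weekly", "1.0"),
    ("https://www.example.com/about", "2023-11-02", "monthly", "0.5"),
]))
```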
The Googlebot crawler finds links on each page, adds them to the list of pages to be crawled, and marks new and broken links. The robots then send the collected page data to the servers.
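A toy sketch of this crawl loop using only Python's standard library follows; the seed URL is a placeholder, and a real crawler would also add politeness delays, robots.txt handling, and distributed work queues:

```python
import re
import urllib.request
from collections import deque
from urllib.parse import urljoin

HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def crawl(seed_urls, max_pages=10):
    """Fetch pages breadth-first, collecting newly discovered links and marking broken ones."""
    frontier = deque(seed_urls)   # seeded from previous crawls and sitemap data
    seen = set(seed_urls)
    broken = []
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            broken.append(url)    # could not be fetched: mark as broken
            continue
        fetched += 1
        for href in HREF_RE.findall(html):
            absolute = urljoin(url, href)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)   # supplement the list of pages to crawl
    return seen, broken


pages, broken_links = crawl(["https://www.example.com/"])
```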
Indexing is the processing of a page to compile a complete index of the words found on it, along with a record of which part of the page each word appears in. The robot also processes data from the main tags and attributes. However, some multimedia files and dynamically generated pages cannot be processed.
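A highly simplified sketch of such an index maps each word to the documents and positions where it occurs; real indexing also records tag and attribute data and works at a vastly larger scale:

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to {doc_id: [positions]} so queries can find words and where they occur."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return index


docs = {
    "page1": "Google crawls and indexes web pages",
    "page2": "The index records where each word is located on the page",
}
index = build_index(docs)
print(index["page"])   # {'page2': [10]} -- the word and its position within the document
```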
Serving search results is the response to a user query. When a user enters a query, Google searches its database for matching terms and algorithmically determines the relevance of the content. The system finds the corresponding pages in the index, and the user receives the most relevant results.
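Under the same simplified model, a query could be answered by intersecting the index entries for its terms and ranking by a crude frequency score; actual ranking combines hundreds of signals:

```python
def search(index, query):
    """Return doc_ids containing all query terms, ranked by how often the terms appear."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    # Only documents that contain every query term are candidates.
    candidates = set(postings[0]).intersection(*postings[1:])
    scores = {doc: sum(len(p[doc]) for p in postings) for doc in candidates}
    return sorted(scores, key=scores.get, reverse=True)


# A tiny hand-built index: word -> {doc_id: [positions]}.
tiny_index = {
    "google": {"page1": [0]},
    "index":  {"page1": [3], "page2": [1]},
    "word":   {"page2": [5]},
}
print(search(tiny_index, "index word"))   # -> ['page2']
```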
Relevance is the degree to which search results meet user expectations, and it is a measure of a search engine's effectiveness. Relevance is determined by a variety of factors, of which there are more than 200. One example is incoming links from other pages: each link to a page from another site increases the PageRank of the resource. To ensure that your site is ranked adequately on search engine results pages, it is important that Googlebot can crawl and index it correctly. It should be noted that, for an objective assessment of the significance of a web resource, the PageRank computation takes into account more than 10⁸ variables and 10⁹ terms.
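The core PageRank idea can be sketched as a power iteration over a link graph; the damping factor of 0.85 is the value cited in the original PageRank paper, while the graph itself is invented for illustration:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively distribute each page's rank across its outgoing links."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                       # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank


# Toy link graph: A and C both link to B, so B ends up with the highest rank.
toy_graph = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
print(pagerank(toy_graph))
```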
Moreover, over 200 criteria are applied when processing queries, and Google makes about 500 improvements to its search algorithm every year.
by Mason Flores