How Search Engines Work: Learn How Your Site Appears in the Search Results
Search Engines have changed quite a bit since the birth of the web, but how do they work? How does your website magically show up on the search results page for a given query? In the rest of this post, I will describe the process a Search Engine goes through, from visiting your website to deciding which queries will display your listing on the results page.
Visiting and Crawling a Website
The Search Engine process starts with a crawler visiting a website and downloading its content. A crawler is a program that downloads content from website pages on a scheduled basis. The list of websites and pages for a crawler to visit comes from many sources, such as:
- Submitting a website or page to a Search Engine via a form at their website
- The discovery of new pages or sites by the crawler when downloading a page
- Submission or download of a sitemap
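To make the second source concrete, here is a minimal sketch of how a crawler might discover new pages while downloading a page. It is only an illustration, not how any particular Search Engine does it: the `PAGES` dictionary stands in for the web so nothing is actually downloaded, and the names `LinkExtractor`, `discover_links`, and `crawl_order` are made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL and drop #fragments.
            absolute, _fragment = urldefrag(urljoin(self.base_url, href))
            self.links.append(absolute)


def discover_links(base_url, html):
    """Return the links found in one downloaded page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def crawl_order(seed_url, pages):
    """Breadth-first walk starting at seed_url.

    `pages` maps URL -> HTML and is an assumption standing in for real
    downloads; a URL missing from it is treated as an empty page.
    """
    queue = deque([seed_url])
    seen = {seed_url}
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in discover_links(url, pages.get(url, "")):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical three-page site used only for illustration.
PAGES = {
    "https://example.com/": '<a href="/about">About</a> <a href="blog/">Blog</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/blog/": '<a href="post-1#comments">Post</a>',
}

print(crawl_order("https://example.com/", PAGES))
```

Starting from the home page, the sketch visits the About page, the blog, and then the post it found on the blog, which is the same discovery pattern described above, only on a tiny scale.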
When a crawler visits a website, there are several ways to control its behavior, so that the content you want in the search results is shown and the content you don't want is left out. Crawler behavior can be modified using the following techniques.
- robots.txt - The robots.txt file goes in the website root and contains rules for a crawler to follow when visiting your website. You can allow or deny a crawler access to specific directories or files based on the user agent. Wikipedia has an excellent article on the details of the robots.txt file and links to additional resources.
- Meta Tags - Meta Tags can be used to control the behavior of a Search Engine crawler on a page-by-page basis. The robots meta tag, for example, can tell a crawler not to index a page or not to follow its links. In most cases, using the robots.txt file is all that is needed to control what the crawler sees on your website.
- Nofollow Attribute - This attribute allows the webmaster to control which links will pass credit on to the pages they link to and which ones won't. This is useful for links added in forums or blog comments.
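The three techniques above can be sketched from the crawler's point of view. The example below uses Python's standard `urllib.robotparser` module to apply robots.txt rules, plus a small HTML parser that reads the robots meta tag and sorts links by whether they carry `rel="nofollow"`. The robots.txt rules, the page, and the crawler names `ExampleBot` and `OtherBot` are all invented for illustration, and real crawlers may interpret these signals in more detail than this sketch does.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone is kept out of /private/,
# and the made-up crawler "ExampleBot" is kept out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser, user_agent, url):
    """True if the rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


class PageDirectives(HTMLParser):
    """Reads page-level hints: the robots meta tag and nofollow links."""

    def __init__(self):
        super().__init__()
        self.robots_meta = set()   # e.g. {"noindex", "follow"}
        self.followed_links = []   # links without rel="nofollow"
        self.nofollow_links = []   # links marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.robots_meta.update(
                token.strip().lower()
                for token in content.split(",")
                if token.strip()
            )
        elif tag == "a" and attrs.get("href"):
            rel_values = (attrs.get("rel") or "").lower().split()
            if "nofollow" in rel_values:
                self.nofollow_links.append(attrs["href"])
            else:
                self.followed_links.append(attrs["href"])


def read_directives(html):
    directives = PageDirectives()
    directives.feed(html)
    return directives


# Hypothetical page used only for illustration.
PAGE_HTML = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    '<body><a href="/docs">Docs</a>'
    '<a rel="nofollow ugc" href="https://spam.example/">Comment link</a>'
    "</body></html>"
)

rules = build_parser(ROBOTS_TXT)
print(allowed(rules, "OtherBot", "https://example.com/public/page.html"))   # True
print(allowed(rules, "OtherBot", "https://example.com/private/page.html"))  # False
print(allowed(rules, "ExampleBot", "https://example.com/public/page.html")) # False

page = read_directives(PAGE_HTML)
print(sorted(page.robots_meta), page.followed_links, page.nofollow_links)
```

Note the difference in scope: robots.txt is checked before a page is requested at all, while the meta tag and the nofollow attribute can only be read after the page has been downloaded.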
The Search Engine Index
Each Search Engine has its own algorithm for sorting and ranking the crawled web pages, although most Search Engines sort pages in broadly similar ways. The resulting index determines which pages will be shown to the user for a particular search query.
The algorithms used to build an index look for things such as how many times a word or phrase shows up on a page, the font size of a word or phrase, the amount of content on the page, formatting, user experience, meta tags, and the number of links to a page. There are many tutorials and guides available on the Internet covering best practices for developing websites and pages that maximize the user experience and the ability of a Search Engine to crawl the website pages.
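As a rough illustration of the first signal in that list, how many times a word shows up on a page, here is a minimal sketch of an inverted index: a mapping from each word to the pages that contain it, along with a count per page. Real Search Engine indexes combine many more signals than this, and the page names, page text, and the functions `tokenize` and `build_index` are assumptions made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build {word: {page: occurrences}} from {page: text}."""
    index = {}
    for page, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index.setdefault(word, {})[page] = count
    return index


# Hypothetical crawled pages, reduced to their visible text.
PAGES = {
    "a.html": "Fresh coffee beans and coffee grinders",
    "b.html": "Tea and coffee shop",
}

index = build_index(PAGES)
print(index["coffee"])  # {'a.html': 2, 'b.html': 1}
print(index["tea"])     # {'b.html': 1}
```

Storing the index by word rather than by page is what makes lookups fast later: answering a query means reading a few entries instead of rescanning every crawled page.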
Searching and the Display of Search Results
A search is performed using a keyword or a group of words that form a phrase. Most Search Engines include more advanced search functions, such as using Boolean operators and keywords meant to perform a specific function.
Once the search query is entered and submitted, the Search Engine consults its index to return a number of pages that best fit the query according to its algorithm. Search results can vary from search to search and even more so in different geographic regions. The search results page is constantly changing and it is rare for a website or page to remain at the same ranking for a long period of time.
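The lookup step can be sketched in a few lines. The example below builds a tiny word-count index and then answers a query by keeping only the pages that contain every query word (a Boolean AND) and ordering them by how often those words appear. This scoring rule is a deliberate simplification chosen for the example, not the algorithm any real Search Engine uses, and the pages and the functions `tokenize`, `build_index`, and `search` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build {word: {page: occurrences}} from {page: text}."""
    index = {}
    for page, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index.setdefault(word, {})[page] = count
    return index


def search(index, query, limit=10):
    """Return up to `limit` pages containing every query word, best first."""
    words = tokenize(query)
    if not words:
        return []
    # Boolean AND: start with pages for the first word, then intersect.
    matching = set(index.get(words[0], {}))
    for word in words[1:]:
        matching &= set(index.get(word, {}))
    # Assumed scoring rule: total occurrences of the query words on the page.
    scored = [(sum(index[w][page] for w in words), page) for page in matching]
    # Highest score first; ties are broken alphabetically by page name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [page for _score, page in scored[:limit]]


# Hypothetical crawled pages, reduced to their visible text.
PAGES = {
    "a.html": "Coffee beans and coffee grinders",
    "b.html": "Tea and coffee shop",
    "c.html": "Tea kettles",
}

index = build_index(PAGES)
print(search(index, "coffee"))      # ['a.html', 'b.html']
print(search(index, "tea coffee"))  # ['b.html']
print(search(index, "espresso"))    # []
```

Because the ranking depends entirely on the contents of the index, any change to the crawled pages or to the scoring rule can reorder the results, which is one reason rankings rarely stay fixed for long.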

