Victor Font Consulting Group, LLC


How Search Engines Work

By Victor M. Font Jr.
July 6, 2014

Note: How Search Engines Work is the first in a series of planned articles on search engine optimization that will coincide with the release of my next book, Google Ranking Signals: An Official Guide to the World's Most Popular Search Engine.

Search engines have two major functions: crawling the World Wide Web to build an index, and answering a user's query by algorithmically calculating relevance and serving ranked results.

Imagine for a moment that the World Wide Web is a vast public transportation network. Each stop along a route is a web page, image, PDF, or other type of document. The tracks that connect these stops are hyperlinks. Hyperlinks are a foundational concept of the Web and its most essential element: a hyperlink points either to a whole document or to a specific element within a document.

When people browse the Web, they follow hyperlinks by hand. Programs can also follow hyperlinks automatically. A program that systematically follows each hyperlink and gathers the documents it retrieves is known as a web spider, bot, or crawler. Search engines deploy crawlers to fetch the billions of interconnected web documents they index.
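The transit-network picture maps directly onto a graph traversal. Below is a minimal breadth-first crawl in Python over a hypothetical in-memory link graph (TOY_WEB). The names crawl and get_links are illustrative, and get_links stands in for fetching a page and reading its hyperlinks. Real crawlers also fetch pages over HTTP, honor robots.txt, and decide how often to revisit pages; none of that is modeled in this sketch.

```python
from collections import deque

# Hypothetical miniature web: each page (a "stop") maps to the pages it links to.
TOY_WEB = {
    "/home": ["/about", "/articles"],
    "/about": ["/home"],
    "/articles": ["/articles/seo", "/articles/crawling"],
    "/articles/seo": ["/articles/crawling"],
    "/articles/crawling": ["/home", "/missing"],  # "/missing" has no outgoing links here
}


def crawl(seed, get_links, max_pages=100):
    """Breadth-first traversal: visit a page, queue its unseen links, repeat."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)  # "visit" (fetch) the page
        for link in get_links(url):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("/home", lambda url: TOY_WEB.get(url, []))
print(visited)
# ['/home', '/about', '/articles', '/articles/seo', '/articles/crawling', '/missing']
```

The seen set is what keeps the crawler from looping forever on cycles such as /home → /about → /home, which are common on the real Web.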

According to moz.com’s Beginner’s Guide to SEO (Search Engine Optimization):

Search engines are answer machines. When a person looks for something online, it requires the search engines to scour their corpus of billions of documents and do two things – first, return only those results that are relevant or useful to the searcher’s query, and second, rank those results in order of perceived usefulness. It is both “relevance” and “importance” that the process of SEO is meant to influence.

To a search engine, relevance means more than simply finding a page with the right words. In the early days of the web, search engines didn’t go much further than this simplistic step, and their results suffered as a consequence. Thus, through evolution, smart engineers at the engines devised better ways to find valuable results that searchers would appreciate and enjoy. Today, 100s of factors influence relevance...

In a 2006 article on the Google Librarian Center, Matt Cutts said this about crawling and indexing:

A lot of things have to happen before you see a web page containing your Google search results. Our first step is to crawl and index the billions of pages of the World Wide Web. This job is performed by Googlebot, our “spider,” which connects to web servers around the world to fetch documents. The crawling program doesn’t really roam the web; it instead asks a web server to return a specified web page, then scans that web page for hyperlinks, which provide new documents that are fetched the same way. Our spider gives each retrieved page a number so it can refer to the pages it fetched.
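Cutts's description (ask a server for a page, scan it for hyperlinks, give each retrieved page a number) can be sketched with Python's standard-library html.parser. This is an illustration under stated assumptions, not Googlebot's implementation: the in-memory FAKE_SERVER dictionary stands in for real web servers, and LinkExtractor, extract_links, and doc_ids are hypothetical names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved to an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop "#fragment" parts, which point
                # to an element inside a page rather than to a new page.
                absolute = urljoin(self.base_url, value)
                self.links.append(urldefrag(absolute).url)


def extract_links(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical stand-in for web servers: URL -> HTML returned for that URL.
FAKE_SERVER = {
    "https://example.com/": '<a href="/a">A</a> <a href="b">B</a>',
    "https://example.com/a": '<p>Civil war history</p><a href="/">Home</a>',
    "https://example.com/b": '<a href="https://example.com/a#top">A again</a>',
}

doc_ids = {}  # URL -> number, so later stages can refer to pages by ID
frontier = deque(["https://example.com/"])
while frontier:
    url = frontier.popleft()
    if url in doc_ids or url not in FAKE_SERVER:
        continue  # already numbered, or the "server" has no such page
    doc_ids[url] = len(doc_ids) + 1
    frontier.extend(extract_links(url, FAKE_SERVER[url]))

print(doc_ids)
# {'https://example.com/': 1, 'https://example.com/a': 2, 'https://example.com/b': 3}
```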

Our crawl produces an enormous set of documents, but these documents aren’t searchable yet. Without an index, if you wanted to find a term like civil war, our servers would have to read the complete text of every document every time you searched.
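To see why that matters, here is a hypothetical index-free search in Python: every query re-reads every word of every document, so the work grows with the size of the whole corpus rather than with the number of matching pages. CORPUS and scan_search are made-up names for illustration.

```python
# Hypothetical tiny corpus: document number -> full text.
CORPUS = {
    1: "the war of the roses",
    2: "a history of the american civil war",
    3: "civil engineering basics",
}


def scan_search(corpus, query):
    """Index-free search: re-read every document's text on every query."""
    terms = query.lower().split()
    hits = []
    words_read = 0
    for doc_id, text in corpus.items():
        words = text.lower().split()
        words_read += len(words)  # cost grows with total corpus size
        if all(term in words for term in terms):
            hits.append(doc_id)
    return hits, words_read


print(scan_search(CORPUS, "civil war"))  # ([2], 15)
```

Three short documents cost 15 words read per query; at the scale of billions of pages, repeating that on every search is impractical.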

So the next step is to build an index. To do this, we “invert” the crawl data; instead of having to scan for each word in every document, we juggle our data in order to list every document that contains a certain word. For example, the word “civil” might occur in documents 3, 8, 22, 56, 68, and 92, while the word “war” might occur in documents 2, 8, 15, 22, 68, and 77. Once we’ve built our index, we’re ready to rank documents and determine how relevant they are. Suppose someone comes to Google and types in civil war. In order to present and score the results, we need to do two things:

  1. Find the set of pages that contain the user’s query somewhere
  2. Rank the matching pages in order of relevance
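The inversion and the two steps can be sketched in Python using the postings lists from the civil war example. build_inverted_index shows the inversion on a tiny made-up forward index; intersect performs step 1 with a standard merge over sorted lists; and for step 2, hypothetical_scores is an invented stand-in for Google's undisclosed ranking signals, not how Google actually scores pages.

```python
from collections import defaultdict


def build_inverted_index(forward):
    """Invert {doc_id: [words]} into {word: sorted list of doc_ids}."""
    postings = defaultdict(set)
    for doc_id, words in forward.items():
        for word in words:
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def intersect(a, b):
    """Step 1 for a two-word query: walk two sorted postings lists together."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # advance whichever list has the smaller document number
        else:
            j += 1
    return common


# Inverting a tiny, made-up forward index (document -> words).
forward = {1: ["civil", "war"], 2: ["war", "poetry"], 3: ["civil", "rights"]}
inverted = build_inverted_index(forward)
print(inverted["civil"], inverted["war"])  # [1, 3] [1, 2]

# Postings lists from the "civil war" example quoted above.
index = {
    "civil": [3, 8, 22, 56, 68, 92],
    "war": [2, 8, 15, 22, 68, 77],
}
matches = intersect(index["civil"], index["war"])
print(matches)  # [8, 22, 68]

# Step 2: order matches by relevance. These scores are invented placeholders.
hypothetical_scores = {8: 0.4, 22: 0.9, 68: 0.7}
ranked = sorted(matches, key=lambda doc: hypothetical_scores[doc], reverse=True)
print(ranked)  # [22, 68, 8]
```

Keeping postings lists sorted is what lets the merge run in time linear in their combined length; a query with more words can apply the same pairwise intersection repeatedly.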

Matt Cutts, the longtime head of Google's Webspam team, has been one of Google's primary public-facing figures. While Cutts can't reveal how Google uses its 200+ signals to determine search quality and rank, because spammers would take advantage of that information, he regularly warns that certain behaviors, such as guest blogging, may be penalized by Google. Cutts recently announced that he is taking a long break from his job to spend more time with his wife and family.


About Victor M. Font Jr.

Victor M. Font Jr. is an award-winning author, entrepreneur, and Senior IT Executive. A Founding Board Member of the North Carolina Executive Roundtable, he has served on the Board of Advisors of the North Carolina Technology Association, the International Institute of Business Analysis, the Association of Information Technology Professionals, Toastmasters International, and the North Carolina Commission for Mental Health, Developmental Disabilities, and Substance Abuse Services. He is the author of several books, including The Ultimate Guide to the SDLC, Winning With WordPress Basics, and Cybersecurity.


Copyright © 2003–2022 Victor M. Font Jr.

Return to top of page
Posting....
We only use analytical cookies on our website that allow us to recognize and count the number of visitors, but they do not identify you individually. They help us to improve the way our website works. By clicking Accept you, agree to cookies being used in accordance with our Cookie Policy.OkNoCookie policy