Search Engine Overview
How do search engines work?
The term "search engine"
is often used generically to describe both crawler-based
search engines and human-powered directories.
These two types of search engines gather their
listings in radically different ways.
Crawler-Based Search Engines
Crawler-based
search engines, such as Google, create their listings
automatically. They "crawl" or "spider"
the web, then people search through what they
have found.
If
you change your web pages, crawler-based search
engines eventually find these changes, and that
can affect how you are listed. Page titles, body
copy and other elements all play a role.
Human-Powered Directories
A
human-powered directory, such as the Open Directory,
depends on humans for its listings. You submit
a short description to the directory for your
entire site, or editors write one for sites they
review. A search looks for matches only in the
descriptions submitted.
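To make the point about descriptions concrete, here is a minimal sketch of a directory-style search. The `DIRECTORY` and `PAGE_TEXT` data and the `directory_search` function are hypothetical, invented for illustration; they do not describe how the Open Directory or any real directory is implemented.

```python
# Hypothetical directory entries: site URL -> short description
# submitted by the site owner or written by an editor.
DIRECTORY = {
    "http://example.com/": "Hand-made oak furniture and custom cabinets",
    "http://example.org/": "Guides to hiking trails in the Alps",
}

# Hypothetical page text. The directory search below never reads it.
PAGE_TEXT = {
    "http://example.com/": "We now also sell garden gnomes",
}


def directory_search(query):
    """Return sites whose description contains every word of the query."""
    words = query.lower().split()
    return [
        url
        for url, description in DIRECTORY.items()
        if all(word in description.lower().split() for word in words)
    ]


print(directory_search("oak furniture"))  # matches the description
print(directory_search("garden gnomes"))  # only on the page, so no match
```

In this sketch, a query for "garden gnomes" finds nothing even though the words appear on the page itself, because only the descriptions are searched.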
Changing
your web pages has no effect on your listing.
Things that are useful for improving a listing
with a search engine have nothing to do with improving
a listing in a directory. The only exception is
that a good site, with good content, might be
more likely to get reviewed for free than a poor
site.
"Hybrid Search Engines" Or Mixed Results
In the web's early days, a search engine presented either
crawler-based results or human-powered listings. Today, it is
extremely common for both types of results to be presented.
Usually, a hybrid search engine will favor one
type of listing over the other. For example, MSN
Search is more likely to present human-powered
listings from LookSmart. However, it does also
present crawler-based results (as provided by
Inktomi), especially for more obscure queries.
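One way to picture "favoring" one type of listing is a merge that puts directory listings first and fills the remaining slots with crawler results. The sketch below is an assumption made for illustration: the `hybrid_results` function and its `limit` parameter are hypothetical and are not taken from MSN Search, LookSmart or Inktomi.

```python
def hybrid_results(directory_hits, crawler_hits, limit=10):
    """Merge two result lists, favoring human-powered listings.

    Directory listings come first. Crawler results fill whatever room is
    left, so an obscure query with few directory listings ends up showing
    mostly crawler-based results.
    """
    results = list(directory_hits[:limit])
    for url in crawler_hits:
        if len(results) >= limit:
            break
        if url not in results:  # skip pages already listed
            results.append(url)
    return results


# A popular query: the directory alone fills most of the page.
print(hybrid_results(["d1", "d2"], ["c1", "d1", "c2"], limit=4))
# An obscure query: no directory listings, so crawler results are shown.
print(hybrid_results([], ["c1", "c2"]))
```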
The Parts Of A Crawler-Based Search Engine
Crawler-based
search engines have three major elements. First
is the spider, also called the crawler. The spider
visits a web page, reads it, and then follows
links to other pages within the site. This is
what it means when someone refers to a site being
"spidered" or "crawled." The
spider returns to the site on a regular basis,
such as every month or two, to look for changes.
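The visit, read and follow-links loop described above can be sketched as follows. To keep the example runnable without network access, it crawls a small in-memory stand-in for the web; `FAKE_WEB`, `LinkParser` and `spider` are hypothetical names, and a real spider would also need politeness delays, robots.txt handling and error recovery.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical stand-in for the web: URL -> HTML of that page.
FAKE_WEB = {
    "http://example.com/": '<a href="/about.html">About</a> '
                           '<a href="http://other.example/">Elsewhere</a>',
    "http://example.com/about.html": '<a href="/">Home</a> '
                                     '<a href="/contact.html">Contact</a>',
    "http://example.com/contact.html": "<p>Write to us.</p>",
}


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def spider(start_url, fetch=FAKE_WEB.get):
    """Visit a page, read it, then follow links to pages on the same site."""
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)          # "visits a web page"
        if html is None:
            continue
        pages[url] = html          # "reads it"
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:  # "follows links ... within the site"
            link = urljoin(url, href)
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


for url in sorted(spider("http://example.com/")):
    print(url)
```

Running the spider again later and comparing the returned pages with the earlier copy is one simple way to model the return visits that look for changes.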
Everything
the spider finds goes into the second part of
the search engine, the index. The index, sometimes
called the catalog, is like a giant book containing
a copy of every web page that the spider finds.
If a web page changes, then this book is updated
with new information.
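A minimal sketch of such an index follows. It stores a copy of each page, as described above, and also keeps a word-to-pages lookup table. That lookup table (an inverted index) is an assumption added for illustration, as are the `Index` class and `tokenize` helper.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Index:
    """The "giant book": a copy of every page, plus a word lookup."""

    def __init__(self):
        self.pages = {}                   # url -> stored copy of the page
        self.postings = defaultdict(set)  # word -> urls containing it

    def add(self, url, text):
        """Store a page, replacing any older copy of the same URL."""
        if url in self.pages:
            # The page changed: forget the words of the old copy first.
            for word in set(tokenize(self.pages[url])):
                self.postings[word].discard(url)
        self.pages[url] = text
        for word in set(tokenize(text)):
            self.postings[word].add(url)

    def lookup(self, word):
        """Return the URLs whose stored copy contains the word."""
        return sorted(self.postings.get(word.lower(), set()))


index = Index()
index.add("http://example.com/", "Red apples for sale")
index.add("http://example.com/", "Green pears for sale")  # page changed
print(index.lookup("apples"))  # old copy is gone
print(index.lookup("pears"))   # new copy is found
```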
Sometimes
it can take a while for new pages or changes that
the spider finds to be added to the index. Thus,
a web page may have been "spidered"
but not yet "indexed." Until it is indexed
-- added to the index -- it is not available to
those searching with the search engine.
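The gap between "spidered" and "indexed" can be modeled as a holding area that searches never see. This is a simplified sketch under that assumption; the `SearchEngine` class and its method names are hypothetical.

```python
class SearchEngine:
    """Separate what the spider has found from what is searchable."""

    def __init__(self):
        self.pending = {}  # spidered, but not yet indexed
        self.indexed = {}  # available to people searching

    def spidered(self, url, text):
        """Record a page the spider has just fetched."""
        self.pending[url] = text

    def run_index_update(self):
        """Move everything the spider found into the index."""
        self.indexed.update(self.pending)
        self.pending.clear()

    def search(self, word):
        """Search only the indexed pages."""
        word = word.lower()
        return sorted(
            url
            for url, text in self.indexed.items()
            if word in text.lower().split()
        )


engine = SearchEngine()
engine.spidered("http://example.com/", "fresh oak tables")
print(engine.search("oak"))  # nothing yet: spidered but not indexed
engine.run_index_update()
print(engine.search("oak"))  # now the page can be found
```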
Search
engine software is the third part of a search
engine. This is the program that sifts through
the millions of pages recorded in the index to
find matches to a search and rank them in order
of what it believes is most relevant. You can
learn more about how search engine software ranks
web pages on the aptly-named How Search Engines
Rank Web Pages page.
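As a rough illustration of "find matches and rank them," the sketch below scores each matching page by how large a share of its words are query words. This scoring rule is a deliberately simple assumption and is not the method of any real search engine; the `rank` and `tokenize` functions and the sample pages are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, index):
    """Return URLs of pages matching every query word, best match first.

    `index` maps URL -> page text. The score is the share of a page's
    words that are query words (an illustrative choice only).
    """
    terms = tokenize(query)
    if not terms:
        return []
    scored = []
    for url, text in index.items():
        counts = Counter(tokenize(text))
        if all(counts[term] for term in terms):  # page must match the search
            score = sum(counts[term] for term in terms) / sum(counts.values())
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


pages = {
    "http://example.com/a": "oak tables and oak chairs",
    "http://example.com/b": "oak trees in the forest near the old mill",
    "http://example.com/c": "pine shelves",
}
print(rank("oak", pages))
```

Here page `a` ranks above page `b` because "oak" makes up a larger share of its text, and page `c` is left out because it does not match at all. Changing the scoring rule changes the order, which is one way different engines can return different results for the same search.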
Major Search Engines: The Same, But Different
All
crawler-based search engines have the basic parts
described above, but there are differences in
how these parts are tuned. That is why the same
search on different search engines often produces
different results. Some of the significant differences
between the major crawler-based search engines
are summarized on the Search Engine Features Page.
Information on this page has been drawn from the
help pages of each search engine, along with knowledge
gained from articles, reviews, books, independent
research, tips from others and additional information
received directly from the various search engines.