
(goo'gul') n. 1. a number 1 followed by
100 zeros. 2. the world's best search
engine.
Google's core technology handles more than 200 million
searches per day.
Google has built the web's largest index of web pages (more
than 3 billion).
Google is powered by the world's largest commercial Linux
cluster (more than 10,000 servers).
The importance of Google cannot be overstated. Given the
integration among the major portals and search engines,
Google performs a substantial portion of all searches conducted
on the Internet. A fair estimate of Google's contribution
to a properly optimized site is in the 85% range (i.e., 85%
of a site's traffic arising from Internet searches will
originate from Google).
Google runs on a unique combination of advanced hardware
and software. The heart of the software is PageRank (PR);
PageRank is obtained by means of a number of different
spiders/crawls (i.e., algorithms). It is worth noting that
PageRank is not the sole determinant of actual rank. There
are many instances of sites with lower PR scores ranking
above sites with higher PR, meaning that more is involved
in the final rank determination. It is, however, a significant
contributor and, as the algorithm becomes more sophisticated,
will become increasingly important.
Google's PageRank (PR) represents a citation graph developed
from maps of a site's hyperlinks, page title and citations.
It is a 'Citation Importance Ranking' system that defines
a citation as an inbound link (backlink) and uses these
citations to approximate a page's importance and quality.
The actual PageRank model:

$$PR(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR(p_i)}{OL(p_i)}$$

where PR(p_i) is the PageRank of page p_i and OL(p_i) is the
number of outgoing links on page p_i.
A simplified look at the above:
We assume page A has pages p1...pn which point to it (i.e.,
they are the citations, backlinks, or inbound links). The
parameter d is a damping factor which can be set between
0 and 1 (believed to be set at 0.85). The PageRank of a
page A is given as follows:
PR of 'A' = (1-d) + d x [ PR(p1) / OL(p1) + PR(p2) / OL(p2)
+ ... + PR(pn) / OL(pn) ]
Again,
pn represents a specific page 'n',
PR(pn) is the PageRank of the given page 'n',
OL(pn) is the number of outbound links on page 'n', and
d is a constant (called the damping factor), which is approximately 0.85.
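The simplified formula translates almost line for line into code. The sketch below is a minimal illustration that assumes the PageRank of every linking page is already known; the function name `pagerank_of_page` and the (PR, outbound-link-count) input format are hypothetical choices for this example, not anything Google publishes.

```python
def pagerank_of_page(backlinks, d=0.85):
    """Compute PR(A) from the pages that link to A.

    backlinks: list of (pr, ol) pairs, one per linking page p_i, where
               pr is PR(p_i) and ol is OL(p_i), its number of outbound links.
    d:         damping factor (0.85 is the commonly cited value).
    """
    if not 0 <= d <= 1:
        raise ValueError("damping factor d must be between 0 and 1")
    total = 0.0
    for pr, ol in backlinks:
        if ol <= 0:
            raise ValueError("a linking page must have at least one outbound link")
        total += pr / ol  # each backlink passes on PR(p_i) / OL(p_i)
    return (1 - d) + d * total


# Two hypothetical linking pages: PR 4 with 2 outbound links, PR 6 with 3.
print(pagerank_of_page([(4.0, 2), (6.0, 3)]))
```

Each of the two hypothetical linking pages passes on 2, so PR(A) comes to 0.15 + 0.85 x 4, or about 3.55. A page with no backlinks at all still receives the minimum of (1 - d) = 0.15.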
Therefore,
PR of 'A' = (1 - 0.85) + 0.85 x (the sum, over all pages
linking to page 'A', of each linking page's PR divided by
the number of outbound links on that linking page).
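A worked example may help. Suppose, purely hypothetically, that page 'A' has three backlinks: one from a page with PR 2 and 4 outbound links, one from a page with PR 3 and 2 outbound links, and one from a page with PR 1 and a single outbound link (these values are invented for illustration):

$$
\begin{aligned}
PR(A) &= (1 - 0.85) + 0.85 \left[ \frac{2}{4} + \frac{3}{2} + \frac{1}{1} \right] \\
      &= 0.15 + 0.85 \times (0.5 + 1.5 + 1.0) \\
      &= 0.15 + 0.85 \times 3.0 \\
      &= 0.15 + 2.55 = 2.70
\end{aligned}
$$

Note how the page with four outbound links passes on only a quarter of its PR, while the page with a single outbound link passes on all of it.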
Thus, we can conclude the following:
1. A page's PageRank increases with the number of pages that
link to it.
2. The higher the PageRank of a linking page, the better.
3. The number of outbound links contained on the linking
page has a diluting effect on its contribution.
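The three conclusions can be checked numerically with the same formula. This is a minimal sketch using invented linking pages (the helper name `pr_from_backlinks` and all input values are hypothetical); each case changes one thing relative to a single baseline backlink.

```python
D = 0.85  # damping factor; 0.85 is the commonly cited value


def pr_from_backlinks(backlinks, d=D):
    """PR(A) = (1 - d) + d * sum of PR(p_i) / OL(p_i) over the linking pages."""
    return (1 - d) + d * sum(pr / ol for pr, ol in backlinks)


# Each backlink is (PR of the linking page, outbound links on that page).
base = [(2.0, 4)]               # one modest backlink
more_links = base + [(2.0, 4)]  # 1. an extra linking page
stronger_link = [(6.0, 4)]      # 2. a linking page with higher PR
diluted_link = [(2.0, 20)]      # 3. same PR, but spread over 20 outbound links

for label, links in [("base", base), ("more links", more_links),
                     ("stronger link", stronger_link), ("diluted link", diluted_link)]:
    print(f"{label:>13}: PR(A) = {pr_from_backlinks(links):.3f}")
```

Relative to the baseline of 0.575, the extra link raises PR(A) to 1.0 and the stronger link to 1.425, while the diluted link lowers it to 0.235.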
Further reflection on the equation that calculates PageRank
leads to the link structure strategy discussed previously.
Google performs two types of crawl: a main crawl
and a fresh crawl. The main crawl is done
once a month, and the fresh crawl is done constantly. Google
is continually altering these crawls, resulting in variations
in which sites are crawled and how deeply the selected sites
are crawled (how many pages are spidered).
The monthly main crawl is more comprehensive than
the fresh crawl. It visits all pages within the index and
crawls deeper. In essence, the main crawl sets
the stage for the big 'dance'.
The fresh crawl re-spiders pages that are already indexed
and also spiders any new pages it finds along the way.
New pages found this way can be inserted into search
results immediately; however, their rank is not stable (the
pages will actually 'disappear' again). Do not let this
short-term 'fresh gift' fool you: the pages have not yet been
included in the main index. The same effect can be observed
with pages that are already in Google's main index but that,
upon 'fresh' evaluation, are re-ranked to reflect recent
changes to the page.
A 'fresh' rank will remain volatile until the page is
properly added to the index. This happens once the site
has been through a main crawl and at least one 'dance'.
The Google 'dance' generally occurs once a month (although,
at the time of this writing, the interval has been extended
to 6 weeks). During the dance, pages are evaluated using
PageRank. As explained above, the PageRank of a particular
page depends on all of its inbound links and the PageRanks
of the pages they come from.
Therefore, to account for all the inbound links to a site,
every page in the index has to be evaluated first.
Think about this: how can you evaluate a part of a whole
when that evaluation depends on the evaluation of the
whole, which is in turn influenced by the evaluation of
the part?
Logically, this cannot be done directly in a single pass.
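The circularity is easy to see on a tiny example. In the sketch below (a hypothetical three-page web; the function name `one_pass` is illustrative only), every page is recomputed once from a starting guess of 1.0. Running the same pass again changes the results, because each page's new score was computed from its neighbours' old, not-yet-updated scores.

```python
def one_pass(links, scores, d=0.85):
    """Recompute every page's PR once, using the current (possibly stale)
    scores of the pages that link to it.

    links: {page: [pages it links to]}
    """
    new_scores = {}
    for page in links:
        # Sum PR(src) / OL(src) over every page src that links to this page.
        inbound = sum(scores[src] / len(outs)
                      for src, outs in links.items() if page in outs)
        new_scores[page] = (1 - d) + d * inbound
    return new_scores


# Hypothetical three-page web: A -> B;  B -> A, C;  C -> A.
links = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}

guess = {page: 1.0 for page in links}  # some starting guess is unavoidable
first = one_pass(links, guess)
second = one_pass(links, first)

# A depends on B and C, whose scores depend on A, so the first pass's
# answer is already out of date by the second pass.
print(first)
print(second)
```

With these links, the first pass moves A from 1.0 to 1.425, but the second pass pulls it back to about 1.064, because C's score fell from 1.0 to 0.575 in between.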
Therefore, a page's PageRank is always based on incomplete
data (by definition of the PageRank algorithm, a PR score
is only an approximation). Google's approximation becomes
more accurate by running many iterations of the evaluation
over the entire index (40 to 50 cycles). It is during this
time that specific pages can be seen jumping up and down
in rank (i.e., the 'dance').
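The iterative process can be sketched as repeated whole-index passes of the same formula. This is a minimal, in-memory illustration under stated assumptions: a hypothetical function name `iterate_pagerank`, a hypothetical four-page link map, a starting guess of 1.0 for every page, and 50 cycles to mirror the 40 to 50 cited above. It records the largest score change on each pass, so the 'dance' can be watched as it settles. It is not Google's implementation, which runs over billions of pages.

```python
def iterate_pagerank(links, d=0.85, cycles=50):
    """Repeatedly apply PR(A) = (1 - d) + d * sum(PR(p) / OL(p)) to every page.

    links: {page: [pages it links to]}; every link target is assumed to be
           a key in `links` as well.
    Returns the final scores and the largest per-page change on each pass.
    """
    # Invert the link map once so each pass can look up a page's backlinks.
    inbound = {page: [] for page in links}
    for src, outs in links.items():
        for dst in outs:
            inbound[dst].append(src)

    scores = {page: 1.0 for page in links}  # starting guess for every page
    changes = []
    for _ in range(cycles):
        # Every new score is computed from the previous pass's scores.
        new_scores = {
            page: (1 - d) + d * sum(scores[src] / len(links[src]) for src in inbound[page])
            for page in links
        }
        changes.append(max(abs(new_scores[p] - scores[p]) for p in links))
        scores = new_scores
    return scores, changes


# Hypothetical four-page index: A -> B, C;  B -> C;  C -> A;  D -> C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores, changes = iterate_pagerank(links)
print("largest change on pass 1:", round(changes[0], 4))
print("largest change on pass 50:", changes[-1])
print({page: round(pr, 3) for page, pr in scores.items()})
```

The first pass moves C by 1.275; by the fiftieth pass the largest change is below 0.001, and C, which has three backlinks, ends up with the highest score. D, which nothing links to, stays at the minimum of 0.15.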
Completion of the dance (which takes several days) brings new
PR scores and updated (inbound) backlinks for all pages
within the index: the so-called 'update'.
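Producing updated backlinks amounts to inverting the link map: a page's backlinks can only be listed once the outbound links of every page in the index are known. A minimal sketch, assuming the index fits in memory as a simple dictionary (the function name `build_backlinks` and the sample pages are hypothetical):

```python
def build_backlinks(outbound):
    """Invert {page: [pages it links to]} into {page: sorted list of pages linking to it}."""
    # Pages with no backlinks keep an empty entry.
    backlinks = {page: set() for page in outbound}
    for src, targets in outbound.items():
        for dst in targets:
            # setdefault also covers link targets that have not been crawled yet.
            backlinks.setdefault(dst, set()).add(src)
    return {page: sorted(sources) for page, sources in backlinks.items()}


# Hypothetical index: A -> B, C;  B -> C;  C -> A;  D -> C.
outbound = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(build_backlinks(outbound))
```

Using sets means repeated links from the same page count as a single backlink, which is a simplifying assumption of this sketch.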