PageRank was once the heart of search – and made Google the empire it is today.
Even if you believe search has moved beyond PageRank, there's no denying that it has been a ubiquitous concept in the industry for a long time.
Every SEO professional should have a good understanding of what PageRank was – and what it still is today.
In this article, we cover the following:
- What is PageRank?
- The history of the development of PageRank.
- How PageRank revolutionized search.
- Toolbar PageRank vs PageRank.
- How PageRank works.
- How PageRank flows between pages.
- Is PageRank still used?
Let's dive in.
What is PageRank?
Developed by Google founders Larry Page and Sergey Brin, PageRank is an algorithm based on the combined relative strength of all hyperlinks on the web.
Most people assume the name is based on Larry Page's surname, while others argue that "Page" refers to a web page. Both positions are probably true, and the overlap was likely intentional.
When Page and Brin were at Stanford University, they co-wrote a paper titled "The PageRank Citation Ranking: Bringing Order to the Web."
The paper, published in January 1999, demonstrates a relatively simple algorithm for rating the strength of web pages.
The method was later patented in the US (but not in Europe, where mathematical formulas are not patentable).
Stanford University owns the patent and has assigned it to Google. The patent is currently due to expire in 2027.
The history of the development of PageRank
While at Stanford in the late 1990s, Brin and Page both explored information retrieval methods.
At the time, using links to find out how “important” each page was relative to another was a revolutionary way of ranking pages. It was computationally difficult, but by no means impossible.
The idea quickly grew into Google, which was a small player in the search world at the time.
Some people had so much faith in Google's approach that the company initially launched its search engine with no way of generating revenue.
While Google (originally called "BackRub") was the search engine, PageRank was the algorithm it used to rank pages in search engine results pages (SERPs).
The Google Dance
One of PageRank’s challenges was that the math, while simple, had to be processed iteratively. The calculation runs multiple times, on every page and every link on the Internet. At the turn of the millennium, this math took several days to process.
Rankings in Google's SERPs moved up and down during this time, often erratically, as new PageRank values were calculated for each page.
This became known as the "Google Dance," and it unsettled the SEO pros of the time whenever Google rolled out its monthly update.
(The Google Dance later became the name of an annual party Google hosted at its Mountain View headquarters for SEO pros.)
A later iteration of PageRank introduced the idea of a "trusted seed" to start the algorithm, rather than giving every page on the web the same starting value.
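One way to picture a trusted seed is as a personalized variant of PageRank, in which the random jump lands only on a chosen set of seed pages instead of on every page equally. The sketch below assumes that interpretation; the link graph, the seed set, and the function name `seeded_pagerank` are hypothetical, and it is not a description of Google's actual system.

```python
def seeded_pagerank(links, seeds, d=0.85, iterations=100):
    """Hypothetical sketch: PageRank whose random jump goes only to seed pages.

    links maps each page to the list of pages it links to.
    seeds is the set of "trusted" pages that receive the (1 - d) jump share.
    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    # Jump distribution: all of the restart probability goes to the seeds.
    jump = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    rank = dict(jump)  # start the iteration from the seed distribution
    for _ in range(iterations):
        new_rank = {p: (1 - d) * jump[p] for p in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += d * share
        rank = new_rank
    return rank


# Hypothetical graph: a trusted page, two pages it reaches, and a spam pair.
web = {
    "trusted": ["a"],
    "a": ["b"],
    "b": ["a"],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
scores = seeded_pagerank(web, seeds={"trusted"})
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Because no trusted page links into the spam pair, and the random jump never lands there, both spam pages end with a score of zero, however tightly they link to each other.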
Another iteration of the model introduced the idea of a “reasonable surfer”.
This model suggests that a page's PageRank may not be shared evenly among the pages it links to; instead, the relative value of each link could be weighted by how likely a user is to click on it.
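A reasonable-surfer split can be sketched by weighting each outgoing link by an estimated click likelihood instead of dividing evenly. The helper name `distribute` and the click weights below are hypothetical, invented for illustration rather than taken from any real model.

```python
def distribute(page_rank, click_weights):
    """Split one page's PageRank across its links in proportion to click weights.

    click_weights maps each linked URL to a hypothetical, non-negative
    estimate of how likely a user is to click that link.
    """
    total = sum(click_weights.values())
    return {url: page_rank * w / total for url, w in click_weights.items()}


# Hypothetical links on a page with a PageRank of 1.0.
# An even split would give each of the three links 1/3 instead.
weights = {"in-content-link": 6, "main-nav-link": 3, "footer-link": 1}
weighted_split = distribute(1.0, weights)
print(weighted_split)  # {'in-content-link': 0.6, 'main-nav-link': 0.3, 'footer-link': 0.1}
```

The total passed on is unchanged; only its division among the links shifts toward the links a user is more likely to click.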
The retreat of PageRank
Internally, Google's algorithm was initially considered spam-proof, since a page's importance was determined not only by its content but also by a kind of "voting system" formed by the links pointing to it.
Google’s trust didn’t last, however.
As the backlink industry grew, PageRank started to become problematic. As a result, Google withdrew it from public view, but continued to rely on it for its ranking algorithms.
The PageRank toolbar was retired in 2016, and eventually all public access to PageRank was removed. However, Majestic (an SEO tool), in particular, was able to correlate its own calculations quite closely with PageRank.
For many years, up until January 2017, Google discouraged SEO professionals from manipulating links, both through its Webmaster Guidelines and through advice from its webspam team, led by Matt Cutts.
Google’s algorithms also changed during this time.
The company relied less on PageRank, and after acquiring MetaWeb and its knowledge base, Freebase, in 2010, Google began indexing the world's information in a variety of ways.
Toolbar PageRank vs. PageRank
At first, Google was so proud of its algorithm that it was happy to make the result of its calculation publicly available to anyone who wanted to see it.
The most notable representation was a toolbar extension for browsers such as Firefox, which displayed a score between 0 and 10 for every page on the web.
In fact, PageRank has a much wider range of values, but 0-10 gave SEO professionals and consumers an instant way to gauge the importance of any page on the web.
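Google did not publicly document how raw scores were mapped onto the toolbar's 0 to 10 scale. One plausible way to squeeze a very wide range into 11 steps is logarithmic bucketing, sketched below; the function name `toolbar_score`, the base of 6, and the floor `min_pr` are all hypothetical assumptions, not published values.

```python
import math


def toolbar_score(raw_pr, min_pr=1e-9, base=6.0):
    """Squeeze a raw PageRank value onto a 0-10 display scale.

    Hypothetical sketch: logarithmic buckets with an assumed base and an
    assumed floor (min_pr). Neither value comes from Google.
    """
    if raw_pr <= min_pr:
        return 0
    # Each multiplication of raw_pr by `base` adds one point, capped at 10.
    return min(10, int(math.log(raw_pr / min_pr, base)))


for raw in (5e-10, 3e-8, 2e-5, 1e-3, 1.0):
    print(raw, toolbar_score(raw))
```

Under these assumptions, moving a page from 7 to 8 requires about six times as much raw PageRank as moving it from 6 to 7, which is why high toolbar scores were so much harder to reach than low ones.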
The PageRank Toolbar made the algorithm extremely visible, but that visibility came with complications. In particular, it made clear that links were the easiest way to game Google.
The more links a page had (or, more precisely, the better its links), the better it could rank in Google's SERPs for any targeted keyword.
This gave rise to a secondary market for buying and selling links, priced according to the PageRank of the URL on which each link was placed.
This problem was exacerbated when Yahoo launched a free tool called Yahoo Site Explorer, which allowed anyone to find the links pointing to a specific page.
Later, two tools, Moz and Majestic, went beyond this free option by building their own web indexes and evaluating links independently.
How PageRank revolutionized search
Other search engines relied heavily on analyzing the content of each page individually. Using these methods, it was almost impossible to tell the difference between an influential page and one simply written with random (or manipulative) text.
This meant that other search engines’ retrieval methods were extremely easy for SEO professionals to manipulate.
So Google’s PageRank algorithm was revolutionary.
Combined with a relatively simple use of "n-grams" to determine relevance, PageRank gave Google a winning formula.
It soon overtook the main incumbents of the time, such as AltaVista and Inktomi (which, among other things, powered MSN's search results).
By working at the page level, Google also found a far more scalable solution than the "directory-based" approaches of Yahoo and, later, DMOZ, although Google initially used DMOZ (aka the Open Directory Project) as an open-source basis for its own directory.
How PageRank works
The formula for PageRank comes in different forms, but it can be explained in a few sentences.
First, every page on the Internet is given an estimated PageRank score. This can be any number. In the past, PageRank was presented to the public as a value between 0 and 10, but in practice the starting estimates need not fall in this range.
Each page's PageRank is then divided by the number of outgoing links on that page, giving a smaller fraction.
That fraction is passed to each of the linked pages, and the same happens for every other page on the Internet.
Then, for the next iteration of the algorithm, each page's new PageRank estimate is the sum of the fractions it receives from all the pages that link to it.
The formula also includes a "damping factor," which has been described as the likelihood that a person surfing the Internet will stop surfing altogether.
Before each further iteration of the algorithm begins, the proposed new PageRank is multiplied by the damping factor, a number between 0 and 1, which reduces it.
This process is repeated until the PageRank values settle into equilibrium, changing very little from one iteration to the next. The resulting numbers were then generally rescaled to the more recognizable range of 0 to 10 for convenience.
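The steps above can be sketched as a short iterative loop. This is a minimal illustration, not Google's implementation, and it assumes: a tiny hand-made link graph in which every page has at least one outgoing link; equal starting scores of 1/n; a damping factor of 0.85, the value Brin and Page reported using; and a common formulation in which every page also receives a baseline share of (1 - d) / n. The function name `pagerank` and the stopping tolerance are illustrative choices.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterative PageRank on a small link graph.

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # initial estimate: equal for all pages
    for _ in range(max_iter):
        # Every page starts the iteration with the (1 - d) / n baseline share.
        new_rank = {p: (1 - d) / n for p in pages}
        for page, outlinks in links.items():
            # Divide the page's current PageRank by its number of links...
            share = rank[page] / len(outlinks)
            # ...and pass that fraction, scaled by d, to each linked page.
            for target in outlinks:
                new_rank[target] += d * share
        # Stop once the scores have settled into equilibrium.
        if sum(abs(new_rank[p] - rank[p]) for p in pages) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical three-page site.
web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.4f}")
```

Here `home` receives links from both other pages and ends up with the highest score, while `blog`, linked only from `home`, ends up lowest. Rescaling the results to 0 to 10 would be a separate, purely presentational step.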
One way to represent this mathematically is:
- PR = PageRank in the next iteration of the algorithm.
- d = damping factor.
- j = the page number on the web (if each page had a unique number).
- n = total number of pages on the Internet.
- i = the iteration of the algorithm (initially set to 0).
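Putting these symbols together, one common way to write a single iteration is shown below. Two pieces of notation are added here as assumptions: B_j, the set of pages that link to page j, and L_k, the number of outgoing links on page k.

$$
PR_j^{(i+1)} = \frac{1-d}{n} + d \sum_{k \in B_j} \frac{PR_k^{(i)}}{L_k}
$$

Starting from PR_j^{(0)} = 1/n for every page, the update is applied to all pages at once and repeated until the values stop changing meaningfully.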
The formula can also be expressed in matrix form.
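A sketch of that matrix form, assuming the scores are collected into a column vector r^{(i)} of length n and that the link matrix M has entries M_{jk} = 1/L_k when page k links to page j (and 0 otherwise), with 1 denoting the all-ones vector:

$$
\mathbf{r}^{(i+1)} = d\,M\,\mathbf{r}^{(i)} + \frac{1-d}{n}\,\mathbf{1}
$$

Row j of M r sums PR_k / L_k over the pages k that link to j, so this is the per-page update written for every page simultaneously. At equilibrium, r = d M r + ((1 - d)/n) 1.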
Problems and iterations of the formula
The formula presents some challenges.
If a page doesn't link to any other page, its PageRank has nowhere to flow and leaks out of the calculation, so the formula won't reach a proper equilibrium.
To handle this, the PageRank of such a page is distributed to every page on the Internet. As a result, even a page with no incoming links can receive some PageRank, though never enough to be significant.
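The leak, and the fix of spreading a stranded score evenly across all pages, can be seen in a small experiment. The sketch below assumes a three-page graph in which page `c` links nowhere, a damping factor of 0.85, and the (1 - d) / n baseline share; the helper name `step` is hypothetical.

```python
def step(rank, links, d=0.85, spread_dangling=True):
    """One PageRank update. Pages with no outgoing links are 'dangling'."""
    pages = list(rank)
    n = len(pages)
    new_rank = {p: (1 - d) / n for p in pages}
    for page, outlinks in links.items():
        if outlinks:
            share = rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += d * share
        elif spread_dangling:
            # Redistribute the dangling page's score evenly to every page.
            for target in pages:
                new_rank[target] += d * rank[page] / n
        # Otherwise the dangling page's score is simply lost.
    return new_rank


links = {"a": ["b", "c"], "b": ["c"], "c": []}
start = {p: 1 / 3 for p in links}

leaky = start
fixed = start
for _ in range(50):
    leaky = step(leaky, links, spread_dangling=False)
    fixed = step(fixed, links, spread_dangling=True)

print(round(sum(leaky.values()), 3))  # 0.253
print(round(sum(fixed.values()), 3))  # 1.0
```

Without the fix, roughly three quarters of the total PageRank drains away through `c`; with it, the total stays at 1 on every iteration.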
Another, less documented challenge is that newer pages may have a lower PageRank than older pages, even when they are more important. This means that old content can accumulate a disproportionately high PageRank over time.
The algorithm does not take into account how long a page has been online.
How PageRank flows between pages
If a page starts with a score of 5 and contains 10 links, each page it links to will receive 0.5 of PageRank from it (before the damping factor is applied).
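As a worked version of this arithmetic, assuming a damping factor of d = 0.85 (the value Brin and Page reported using):

$$
\frac{5}{10} = 0.5, \qquad d \times 0.5 = 0.85 \times 0.5 = 0.425
$$

Each linked page would therefore receive 0.425 from this page in the next iteration, added to whatever its other linking pages pass on to it.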
This is how PageRank flows across the web between iterations.
When new pages appear on the Internet, they initially have only a low PageRank. However, as other sites start linking to these pages, their PageRank increases over time.
Is PageRank still used?
Although public access to PageRank was removed in 2016, the score is believed to remain available to search engine engineers at Google.
A leak of Yandex's ranking factors showed that PageRank was still a factor it could use.
Google engineers have proposed replacing the original form of PageRank with a new approximation that requires less computing power to calculate. While the formula now matters less to how Google ranks pages, it remains a constant for every web page.
And regardless of what other algorithms Google might use, PageRank is likely ingrained in many of the search giant’s systems to this day.
Dixon explains in more detail how PageRank works in this video:
Original patents and papers for in-depth reading:
Featured image: VectorMine/Shutterstock