PageRank is an algorithm originally developed by Larry Page and Sergey Brin (founders of Google) to rank web pages in search engine results. It measures the relative importance of each node (e.g., webpage) in a directed graph based on the structure of incoming links.

Intuition

The core idea is:

A node is important if many other important nodes link to it.

PageRank simulates a “random surfer” who clicks on links at random:

  • With probability (typically 0.85), the surfer follows a link from the current page.
  • With probability , the surfer jumps to a random page.

Mathematical Formulation

Given a graph with nodes, the PageRank of node is defined recursively as:

Where:

  • is the damping factor (usually 0.85),
  • is the set of nodes linking to ,
  • is the number of outbound links from node .

Implementation (using NetworkX)

import networkx as nx
 
G = nx.DiGraph()
G.add_edges_from([
    ('A', 'B'), ('B', 'C'), ('C', 'A'), ('A', 'D'), ('D', 'C')
])
 
pagerank_scores = nx.pagerank(G, alpha=0.85)
for node, score in pagerank_scores.items():
    print(f"{node}: {score:.4f}")

📊 Use Cases

  • Graph-based NLP: Keyword extraction (e.g., TextRank).

graphdata_visualization