We provide the hyperlink graph at four different levels of aggregation:

  • Page-Level Graph – This version of the graph contains all details, with each node representing a single page and each arc a hyperlink between two pages.
  • Subdomain-Level Graph – This graph aggregates the page graph by subdomain. Each node represents a specific subdomain, and an arc exists if at least one hyperlink was found between pages that belong to a pair of subdomains. Note that subdomains can be of arbitrary depth.
  • First-Level-Subdomain Graph – Each node represents a first-level subdomain, with all subordinate subdomains aggregated into it.
  • Pay-Level-Domain Graph – Each node represents a pay-level domain. An arc exists if at least one hyperlink was found between pages contained in a pair of pay-level domains.
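
The aggregation rule described above (an arc between two groups exists if at least one page-level hyperlink connects them) can be sketched roughly as follows. This is only an illustration, not the code used to build the published graphs: the URLs are made up, `host_of` and `aggregate` are hypothetical helper names, and grouping by full host name stands in for the subdomain level. Whether links inside one group are kept as self-loops is not stated in the text; this sketch keeps them.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the host name of a URL; used here as the 'subdomain' node."""
    # urlsplit(...).hostname is already lowercased by the standard library.
    return urlsplit(url).hostname


def aggregate(page_arcs, key=host_of):
    """Collapse page-level arcs into a set of arcs between groups.

    An arc (g1, g2) is emitted once, no matter how many page-level
    hyperlinks connect pages of g1 to pages of g2.
    """
    arcs = set()
    for src, dst in page_arcs:
        arcs.add((key(src), key(dst)))
    return arcs


# Made-up page-level arcs (assumption: URLs carry a scheme).
page_arcs = [
    ("http://a.example.org/1", "http://b.example.org/x"),
    ("http://a.example.org/2", "http://b.example.org/y"),
    ("http://b.example.org/x", "http://a.example.org/1"),
]

subdomain_arcs = aggregate(page_arcs)
print(sorted(subdomain_arcs))
```

Aggregating to pay-level domains would follow the same pattern with a different `key` function, but that function needs a list of public suffixes, which is outside the scope of this sketch.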

The table below gives an overview of the size of the different graphs:

Graph                        #Nodes         #Arcs
Page Graph                   ,56 million    8,736 million
Subdomain Graph              1 million      2,043 million
1st Level Subdomain Graph    95 million     1,937 million
PLD Graph                    43 million     623 million

2. Formats and Download

We provide the graphs for free download in several formats. All graphs are provided in an index/arc data format. In addition, we provide the page graph in the format used by the WebGraph library and the PLD graph in the format used by Pajek. The page graphs are hosted on Amazon S3. The aggregated graphs are provided for download via a server in Mannheim, Germany.

2.1 Index/Arc Format

The Index/Arc format represents each graph using two files. Within the index file each line represents one node. The first column states the node name, the second column states the node index. Within the arc file each line represents a directed edge between two nodes, where the first column is the origin node and the second the target node. The files are sorted by index and use tabs as a delimiter. The following example files contain a graph with 106 nodes and 141 arcs.
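
As a rough illustration of how the two files fit together, the following sketch reads a small in-memory index file and arc file and resolves the arcs back to node names. The sample data, the function names `read_index` and `read_arcs`, and the assumption that the arc file refers to nodes by their numeric index are ours, not taken from the published files.

```python
import io


def read_index(lines):
    """Build a mapping node index -> node name from 'name<TAB>index' lines."""
    names = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so the name column is kept intact.
        name, idx = line.rsplit("\t", 1)
        names[int(idx)] = name
    return names


def read_arcs(lines):
    """Yield (origin, target) index pairs from 'origin<TAB>target' lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        src, dst = line.split("\t")
        yield int(src), int(dst)


# Tiny made-up example; real files would be opened with open(path).
index_file = io.StringIO("a.example.org\t0\nb.example.org\t1\nc.example.org\t2\n")
arc_file = io.StringIO("0\t1\n0\t2\n2\t0\n")

names = read_index(index_file)
named_arcs = [(names[s], names[t]) for s, t in read_arcs(arc_file)]
print(named_arcs)
```

For the full graphs, holding every arc in a Python list is unlikely to be practical; `read_arcs` is written as a generator so arcs can be processed one at a time.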

The following table contains the links for downloading the graphs.
