-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpagerank.py
68 lines (51 loc) · 2.49 KB
/
pagerank.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
import time
import mmap
from struct import unpack
"""
Below is code for the PageRank algorithm (power iteration),
most of which has been implemented for you.
Here, the binary index_file contains the mapping of each node's ID to its degree.
The node IDs range from 0 to max_node_id.
Hence, the index_file contains (max_node_id + 1) pairs of values,
each of which is of 'int' C type in little endian byte order.
The index_file is memory-mapped into the index_map object.
The binary edge_file contains the edges in the (source ID, target ID) format.
Hence, the edge_file contains edge_count pairs of values,
each of which is of 'int' C type in big endian byte order.
The edge_file is memory-mapped into the edge_map object.
Your task is to determine the correct parameters needed to
(1) initialize the memory-mapped objects (index_map and edge_map),
(2) unpack the source and target IDs from the edge_map, and
(3) upack the source ID and source degree from the index_map.
This code assumes that the node IDs start from 0 and are contiguous up to max_node_id.
Your algorithm, if implemented correctly, should return the same scores
as popular graph analysis libraries like NetworkX.
"""
def pagerank(index_file, edge_file, max_node_id, edge_count, damping_factor=0.85, iterations=10):
index_map = mmap.mmap(
index_file.fileno(),
length=(max_node_id+1)*8, # ~~~ MODIFY THIS LINE (i) ~~~
access=mmap.ACCESS_READ)
edge_map = mmap.mmap(
edge_file.fileno(),
length=edge_count*8, # ~~~ MODIFY THIS LINE (ii) ~~~
access=mmap.ACCESS_READ)
scores = [1.0 / (max_node_id + 1)] * (max_node_id + 1)
start_time = time.time()
for it in range(iterations):
new_scores = [0.0] * (max_node_id + 1)
for i in range(edge_count):
source, target = unpack(
'>ii', # ~~~ MODIFY THIS LINE (iii) ~~~
edge_map[i * 8: i * 8 + 8]) # ~~~ MODIFY THIS LINE (iv) ~~~
source_degree = unpack(
'<ii', # ~~~ MODIFY THIS LINE (v) ~~~
index_map[source * 8: source * 8 + 8])[1] # ~~~ MODIFY THIS LINE (vi) ~~~
new_scores[target] += damping_factor * scores[source] / source_degree
min_pr = (1 - damping_factor) / (max_node_id + 1)
new_scores = [min_pr + item for item in new_scores]
scores = new_scores
print ("Completed {0}/{1} iterations. {2} seconds elapsed." \
.format(it + 1, iterations, time.time() - start_time))
print ()
return scores