simrank_similarity(G, source=None, target=None, importance_factor=0.9, max_iterations=1000, tolerance=0.0001)
SimRank is a similarity metric that says "two objects are considered to be similar if they are referenced by similar objects."
The pseudo-code definition from the paper is:

    def simrank(G, u, v):
        in_neighbors_u = G.predecessors(u)
        in_neighbors_v = G.predecessors(v)
        scale = C / (len(in_neighbors_u) * len(in_neighbors_v))
        return scale * sum(
            simrank(G, w, x) for w, x in product(in_neighbors_u, in_neighbors_v)
        )

where G is the graph, u is the source, v is the target, and C is a float decay or importance factor between 0 and 1.
The SimRank algorithm for determining node similarity is defined in .
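The recursive pseudo-code above has no stopping case, so it cannot be run as written. The following is a minimal sketch of one way to evaluate it by fixed-point iteration. It is not NetworkX's implementation. It assumes the usual SimRank conventions that a node has similarity 1 with itself and that a pair has similarity 0 when either node has no in-neighbors. The name `simrank_sketch` and the `preds` input format (a dict mapping each node to a list of its in-neighbors) are hypothetical, chosen for illustration only.

```python
from itertools import product


def simrank_sketch(preds, C=0.9, max_iterations=1000, tolerance=1e-4):
    """Iteratively evaluate the SimRank recurrence.

    preds is a hypothetical input format: a dict mapping every node
    to a list of its in-neighbors (predecessors).
    """
    nodes = list(preds)
    # Assumed starting point: each node is fully similar to itself only.
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}

    for _ in range(max_iterations):
        new_sim = {}
        max_change = 0.0
        for u in nodes:
            new_sim[u] = {}
            for v in nodes:
                if u == v:
                    value = 1.0  # assumed base case
                elif not preds[u] or not preds[v]:
                    value = 0.0  # assumed: no in-neighbors means no similarity
                else:
                    scale = C / (len(preds[u]) * len(preds[v]))
                    value = scale * sum(
                        sim[w][x] for w, x in product(preds[u], preds[v])
                    )
                new_sim[u][v] = value
                max_change = max(max_change, abs(value - sim[u][v]))
        sim = new_sim
        # Halt once no similarity value changed by more than the tolerance.
        if max_change <= tolerance:
            break
    return sim


# Two nodes that point at each other, as in a directed reading of cycle_graph(2).
two_cycle = simrank_sketch({0: [1], 1: [0]})

# Nodes "b" and "c" are both referenced only by "a".
shared_parent = simrank_sketch({"a": [], "b": ["a"], "c": ["a"]})
```

In the second call, "b" and "c" share their single in-neighbor, so under these assumptions their similarity is C times the self-similarity of "a".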
G: A NetworkX graph.
source: If this is specified, the returned dictionary maps each node v in the graph to the similarity between source and v.
target: If both source and target are specified, the similarity value between source and target is returned. If target is specified but source is not, this argument is ignored.
importance_factor: The relative importance of indirect neighbors with respect to direct neighbors.
max_iterations: Maximum number of iterations.
tolerance: Error tolerance used to check convergence. When an iteration of the algorithm finds that no similarity value changes by more than this amount, the algorithm halts.
If source and target are both None, this returns a dictionary of dictionaries in which the outer and inner keys are nodes and each value is the similarity of that pair of nodes.
If source is not None but target is, this returns a dictionary mapping each node to the similarity of source and that node.
If neither source nor target is None, this returns the similarity value for the given pair of nodes.
Returns the SimRank similarity of nodes in the graph G.
>>> G = nx.cycle_graph(2)
>>> nx.simrank_similarity(G)
{0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}
>>> nx.simrank_similarity(G, source=0)
{0: 1.0, 1: 0.0}
>>> nx.simrank_similarity(G, source=0, target=0)
1.0
The result of this function can be converted to a NumPy array representing the SimRank matrix by using the node order of the graph to determine which row and column represent each node. Other node orderings are also possible.
>>> import numpy as np
>>> sim = nx.simrank_similarity(G)
>>> np.array([[sim[u][v] for v in G] for u in G])
array([[1., 0.],
       [0., 1.]])
>>> sim_1d = nx.simrank_similarity(G, source=0)
>>> np.array([sim_1d[v] for v in G])
array([1., 0.])
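Since other node orderings are possible, the conversion can take an explicit list of nodes instead of iterating over the graph. The sketch below is dependency-free and builds nested lists, which could then be passed to `np.array`. The helper name `similarity_rows` is hypothetical, and the similarity values are made up for illustration; they are not the output of `simrank_similarity` on any particular graph.

```python
def similarity_rows(sim, order):
    """Lay out a dict-of-dicts similarity result as rows in the given node order."""
    return [[sim[u][v] for v in order] for u in order]


# Made-up similarity values for three nodes, for illustration only.
sim = {
    "a": {"a": 1.0, "b": 0.5, "c": 0.0},
    "b": {"a": 0.5, "b": 1.0, "c": 0.25},
    "c": {"a": 0.0, "b": 0.25, "c": 1.0},
}

# Row i and column i both correspond to order[i].
order = ["c", "a", "b"]
rows = similarity_rows(sim, order)
```

Keeping the `order` list alongside the matrix is what lets a row or column index be mapped back to its node later.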