snap_aggregation(G, node_attributes, edge_attributes=(), prefix='Supernode-', supernode_attribute='group', superedge_attribute='types')
This function uses the Summarization by Grouping Nodes on Attributes and Pairwise edges (SNAP) algorithm for summarizing a given graph by grouping nodes by node attributes and their edge attributes into supernodes in a summary graph. This name SNAP should not be confused with the Stanford Network Analysis Project (SNAP).
Here is a high-level view of how this algorithm works:
Group nodes by node attribute values.
Iteratively split groups until all nodes in each group have edges
to nodes in the same groups. That is, until all the groups are homogeneous in their member nodes' edges to other groups. For example, if all the nodes in group A only have edge to nodes in group B, then the group is homogeneous and does not need to be split. If all nodes in group B have edges with nodes in groups {A, C}, but some also have edges with other nodes in B, then group B is not homogeneous and needs to be split into groups have edges with {A, C} and a group of nodes having edges with {A, B, C}. This way, viewers of the summary graph can assume that all nodes in the group have the exact same node attributes and the exact same edges.
Build the output summary graph, where the groups are represented by
super-nodes. Edges represent the edges shared between all the nodes in each respective groups.
A SNAP summary graph can be used to visualize graphs that are too large to display or visually analyze, or to efficiently identify sets of similar nodes with similar connectivity patterns to other sets of similar nodes based on specified node and/or edge attributes in a graph.
The summary graph produced is called a maximum Attribute-edge compatible (AR-compatible) grouping. According to , an AR-compatible grouping means that all nodes in each group have the same exact node attribute values and the same exact edges and edge types to one or more nodes in the same groups. The maximal AR-compatible grouping is the grouping with the minimal cardinality.
The AR-compatible grouping is the most detailed grouping provided by any of the SNAP algorithms.
Networkx Graph to be summarized
An iterable of the edge attributes considered in the summarization process. If provided, unique combinations of the attribute values found in the graph are used to determine the edge types in the graph. If not provided, all edges are considered to be of the same type.
The prefix used to denote supernodes in the summary graph. Defaults to 'Supernode-'.
The node attribute for recording the supernode groupings of nodes. Defaults to 'group'.
The edge attribute for recording the edge types of multiple edges. Defaults to 'types'.
Creates a summary graph based on attributes and connectivity.
SNAP aggregation takes a graph and summarizes it in the context of user-provided node and edge attributes such that a viewer can more easily extract and analyze the information represented by the graph
>>> nodes = {
... "A": dict(color="Red"),
... "B": dict(color="Red"),
... "C": dict(color="Red"),
... "D": dict(color="Red"),
... "E": dict(color="Blue"),
... "F": dict(color="Blue"),
... }
... edges = [
... ("A", "E", "Strong"),
... ("B", "F", "Strong"),
... ("C", "E", "Weak"),
... ("D", "F", "Weak"),
... ]
... G = nx.Graph()
... for node in nodes:
... attributes = nodes[node]
... G.add_node(node, **attributes) ...
>>> for source, target, type in edges:
... G.add_edge(source, target, type=type) ...
>>> node_attributes = ('color', )See :
... edge_attributes = ('type', )
... summary_graph = nx.snap_aggregation(G, node_attributes=node_attributes, edge_attributes=edge_attributes)
The following pages refer to to this document either explicitly or contain code examples using this.
networkx.algorithms.summarization.snap_aggregation
Hover to see nodes names; edges to Self not shown, Caped at 50 nodes.
Using a canvas is more power efficient and can get hundred of nodes ; but does not allow hyperlinks; , arrows or text (beyond on hover)
SVG is more flexible but power hungry; and does not scale well to 50 + nodes.
All aboves nodes referred to, (or are referred from) current nodes; Edges from Self to other have been omitted (or all nodes would be connected to the central node "self" which is not useful). Nodes are colored by the library they belong to, and scaled with the number of references pointing them