Social Network Analysis

Clustering Coefficient

Triadic Closure

Triadic closure: The tendency for people who share connections in a social network to become connected.

Local clustering coefficient of a node: Fraction of pairs of the node’s friends that are friends with each other. $$ \text{The local clustering coefficient of node C}: \frac{\operatorname{\#\,of\,pairs\,of } C^{\prime} \text{s friends who are friends} }{\operatorname{\#\,of\,pairs\,of } C^{\prime} \text{s friends}} $$

Global Clustering Coefficient: Measuring clustering on the whole network:

  • Approach 1: Average local clustering coefficient over all nodes in the graph.

  • Approach 2: Measuring clustering on the whole network: percentage of “open triads” that are triangles in a network. $$ \text { Transitivity }=\frac{3 \ast \text { Number of closed triads }}{\text { Number of open triads }} $$

Distance

The eccentricity of a node $n$ is the largest distance between n and all other nodes.

The radius of a graph is the minimum eccentricity.

The periphery of a graph is the set of nodes that have eccentricity equal to the diameter.

The center of a graph is the set of nodes that have eccentricity equal to the radius.

Connectivity

An undirected graph is connected if, for every pair nodes, there is a path between them.

A directed graph is strongly connected if, for every pair nodes $u$ and $v$, there is a directed path from $u$ to $v$ and a directed path from $v$ to $u$.

A directed graph is weakly connected if replacing all directed edges with undirected edges produces a connected undirected graph.

Robustness

Network robustness: the ability of a network to maintain its general structural properties when it faces failures or attacks.

Type of attacks: removal of nodes or edges.

Structural properties: connectivity.

Centrality

Centrality measures identify the most important nodes in a network:

Centrality Measures:

  • Degree centrality important nodes have many connections. $C_{d e g}(v)=\frac{d_v}{|N|-1}$

  • Closeness centrality important nodes are close to other nodes. $C_{\text {close }}(v)=\frac{|N|-1}{\sum_{u \in N \backslash\{v\}} d(v, u)}$

  • Betweenness centrality important nodes connect other nodes. $C_{b t w}(v)=\sum_{s, t \in N} \frac{\sigma_{s, t}(v)}{\sigma_{s, t}}$

  • Load centrality

  • Page Rank

  • Katz centrality

  • Percolation centrality

How to measure the closeness centrality of a node when it cannot reach all other nodes?

Consider only nodes that L can reach and normalize by the fraction of nodes L can reach: $C_{\text {close }}(L)=\left[\frac{|R(L)|}{|N-1|}\right] \frac{|R(L)|}{\sum_{u \in R(L)} d(L, u)}$

Preferential Attachment Model

  • Start with two nodes connected by an edge.
  • At each time step, add a new node with an edge connecting it to an existing node.
  • Choose the node to connect to at random with probability proportional to each node’s degree.
  • The probability of connecting to a node $u$ of degree $k_u$ is $k_u / \sum_j k_j$.

As the number of nodes increases, the degree distribution of the network under the preferential attachment model approaches the power law $P(k) = C k^{-3}$ with constant $C$.

Preferential Attachment in NetworkX

barabasi_albert_graph(n, m) returns a network with $n$ nodes. Each new node attaches to $m$ existing nodes according to the Preferential Attachment model.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
import networkx as nx  # version 2.8.4
import matplotlib.pyplot as plt
import seaborn as sns

number_of_nodes = 1000000
G = nx.barabasi_albert_graph(number_of_nodes, 1)
degrees = dict(G.degree())
degree_values = sorted(set(degrees.values()))
histogram = [list(degrees.values()).count(i) / number_of_nodes
             for i in degree_values]

plt.loglog(degree_values, histogram, 'o')
plt.xlabel('Degree'); plt.ylabel('Fraction of Nodes');

Small-World Phenomenon

Social networks tend to have high clustering coefficient and small average path length.

Small-world model:

  • Start with a ring of $n$ nodes, where each node is connected to its $k$ nearest neighbors.
  • Fix a parameter $p \in [0, 1]$
  • Consider each edge $u, v$. With probability $p$ , select a node $w$ at random and rewire the edge $(u, v)$ so it becomes $(u, w)$.

Small World Model in NetworkX

watts_strogatz_graph(n, k, p) returns a small world network with $n$ nodes, starting with a ring lattice with each node connected to its $k$ nearest neighbors, and rewiring probability $p$.

connected_watts_strogatz_graph(n, k, p, t) runs watts_strogatz_graph(n, k, p) up to $t$ times, until it returns a connected small world network.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
import networkx as nx  # version 2.8.4
import matplotlib.pyplot as plt
import seaborn as sns

G = nx.connected_watts_strogatz_graph(1000000, 6, 0.04, 10)
degrees = dict(G.degree())
degree_values = sorted(set(degrees.values()))
histogram = [list(degrees.values()).count(i) / nx.number_of_nodes(G)
             for i in degree_values]

plt.bar(degree_values, histogram)
plt.xlabel('Degree')
plt.ylabel('Fraction of Nodes')

Given a pair of nodes, how to assess whether they are likely to connect?

Measure 1: Common Neighbors

Measure 2: Jaccard Coefficient

Number of common neighbors normalized by the total number of neighbors. $$ \operatorname{jacc\_coeff}(X, Y)=\frac{|N(X) \cap N(Y)|}{|N(X) \cup N(Y)|} $$ Measure 3: Resource Allocation $$ \operatorname{ res\_alloc }(X, Y)=\sum_{u \in N(X) \cap N(Y)} \frac{1}{|N(u)|} $$ Measure 4: Adamic-Adar Index $$ \operatorname{ adamic\_adar }(X, Y)=\sum_{u \in N(X) \cap N(Y)} \frac{1}{\log (|N(u)|)} $$ Measure 5: Pref. Attachment

In the preferential attachment model, nodes with high degree get more neighbors. $$ \operatorname{ pref\_attach }(X, Y)=|N(X)||N(Y)| $$ Measure 6: Community Common Neighbors

Number of common neighbors with bonus for neighbors in same community. $$ \begin{gathered} \operatorname{cn\_soundarajan\_hopcroft }(X, Y)=|N(X) \cap N(Y)|+\sum_{u \in N(X) \cap N(Y)} f(u)\\ \text{where } f(u)=\begin{cases}1, & u \text { in same comm. as } X \text { and } Y \\ 0, & \text {otherwise }\end{cases} \end{gathered} $$ Measure 7: Community Resource Allocation $$ \begin{gathered} \operatorname{ ra\_soundarajan\_hopcroft }(X, Y)=\sum_{u \in N(X) \cap N(Y)} \frac{f(u)}{|N(u)|} \\ \text { where } f(u)= \begin{cases} 1, & u \text { in same comm. as } X \text { and } Y \\ 0, & \text {otherwise } \end{cases} \end{gathered} $$

updatedupdated2025-12-162025-12-16