28 November 2022

Graph Algorithms Overivew

by aaronchenwei

1. Algorithm Types

Centrality
- Indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position
Classification
- Classify the graph into sets
Community
- A network is said to have community structure if the nodes of the network can be easily grouped into sets of nodes such that each set of nodes is densely connected internally
GraphML/Embeddings
- A graph embedding is an approach converting neighborhood topology into a fixed size vector of decimal values
Path
- Identify path between vertexes and path analytics
Similarity
- Computes similarity between pairs of items
Topological Link Prediction
- Link prediction is the problem of predicting the existence of a link between two entities in a network
Frequent Pattern Mining
- An analytical process that finds frequent patterns

2. Algorithms

2.1. Centrality

Name	Mechanism	Use Case
Eigenvector	A vertex's influence is based on its indegree and the influence of the vertices which refer to it.	Finding influencer from a social network Ranking web pages for searching engine Identify the marketing targets within a doctor referral network Personalized version can be used for user user behavior prediction and recommender systems
PageRank	Variant of Eigenvector. Adding additional jump probability
Article Rank	Variant of PageRank. Lowers the influence of low-degree nodes
Closeness	Find vertexes that have shorter distance to all other nodes	Recommend locations for a grocery store
Harmonic	Variant of Closeness Centrality. Reverses the sum and reciprocal operations in the definition of closeness centrality	Same as Closeness, but works better when run on disconnected graph
Degree	Variant of Closeness Centrality. Reverses the sum and reciprocal operations in the definition of closeness centrality	Same as Closeness, but works better when run on disconnected graph
Betweenness	Variant of Closeness Centrality. Reverses the sum and reciprocal operations in the definition of closeness centrality	Same as Closeness, but works better when run on disconnected graph
Influence Maximization	Variant of Closeness Centrality. Reverses the sum and reciprocal operations in the definition of closeness centrality	Same as Closeness, but works better when run on disconnected graph

2.1.1. PageRank

2.1.2. Article Rank

2.1.3. Betweenness

2.1.4. Closeness

2.1.5. Degree

2.1.6. Eigenvector

2.1.7. Harmonic

2.1.8. Influence Maximization

2.2. Community

Name	Mechanism	Use Case
Weakly Connected Components (WCC)	Every vertex is reachable from every other vertex through an undirected path	Entity Resolution, obtain a holistic picture of an given user entity Fraud ring detection
Strongly Connected Components	Every vertex is reachable from every other vertex through an directed path	AML, money laundering ring detection
K Core	Maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph	Network analytical characterization of hierarchical layers (Hierarchical result based on different k values)
K-means	Partition vertexes into k clusters in which each vertex belongs to the cluster with the nearest mean	Grouping transportation orders into clusters based on longitude and latitude Grouping product into categories based on their embeddings
Label Propagation	If the plurality of your neighbors all bear the label X, then you should label yourself as also a member of X	Anti-fraud, detect fraud rings that is a group of applications/accounts that share common PII info Detect user communities in a social network, at later stage, one could extract features of each user communities for marketing purposes
Speaker-Listener Label Propagation	Variant of the Label Propagation algorithm that is able to detect overlapping communities
Louvain	Partitions the vertices in a graph by approximately maximizing the graph's modularity score
Local Cluster Coefficient (LCC)	local clustering coefficient = Number_trangles/((n-1)n/2)	Anti phone call fraud, how many pairs of your callees called each other
Triangle Counting	Count the number of triangles in the graph	Evaluate the cohesiveness of communities

2.2.1. Connected Components

2.2.2. K Core

2.2.3. K Means

2.2.4. Label Propagation

2.2.5. Local Cluster Coefficient

2.2.6. Louvain

2.2.7. Speaker-Listener Label

2.2.8. Propagation

2.2.9. Triangle Counting

2.3. GraphML/Embeddings

Name	Mechanism	Use Case
FastRP	It generates node embeddings of low dimensionality through random projections from the graph’s adjacency matrix to a low-dimensional matrix	Use the embeddings as machine Learning features Use the embeddings for similarity algorithms
Node2Vec	Uses random walks in the graph to create a vector representation of a node

2.3.1. FastRP

2.3.2. Node2Vec

2.4. Path

Name	Mechanism	Use Case
Astar Shortest Path	A shortest path algorithm guided by a heuristic function that estimates the cost of the cheapest path	Route planning. Planning the best route for delivering an order based on train schedules Flight search. Planning a series of flights that take least amount of travelling time Anti-fraud. Analyse relationship between a new credit card application and a blacklisted application
Shortest Path	Return either the shortest or top k shortest unweighted/weighted paths connecting a set of source vertices to a set of target vertices
Cycle Detection	Find all the cycles (loops) in a graph	AML, find the transaction loops
Estimated Diameter	The worst-case length of a shortest path between any pair of vertices in a graph	Understand the interconnectedness of the graph to support following analytics, e.g. to choose small world version of WCC or standard version
Maxflow	Finding a feasible flow through a flow network that obtains the maximum possible flow rate	AML, find the maximum money flow between two accounts Supply chain, find the maximum supply flow from one site to another
Minimum Spanning Tree	A minimum spanning tree is a set of edges that can connect all the vertices in the graph with the minimal sum of edge weights	Travelling salesman problem
Minimum Spanning Forest	A set of minimum spanning trees, one for each WCC	Travelling salesman problem

2.4.1. Astar_shortest_path

2.4.2. BFS

2.4.3. Cycle_detection

2.4.4. Estimated_diameter

2.4.5. Maxflow

2.4.6. Minimum_spanning_forest

2.4.7. Minimum_spanning_tree

2.4.8. Shortest Path

2.5. Classification

Name	Mechanism	Use Case
Greedy Graph Coloring	Assigns a unique integer value known as its color to the vertices of a graph such that no neighboring vertices share the same color	Find the maximum tasks that can be performed together without conflicting with each other Find the maximum of vertexes that can execute some logic in parallel without conflicting with each other
Maximal Independent Set	An independent set of vertices does not contain any pair of vertices that are neighbors

2.5.1. Greedy Graph Coloring

2.5.2. Maximal Independent Set

2.6. Similarity

Name	Mechanism	Use Case
Cosine Similarity	Calculates cosine similarity based on topological common attributes	Find similar users, patients User purchased item A also purchased product B
Jaccard Similarity	Calculates jaccard similarity based on topological common attributes
Approximate Nearest Neighbors	Calculate the approximate nearest neighbors with similarity score based on non-topological attribute values
K Nearest Neighbors	Based on similarity scores makes a prediction for every vertex whose label is not known	Anti-fraud, predict fraud labels

2.6.1. Cosine

2.6.2. Jaccard

2.6.3. K Nearest Neighbors

2.6.4. Approximate Nearest Neighbors

2.7. Topological Link Prediction

Name	Mechanism	Use Case
Adamic Adar	Number of shared links between two nodes	Collaborative filtering for recommender systems
Common Neighbors	Number of common neighbors between two vertices
Preferential Attachment	Product of the number of neighbors of the two vertices
Resource Allocation	Reciprocal of number of neighbors of common neighbors
Same Community	Returns 1 if they two vertices are in the same community, and returns 0 if they are not in the same community
Total Neighbors	Total number of neighbors of two vertices

2.7.1. Adamic Adar

2.7.2. Common Neighbors

2.7.3. Preferential Attachment

2.7.4. Resource Allocation

2.7.5. Same Community

2.7.6. Total Neighbors

2.8. Frequent Pattern Mining

Name	Mechanism	Use Case
Apriori	Returns top k most frequent sequential patterns	Application UX improvements Customer churn prediction Bad rating cause analysis

2.8.1. Aprior