Skip to content

Graph Analytics

RelationalAI’s SQL Graph Library for Snowflake lets you perform graph analytics on your Snowflake data.

Graphs are created from an edge data stream — a data stream linked to a Snowflake table or view that represents the graph’s edge set — and are stored in a RAI database. You can create, manage, and perform analyses on graphs, all from within Snowflake, using the functions and procedures contained in the SQL Graph Library.

See the Quick Start for an example of creating a graph and running an algorithm. The Managing Graphs and Working With Algorithms guides explain how to work with graphs in more detail.

Quick Start

Before you create a graph, you must have a table or view in Snowflake with two integer columns whose rows represent the edges of the graph, such as the my_edges table created in the following example.

🔎

Nodes must be integers. In this example, an appropriate edge table is created directly. See Managing Graphs: Prepare Your Snowflake Data for an example of creating nodes from Snowflake data.

Next, you can create a data stream from the edge table using the create_data_stream procedure to send the graph’s edge data to RAI, and then create a graph using the create_graph procedure and execute an algorithm:

USE ROLE my_sf_role;
USE WAREHOUSE my_sf_warehouse;
USE DATABASE my_sf_database;
 
CREATE OR REPLACE SCHEMA graph_analytics_quickstart;
USE SCHEMA graph_analytics_quickstart;
 
-- Set a RAI context. The RAI database is used to store the graph,
-- and the RAI engine is used to to execute graph algorithms.
CALL RAI.use_rai_database('my_rai_db');
CALL RAI.use_rai_engine('my_rai_engine');
 
-- Create a table with two integer columns representing edges in a graph.
-- Typically, the edge table or view is created from existing Snowflake data
-- that has been transformed into integer nodes using RAI's Schema Mapping Library.
CREATE TABLE my_edges(x INT, y INT) AS SELECT * FROM VALUES (1, 2), (2, 3);
 
-- Create a data stream from the 'my_edges' table.
-- By default, the data stream has the same name as the table.
-- It may take several minutes for the data stream to become available.
CALL RAI.create_data_stream('my_edges');
 
-- Create an undirected graph in RAI called 'my_graph' from the 'my_edges' data stream.
-- By default, 'my_graph' is undirected. To create a directed graph:
-- CALL RAI.create_graph('my_graph', 'my_edges', {'directed': true})
CALL RAI.create_graph('my_graph', 'my_edges');
/*+--------------------------------------------------------------------------------+
  | {"message":"Graph 'my_graph' created in database 'my_rai_db'.","success":true} |
  +--------------------------------------------------------------------------------+  */
 
-- Compute the degree of each node in the graph.
SELECT * FROM TABLE(RAI.degree('my_graph'));
/*+------+--------+
  | NODE | DEGREE |
  |------+--------+
  | 1    | 1      |
  | 2    | 2      |
  | 3    | 1      |
  +------+--------+ */
💡

By default, the functions and procedures in the SQL Graph Library for Snowflake are contained in the RAI schema in your Snowflake database. Note that this schema name may vary depending on how your administrator set up the RAI integration.

Learn the Basics

These guides show you how to create and manage graphs, execute algorithms, and work with results:

Overview of Graph Algorithms

RelationalAI’s SQL Graph Library for Snowflake implements a wide range of functions for common graph analytics tasks, including functions for:

Basics

These functions are used to find basic properties of graphs, such as counting the number of nodes and edges and finding neighbors and degrees of nodes:

FunctionDescriptionReturn Value
num_nodesReturns the number of nodes in the graph.INT
num_edgesReturns the number of edges in the graph.INT
neighborFinds the neighbors of each node in the graph.TABLE(node1 INT, node2 INT)
degreeFinds the degrees of each node in the graph.TABLE(node INT, degree INT)
min_degreeReturns the minimum degree of the graph.INT
max_degreeReturns the maximum degree of the graph.INT
average_degreeReturns the average degree of the graph.FLOAT
degree_histogramCounts the number of nodes with each degree.TABLE(degree INT, count INT)
🔎

In a directed graph, node u is a neighbor of node v if u is an inneighbor or an outneighor of v. The degree of a node in a directed graph is the sum of its indegree and outdegree.

Paths

The Graph Library implements the following algorithms related to paths in graphs:

FunctionDescriptionReturn Value
shortest_path_lengthComputes the distance (length of the shortest path) between nodes.TABLE(source INT, target INT, length INT)
transitive_closureComputes the transitive closure of the edges in a graph. It may be used to determine which nodes are reachable from each other.TABLE(source INT, target INT)

Centrality

Centrality algorithms assign ranks to nodes, often for the purpose of measuring a node’s influence or importance in the graph. The Graph Library implements three centrality algorithms:

FunctionDescriptionReturn Value
eigenvector_centralityMeasures a node’s importance in such a way that connections to more important nodes contribute more to a node’s score than connections to less important nodes.TABLE(node INT, value FLOAT)
pagerankMeasures a node’s importance in a graph. pagerank is similar to eigenvector_centrality, but with an additional scaling factor.TABLE(node INT, value FLOAT)
degree_centralityMeasures a node’s importance based on its degree. Unlike pagerank and eigenvector_centrality, degree_centrality does not consider the importance of a node’s neighbors when ranking a node.TABLE(node INT, value FLOAT)

Similarity

Similarity algorithms are used to cluster nodes and predict links between nodes. The Graph Library implements a number of algorithms related to similarity:

FunctionDescriptionReturn Value
jaccard_similarityMeasures the similarity of two nodes based on the number of neighbors common to both nodes. Values range from 0 to 1, inclusive.TABLE(node1 INT, node2 INT, score FLOAT)
cosine_similarityMeasures the similarity of two nodes as a function of the angle between vector representations of their neighborhoods. Values range from -1 to 1, inclusive.TABLE(node1 INT, node2 INT, score FLOAT)
preferential_attachmentComputes the “closeness” of two nodes u and v as the number of neighbors of u times the number of neighbors of v. Higher scores indicate that two nodes are “closer” than lower scores.TABLE(node1 INT, node2 INT, score INT)
adamic_adarComputes the “closeness” of two nodes by computing the inverse logarithmic sum of the degrees of neighbors common to both nodes. Higher scores indicate that two nodes are “closer” than lower scores.TABLE(node1 INT, node2 INT, score FLOAT)
common_neighborFinds common neighbors of nodes in a graph.TABLE(node1 INT, node2 INT, node3 INT)

Community

These algorithms are used to determine how nodes are clustered in a graph:

FunctionDescriptionReturn Value
is_connectedComputes whether or not a graph is connected.BOOLEAN
weakly_connected_componentComputes the weakly connected components of a graph.TABLE(node INT, component INT)
unique_triangleComputes triples of nodes, unique up to order, that form a triangle in the graph. Use unique_triangle to find unique triangles containing a given node or pair of nodes.TABLE(node1 INT, node2 INT, node3 INT)
num_trianglesComputes the number of unique triangles in a graph.INT
triangle_countComputes the number of unique triangles each node belongs to.TABLE(node INT, count INT)
diameter_rangeEstimates the diameter of a graph by giving a minimum and maximum bound.OBJECT
Was this doc helpful?