Graph Analytics
RelationalAI’s SQL Graph Library for Snowflake lets you perform graph analytics on your Snowflake data.
Graphs are created from an edge data stream — a data stream linked to a Snowflake table or view that represents the graph’s edge set — and are stored in a RAI database. You can create, manage, and perform analyses on graphs, all from within Snowflake, using the functions and procedures contained in the SQL Graph Library.
See the Quick Start for an example of creating a graph and running an algorithm. The Managing Graphs and Working With Algorithms guides explain how to work with graphs in more detail.
Quick Start
Before you create a graph,
you must have a table or view in Snowflake with two integer columns
whose rows represent the edges of the graph,
such as the my_edges
table created in the following example.
Nodes must be integers. In this example, an appropriate edge table is created directly. See Managing Graphs: Prepare Your Snowflake Data for an example of creating nodes from Snowflake data.
Next, you can create a data stream from the edge table using the
create_data_stream
procedure to send the graph’s edge data to RAI,
and then create a graph using the
create_graph
procedure and execute an algorithm:
USE ROLE my_sf_role;
USE WAREHOUSE my_sf_warehouse;
USE DATABASE my_sf_database;
CREATE OR REPLACE SCHEMA graph_analytics_quickstart;
USE SCHEMA graph_analytics_quickstart;
-- Set a RAI context. The RAI database is used to store the graph,
-- and the RAI engine is used to to execute graph algorithms.
CALL RAI.use_rai_database('my_rai_db');
CALL RAI.use_rai_engine('my_rai_engine');
-- Create a table with two integer columns representing edges in a graph.
-- Typically, the edge table or view is created from existing Snowflake data
-- that has been transformed into integer nodes using RAI's Schema Mapping Library.
CREATE TABLE my_edges(x INT, y INT) AS SELECT * FROM VALUES (1, 2), (2, 3);
-- Create a data stream from the 'my_edges' table.
-- By default, the data stream has the same name as the table.
-- It may take several minutes for the data stream to become available.
CALL RAI.create_data_stream('my_edges');
-- Create an undirected graph in RAI called 'my_graph' from the 'my_edges' data stream.
-- By default, 'my_graph' is undirected. To create a directed graph:
-- CALL RAI.create_graph('my_graph', 'my_edges', {'directed': true})
CALL RAI.create_graph('my_graph', 'my_edges');
/*+--------------------------------------------------------------------------------+
| {"message":"Graph 'my_graph' created in database 'my_rai_db'.","success":true} |
+--------------------------------------------------------------------------------+ */
-- Compute the degree of each node in the graph.
SELECT * FROM TABLE(RAI.degree('my_graph'));
/*+------+--------+
| NODE | DEGREE |
|------+--------+
| 1 | 1 |
| 2 | 2 |
| 3 | 1 |
+------+--------+ */
By default, the functions and procedures in the SQL Graph Library for Snowflake
are contained in the RAI
schema in your Snowflake database.
Note that this schema name may vary depending on how your administrator
set up the RAI integration.
Learn the Basics
These guides show you how to create and manage graphs, execute algorithms, and work with results:
Overview of Graph Algorithms
RelationalAI’s SQL Graph Library for Snowflake implements a wide range of functions for common graph analytics tasks, including functions for:
Basics
These functions are used to find basic properties of graphs, such as counting the number of nodes and edges and finding neighbors and degrees of nodes:
Function | Description | Return Value |
---|---|---|
num_nodes | Returns the number of nodes in the graph. | INT |
num_edges | Returns the number of edges in the graph. | INT |
neighbor | Finds the neighbors of each node in the graph. | TABLE(node1 INT, node2 INT) |
degree | Finds the degrees of each node in the graph. | TABLE(node INT, degree INT) |
min_degree | Returns the minimum degree of the graph. | INT |
max_degree | Returns the maximum degree of the graph. | INT |
average_degree | Returns the average degree of the graph. | FLOAT |
degree_histogram | Counts the number of nodes with each degree. | TABLE(degree INT, count INT) |
In a directed graph, node u
is a neighbor of node v
if u
is an inneighbor or an outneighor of v
.
The degree of a node in a directed graph
is the sum of its indegree and outdegree.
Paths
The Graph Library implements the following algorithms related to paths in graphs:
Function | Description | Return Value |
---|---|---|
shortest_path_length | Computes the distance (length of the shortest path) between nodes. | TABLE(source INT, target INT, length INT) |
transitive_closure | Computes the transitive closure of the edges in a graph. It may be used to determine which nodes are reachable from each other. | TABLE(source INT, target INT) |
Centrality
Centrality algorithms assign ranks to nodes, often for the purpose of measuring a node’s influence or importance in the graph. The Graph Library implements three centrality algorithms:
Function | Description | Return Value |
---|---|---|
eigenvector_centrality | Measures a node’s importance in such a way that connections to more important nodes contribute more to a node’s score than connections to less important nodes. | TABLE(node INT, value FLOAT) |
pagerank | Measures a node’s importance in a graph. pagerank is similar to eigenvector_centrality , but with an additional scaling factor. | TABLE(node INT, value FLOAT) |
degree_centrality | Measures a node’s importance based on its degree. Unlike pagerank and eigenvector_centrality , degree_centrality does not consider the importance of a node’s neighbors when ranking a node. | TABLE(node INT, value FLOAT) |
Similarity
Similarity algorithms are used to cluster nodes and predict links between nodes. The Graph Library implements a number of algorithms related to similarity:
Function | Description | Return Value |
---|---|---|
jaccard_similarity | Measures the similarity of two nodes based on the number of neighbors common to both nodes. Values range from 0 to 1, inclusive. | TABLE(node1 INT, node2 INT, score FLOAT) |
cosine_similarity | Measures the similarity of two nodes as a function of the angle between vector representations of their neighborhoods. Values range from -1 to 1, inclusive. | TABLE(node1 INT, node2 INT, score FLOAT) |
preferential_attachment | Computes the “closeness” of two nodes u and v as the number of neighbors of u times the number of neighbors of v . Higher scores indicate that two nodes are “closer” than lower scores. | TABLE(node1 INT, node2 INT, score INT) |
adamic_adar | Computes the “closeness” of two nodes by computing the inverse logarithmic sum of the degrees of neighbors common to both nodes. Higher scores indicate that two nodes are “closer” than lower scores. | TABLE(node1 INT, node2 INT, score FLOAT) |
common_neighbor | Finds common neighbors of nodes in a graph. | TABLE(node1 INT, node2 INT, node3 INT) |
Community
These algorithms are used to determine how nodes are clustered in a graph:
Function | Description | Return Value |
---|---|---|
is_connected | Computes whether or not a graph is connected. | BOOLEAN |
weakly_connected_component | Computes the weakly connected components of a graph. | TABLE(node INT, component INT) |
unique_triangle | Computes triples of nodes, unique up to order, that form a triangle in the graph. Use unique_triangle to find unique triangles containing a given node or pair of nodes. | TABLE(node1 INT, node2 INT, node3 INT) |
num_triangles | Computes the number of unique triangles in a graph. | INT |
triangle_count | Computes the number of unique triangles each node belongs to. | TABLE(node INT, count INT) |
diameter_range | Estimates the diameter of a graph by giving a minimum and maximum bound. | OBJECT |