Graph Analytics

RelationalAI’s SQL Graph Library for Snowflake lets you perform graph analytics on your Snowflake data.

Graphs are created from an edge data stream — a data stream linked to a Snowflake table or view that represents the graph’s edge set — and are stored in a RAI database. You can create, manage, and perform analyses on graphs, all from within Snowflake, using the functions and procedures contained in the SQL Graph Library.

See the Quick Start for an example of creating a graph and running an algorithm. The Managing Graphs and Working With Algorithms guides explain how to work with graphs in more detail.

Quick Start

Before you create a graph, you must have a table or view in Snowflake with two integer columns whose rows represent the edges of the graph, such as the my_edges table created in the following example.

🔎

Nodes must be integers. In this example, an appropriate edge table is created directly. See Managing Graphs: Prepare Your Snowflake Data for an example of creating nodes from Snowflake data.

Next, you can create a data stream from the edge table using the create_data_stream procedure to send the graph’s edge data to RAI, and then create a graph using the create_graph procedure and execute an algorithm:

USE ROLE my_sf_role;
USE WAREHOUSE my_sf_warehouse;
USE DATABASE my_sf_database;
 
CREATE OR REPLACE SCHEMA graph_analytics_quickstart;
USE SCHEMA graph_analytics_quickstart;
 
-- Set a RAI context. The RAI database is used to store the graph,
-- and the RAI engine is used to to execute graph algorithms.
CALL RAI.use_rai_database('my_rai_db');
CALL RAI.use_rai_engine('my_rai_engine');
 
-- Create a table with two integer columns representing edges in a graph.
-- Typically, the edge table or view is created from existing Snowflake data
-- that has been transformed into integer nodes using RAI's Schema Mapping Library.
CREATE TABLE my_edges(x INT, y INT) AS SELECT * FROM VALUES (1, 2), (2, 3);
 
-- Create a data stream from the 'my_edges' table.
-- By default, the data stream has the same name as the table.
-- It may take several minutes for the data stream to become available.
CALL RAI.create_data_stream('my_edges');
 
-- Create an undirected graph in RAI called 'my_graph' from the 'my_edges' data stream.
-- By default, 'my_graph' is undirected. To create a directed graph:
-- CALL RAI.create_graph('my_graph', 'my_edges', {'directed': true})
CALL RAI.create_graph('my_graph', 'my_edges');
/*+--------------------------------------------------------------------------------+
  | {"message":"Graph 'my_graph' created in database 'my_rai_db'.","success":true} |
  +--------------------------------------------------------------------------------+  */
 
-- Compute the degree of each node in the graph.
SELECT * FROM TABLE(RAI.degree('my_graph'));
/*+------+--------+
  | NODE | DEGREE |
  |------+--------+
  | 1    | 1      |
  | 2    | 2      |
  | 3    | 1      |
  +------+--------+ */

💡

By default, the functions and procedures in the SQL Graph Library for Snowflake are contained in the RAI schema in your Snowflake database. Note that this schema name may vary depending on how your administrator set up the RAI integration.

Learn the Basics

These guides show you how to create and manage graphs, execute algorithms, and work with results:

Managing Graphs: Learn how to create, view, update, and delete graphs.
Working With Algorithms: Learn more about the algorithms in the library and how to use them.

Overview of Graph Algorithms

RelationalAI’s SQL Graph Library for Snowflake implements a wide range of functions for common graph analytics tasks, including functions for:

Basics

These functions are used to find basic properties of graphs, such as counting the number of nodes and edges and finding neighbors and degrees of nodes:

Function	Description	Return Value
`num_nodes`	Returns the number of nodes in the graph.	`INT`
`num_edges`	Returns the number of edges in the graph.	`INT`
`neighbor`	Finds the neighbors of each node in the graph.	`TABLE(node1 INT, node2 INT)`
`degree`	Finds the degrees of each node in the graph.	`TABLE(node INT, degree INT)`
`min_degree`	Returns the minimum degree of the graph.	`INT`
`max_degree`	Returns the maximum degree of the graph.	`INT`
`average_degree`	Returns the average degree of the graph.	`FLOAT`
`degree_histogram`	Counts the number of nodes with each degree.	`TABLE(degree INT, count INT)`

🔎

In a directed graph, node u is a neighbor of node v if u is an inneighbor or an outneighor of v. The degree of a node in a directed graph is the sum of its indegree and outdegree.

Paths

The Graph Library implements the following algorithms related to paths in graphs:

Function	Description	Return Value
`shortest_path_length`	Computes the distance (length of the shortest path) between nodes.	`TABLE(source INT, target INT, length INT)`
`transitive_closure`	Computes the transitive closure of the edges in a graph. It may be used to determine which nodes are reachable from each other.	`TABLE(source INT, target INT)`

Centrality

Centrality algorithms assign ranks to nodes, often for the purpose of measuring a node’s influence or importance in the graph. The Graph Library implements three centrality algorithms:

Function	Description	Return Value
`eigenvector_centrality`	Measures a node’s importance in such a way that connections to more important nodes contribute more to a node’s score than connections to less important nodes.	`TABLE(node INT, value FLOAT)`
`pagerank`	Measures a node’s importance in a graph. `pagerank` is similar to `eigenvector_centrality`, but with an additional scaling factor.	`TABLE(node INT, value FLOAT)`
`degree_centrality`	Measures a node’s importance based on its degree. Unlike `pagerank` and `eigenvector_centrality`, `degree_centrality` does not consider the importance of a node’s neighbors when ranking a node.	`TABLE(node INT, value FLOAT)`

Similarity

Similarity algorithms are used to cluster nodes and predict links between nodes. The Graph Library implements a number of algorithms related to similarity:

Function	Description	Return Value
`jaccard_similarity`	Measures the similarity of two nodes based on the number of neighbors common to both nodes. Values range from 0 to 1, inclusive.	`TABLE(node1 INT, node2 INT, score FLOAT)`
`cosine_similarity`	Measures the similarity of two nodes as a function of the angle between vector representations of their neighborhoods. Values range from -1 to 1, inclusive.	`TABLE(node1 INT, node2 INT, score FLOAT)`
`preferential_attachment`	Computes the “closeness” of two nodes `u` and `v` as the number of neighbors of `u` times the number of neighbors of `v`. Higher scores indicate that two nodes are “closer” than lower scores.	`TABLE(node1 INT, node2 INT, score INT)`
`adamic_adar`	Computes the “closeness” of two nodes by computing the inverse logarithmic sum of the degrees of neighbors common to both nodes. Higher scores indicate that two nodes are “closer” than lower scores.	`TABLE(node1 INT, node2 INT, score FLOAT)`
`common_neighbor`	Finds common neighbors of nodes in a graph.	`TABLE(node1 INT, node2 INT, node3 INT)`

Community

These algorithms are used to determine how nodes are clustered in a graph:

Function	Description	Return Value
`is_connected`	Computes whether or not a graph is connected.	`BOOLEAN`
`weakly_connected_component`	Computes the weakly connected components of a graph.	`TABLE(node INT, component INT)`
`unique_triangle`	Computes triples of nodes, unique up to order, that form a triangle in the graph. Use `unique_triangle` to find unique triangles containing a given node or pair of nodes.	`TABLE(node1 INT, node2 INT, node3 INT)`
`num_triangles`	Computes the number of unique triangles in a graph.	`INT`
`triangle_count`	Computes the number of unique triangles each node belongs to.	`TABLE(node INT, count INT)`
`diameter_range`	Estimates the diameter of a graph by giving a minimum and maximum bound.	`OBJECT`

Was this doc helpful?