
Cortex Package


ML Forecast

The Coalesce ML Forecast UDN is a versatile node that allows you to create a forecast table and populate it with forecasts of time series data using the Snowflake built-in class FORECAST.

Snowflake Cortex is Snowflake's intelligent, fully-managed service that enables organizations to quickly analyze data and build AI applications, all within Snowflake. This service makes Machine Learning (ML) functionality accessible to data engineers to enrich data pipelines while still using SQL. Forecasting employs a machine learning algorithm to predict future data by using historical time series data.

Node Configuration

The ML Forecast has two configuration groups:

Node Properties

| Property | Description |
|---|---|
| Storage Location | Storage Location where the Forecast Table will be created |
| Node Type | Name of template used to create node objects |
| Description | A description of the node's purpose |
| Deploy Enabled | If TRUE the node will be deployed/redeployed when changes are detected.<br>If FALSE the node will not be deployed, or will be dropped during redeployment |

Forecast Model Input

| Option | Description |
|---|---|
| Model Instance Name | (Required) Name of the forecast model to be created |
| Create Model | True/False toggle to determine model creation:<br>- True: Forcefully create the Forecast model<br>- False: Refer to an existing Forecast model |
| Multi-Series Forecast | True/False toggle for forecast type:<br>- True: Create a multi-series forecast model with series column, timestamp column and target column<br>- False: Specify the timestamp column and target column to create a single-series forecast model |
| Series Column | (Required for multi-series) Column defining the multiple time series in input data |
| Timestamp Column | (Required) Column containing timestamps in input data |
| Target Column | (Required) Column containing target values in input data |
| Config Object | OBJECT containing key-value pairs to configure the forecast job |
| Series Value | (Required for multi-series forecasts) Single value or VARIANT |
| Exogenous Variables | True/False toggle:<br>- True: Add future-valued exogenous data using the multi-source toggle<br>- False: Create the forecast model based on days to forecast only |
| Multi Source | Toggle to add future-valued exogenous data |
| Days to Forecast | (Required for forecasts without exogenous variables) Number of steps ahead to forecast |
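Under the hood, these options map onto Snowflake's FORECAST class. A minimal single-series sketch, assuming a hypothetical view `sales_history` with columns `ORDER_DATE` and `DAILY_SALES`, might look like:

```sql
-- Train a single-series forecast model on historical data
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST my_forecast_model(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'sales_history'),
    TIMESTAMP_COLNAME => 'ORDER_DATE',
    TARGET_COLNAME => 'DAILY_SALES'
);

-- Forecast 14 steps ahead (the "Days to Forecast" option)
CALL my_forecast_model!FORECAST(FORECASTING_PERIODS => 14);
```

The node generates statements along these lines from the configuration above, so the options correspond directly to the class's named arguments.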

ML Forecast Preprocessing of Data

When the forecast model returns an error, the error message returned by Snowflake is captured and surfaced directly in the Coalesce application for troubleshooting.

Common scenarios you may encounter:

  • NULLs in the source data. The model tolerates a small number of NULLs, but too many will prevent it from training.
  • Missing time periods. If the model is unable to determine a consistent frequency in the time series, it will return an error.
  • Missing exogenous variables. If the model was trained with exogenous variables, future values of those variables must be provided to predict future values.

A data preparation step in Coalesce can be used prior to the ML Forecast node to address these issues.
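For example, a preparation node can densify the series with a date spine and replace NULLs before training. This is an illustrative sketch only; the table `sales` and columns `order_date`/`daily_sales` are hypothetical:

```sql
-- Build a gap-free daily spine spanning the history, then fill NULLs with 0
WITH bounds AS (
    SELECT MIN(order_date) AS min_d, MAX(order_date) AS max_d FROM sales
),
spine AS (
    SELECT DATEADD('day', seq4(), (SELECT min_d FROM bounds)) AS order_date
    FROM TABLE(GENERATOR(ROWCOUNT => 10000))
)
SELECT s.order_date,
       COALESCE(t.daily_sales, 0) AS daily_sales
FROM spine s
LEFT JOIN sales t USING (order_date)
WHERE s.order_date <= (SELECT max_d FROM bounds);
```

Whether filling gaps with 0, a carried-forward value, or an interpolation is appropriate depends on the metric being forecast.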

ML Forecast Deployment

ML Forecast Initial Deployment

When deployed for the first time into an environment the ML Forecast node will execute:

| Stage | Description |
|---|---|
| Create Forecast Table | Executes a CREATE OR REPLACE statement to create the Forecast Table in the target environment |

ML Forecast Redeployment

After the ML Forecast node has been deployed for the first time into a target environment, subsequent deployments may result in altering the forecast table.

ML Forecast Altering the Forecast Tables

The following column or table changes, whether made in isolation or together, will result in an ALTER statement to modify the Forecast Table in the target environment:

  • Change in table name
  • Dropping existing column
  • Alter column data type
  • Adding a new column

The following stages are executed:

| Stage | Description |
|---|---|
| Clone Table | Creates an internal table |
| Rename Table/Alter Column/Delete Column/Add Column/Edit table description | An ALTER TABLE statement is executed to perform the corresponding operation |
| Swap cloned Table | Upon successful completion of all updates, the clone replaces the main table, ensuring that no data is lost |
| Delete Table | Drops the internal table |

ML Forecast Undeployment

If an ML Forecast table is deleted from a Workspace, and that Workspace is committed to Git and the commit is deployed to a higher-level environment, then the Forecast Table in the target environment will be dropped.

This is executed in two stages:

| Stage | Description |
|---|---|
| Delete Table | Coalesce internal table is dropped |
| Delete Table | Target table in Snowflake is dropped |

ML Anomaly Detection

The Coalesce ML Anomaly Detection UDN is a versatile node that allows you to create an Anomaly table and flag anomalies in time series data using the Snowflake built-in class ANOMALY_DETECTION.

Snowflake Cortex is Snowflake's intelligent, fully-managed service that enables organizations to quickly analyze data and build AI applications, all within Snowflake. This service makes Machine Learning (ML) functionality accessible to data engineers to enrich data pipelines while still using SQL. Anomalies in data are detected by analyzing the dataset using a machine learning algorithm.

ML Anomaly Detection Node Configuration

The ML Anomaly has two configuration groups:

ML Anomaly Node Properties

| Property | Description |
|---|---|
| Storage Location | Storage Location where the Table will be created |
| Node Type | Name of template used to create node objects |
| Description | A description of the node's purpose |
| Deploy Enabled | If TRUE the node will be deployed/redeployed when changes are detected.<br>If FALSE the node will not be deployed, or will be dropped during redeployment |

ML Anomaly Model Input

| Option | Description |
|---|---|
| Model Instance Name | (Required) Name of the anomaly detection model to be created |
| Create Model | True/False toggle to determine whether to create a model or refer to an existing one:<br>- True: Forcefully create the Anomaly model<br>- False: Refer to an existing Anomaly model |
| Multi-Series | True/False toggle for series type:<br>- True: Create a multi-series model with series column, timestamp column and target column<br>- False: Create a single-series model |
| Series Column | (Required for multi-series) Column defining the multiple time series in input data |
| Timestamp Column | (Required) Column containing timestamps in input data |
| Target Column | (Required) Column containing target (dependent) values in input data |
| Config Object | Object containing configuration settings for the anomaly detection job |
| Supervised Data | Toggle to train the model using labeled data through the multi-source toggle |
| Labeled Column | (Available with supervised data) Essential for supervised data; distinguishes between normal and anomalous instances |
| Unsupervised Data | Toggle to train the model using historical data through the multi-source toggle |
| Multi Source | Toggle to add data for analysis |
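These options map onto Snowflake's ANOMALY_DETECTION class. An unsupervised sketch, assuming hypothetical views `metric_history` and `metric_recent` with columns `TS` and `METRIC_VALUE`:

```sql
-- Train an unsupervised anomaly detection model on historical data
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION my_anomaly_model(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'metric_history'),
    TIMESTAMP_COLNAME => 'TS',
    TARGET_COLNAME => 'METRIC_VALUE',
    LABEL_COLNAME => ''  -- empty string: unsupervised training
);

-- Flag anomalies in new observations
CALL my_anomaly_model!DETECT_ANOMALIES(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'metric_recent'),
    TIMESTAMP_COLNAME => 'TS',
    TARGET_COLNAME => 'METRIC_VALUE'
);
```

For supervised training, LABEL_COLNAME would instead name the labeled column distinguishing normal from anomalous rows.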

ML Anomaly Detection Preprocessing of Data

When the Anomaly model returns an error, the error message returned by Snowflake is captured and surfaced directly in Coalesce for troubleshooting.

Common scenarios you may encounter:

  • Missing time periods. If the model is unable to determine a consistent frequency in the time series it will cause an error.
  • Missing labeled column. If the model was trained with supervised data it's necessary to pass a labeled column.

Often, a data preparation step in Coalesce can be used prior to the ML Anomaly Detection node to address these issues.

ML Anomaly Detection Deployment

ML Anomaly Detection Initial Deployment

When deployed for the first time into an environment the ML Anomaly Detection node will execute:

| Stage | Description |
|---|---|
| Create Anomaly Detection Table | Executes a CREATE OR REPLACE statement to create the Anomaly Table in the target environment |

ML Anomaly Detection Redeployment

After the ML Anomaly node has been deployed for the first time into a target environment, subsequent deployments may result in altering the Anomaly table.

ML Anomaly Detection Altering the Anomaly Tables

The following column or table changes, whether made in isolation or together, will result in an ALTER statement:

  • Change in table name
  • Dropping existing column
  • Alter column data type
  • Adding a new column

The following stages are executed:

| Stage | Description |
|---|---|
| Clone Table | Creates an internal table |
| Rename Table/Alter Column/Delete Column/Add Column/Edit table description | An ALTER TABLE statement is executed to perform the corresponding operation |
| Swap cloned Table | Clone replaces the main table after successful updates |
| Delete Table | Drops the internal table |

ML Anomaly Detection Undeployment

If an ML Anomaly table is deleted from a Workspace, and that Workspace is committed to Git and the commit is deployed to a higher-level environment, then the Anomaly Table in the target environment will be dropped.

This is executed in two stages:

| Stage | Description |
|---|---|
| Delete Table | Coalesce internal table is dropped |
| Delete Table | Target table in Snowflake is dropped |

LLM Cortex Functions

The Coalesce Cortex Function UDN provides instant access to industry-leading large language models (LLMs), along with models that Snowflake has fine-tuned for specific use cases.

Snowflake Cortex LLM Functions are fully hosted and managed by Snowflake, so using them requires no setup. Your data stays within Snowflake, giving you the performance, scalability, and governance you expect.

LLM Cortex Functions Node Configuration

The LLM Cortex Functions node has three configuration groups:

LLM Cortex Functions Node Properties

| Property | Description |
|---|---|
| Storage Location | Storage Location where the Table will be created |
| Node Type | Name of template used to create node objects |
| Description | A description of the node's purpose |
| Deploy Enabled | If TRUE the node will be deployed/redeployed when changes are detected.<br>If FALSE the node will not be deployed, or will be dropped during redeployment |

LLM Cortex Functions Options

| Option | Description |
|---|---|
| Create As | Provides option to choose materialization type as table |
| Multi Source | True/False toggle for SQL UNIONs:<br>- True: Multiple sources combined using the Multi Source Strategy<br>- False: Single source or join |
| Truncate Before | True/False toggle:<br>- True: INSERT OVERWRITE is used<br>- False: INSERT is used to append data |
| Enable tests | Toggle to enable testing features |
| Pre-SQL | SQL to execute before the data insert operation |
| Post-SQL | SQL to execute after the data insert operation |

LLM Cortex Functions Cortex Package

| Option | Description |
|---|---|
| SUMMARIZE | True/False toggle determining whether the column's data is returned as a summary:<br>- True: System prompts to add a column<br>- False: Function remains inactive |
| SENTIMENT | True/False toggle to return a sentiment score (-1 to 1) for English text, where -1 is the most negative, 0 is neutral, and 1 is positive:<br>- True: System prompts to add a column<br>- False: Function remains inactive |
| TRANSLATE | True/False toggle to translate text:<br>- True: System prompts to add a column<br>- False: Function remains inactive |
| EXTRACT ANSWER | True/False toggle to extract answers:<br>- True: System prompts to add a column<br>- False: Function remains inactive |
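Each toggle corresponds to a SNOWFLAKE.CORTEX function applied to a column. A sketch of the generated SQL, assuming a hypothetical table `customer_reviews` with a `review_text` column:

```sql
-- Apply Cortex LLM functions to a text column
SELECT
    SNOWFLAKE.CORTEX.SUMMARIZE(review_text)                                     AS summary,
    SNOWFLAKE.CORTEX.SENTIMENT(review_text)                                     AS sentiment,   -- score from -1 to 1
    SNOWFLAKE.CORTEX.TRANSLATE(review_text, 'en', 'de')                         AS review_de,
    SNOWFLAKE.CORTEX.EXTRACT_ANSWER(review_text, 'Which product is mentioned?') AS answer       -- returns an ARRAY
FROM customer_reviews;
```

The ARRAY return type of EXTRACT_ANSWER is why the target column for that function must be an Array (see the key points below).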

LLM Cortex Functions Key Points for Consideration

  1. Review the required privileges.
  2. The datatype of the target column for the Extract Answer function should be an Array.
  3. If source data is required in the target column, duplicate the column before applying a Cortex function.

LLM Cortex Functions Deployment

LLM Cortex Functions Initial Deployment

When deployed for the first time into an environment the LLM node will execute:

| Stage | Description |
|---|---|
| Create Table | Executes a CREATE OR REPLACE statement to create the Table in the target environment |

LLM Cortex Functions Redeployment

After the LLM node has been deployed for the first time into a target environment, subsequent deployments may result in altering the table.

Altering the LLM Cortex Functions Tables

The following column or table changes, whether made in isolation or together, will result in an ALTER statement:

  • Change in table name
  • Dropping existing column
  • Alter Column data type
  • Adding a new column

The following stages are executed:

| Stage | Description |
|---|---|
| Clone Table | Creates an internal table |
| Rename Table/Alter Column/Delete Column/Add Column/Edit table description | An ALTER TABLE statement is executed to perform the corresponding operation |
| Swap cloned Table | Clone replaces the main table after successful updates |
| Delete Table | Drops the internal table |

LLM Cortex Functions Undeployment

If an LLM node is deleted from a Workspace, and that Workspace is committed to Git and subsequently deployed to a higher-level environment, then the table in the target environment will be dropped.

This is executed in two stages:

| Stage | Description |
|---|---|
| Delete Table | Coalesce internal table is dropped |
| Delete Table | Target table in Snowflake is dropped |

Top Insights

The Coalesce Top Insights UDN is a versatile node that allows you to streamline and improve the process of root cause analysis around changes in observed metrics. Learn more about Top Insights.

Top Insights Node Configuration

The Top Insights node has two configuration groups:

Top Insights Node Properties

| Property | Description |
|---|---|
| Storage Location | Storage Location where the Table will be created |
| Node Type | Name of template used to create node objects |
| Description | A description of the node's purpose |
| Deploy Enabled | If TRUE the node will be deployed/redeployed when changes are detected.<br>If FALSE the node will not be deployed, or will be dropped during redeployment |

Top Insights Configuration

| Option | Description |
|---|---|
| Create As | Provides option to create a 'view' |
| CATEGORICAL DIMENSIONS | (Required) Categorical attributes essential for analysis |
| CONTINUOUS DIMENSIONS | (Required) Columns representing continuous aspects that vary within a range |
| Metric | (Required) Column representing the target metric being investigated |
| Label | (Required) Column distinguishing between control (FALSE) and test (TRUE) data |
| Filter Insights | Textbox to customize filtering of top insights |
| Order By | True/False toggle:<br>- True: Sort column and sort order are visible and required for the ORDER BY clause<br>- False: Sort column and sort order are hidden |
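These options correspond to Snowflake's TOP_INSIGHTS class. A sketch, assuming a hypothetical table `input_table` with `LABEL` and `METRIC` columns alongside the dimension columns:

```sql
-- Create a TOP_INSIGHTS instance and find the segments driving a metric change
CREATE OR REPLACE SNOWFLAKE.ML.TOP_INSIGHTS my_insights();

CALL my_insights!GET_DRIVERS(
    INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'input_table'),
    LABEL_COLNAME => 'LABEL',    -- FALSE = control period, TRUE = test period
    METRIC_COLNAME => 'METRIC'
);
```

The result ranks combinations of dimension values by their contribution to the change in the metric between the control and test data.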

Top Insights Deployment

Top Insights Initial Deployment

When deployed for the first time into an environment the View node will execute:

| Stage | Description |
|---|---|
| Create or replace View | Executes a CREATE OR REPLACE statement to create a View in the target environment |

Top Insights Redeployment

A subsequent deployment of a View node with changes to the view definition, options, or view name results in the deletion of the existing view and its recreation.

The following stages are executed:

| Stage | Description |
|---|---|
| Delete View | Removes the existing view |
| Create View | Creates a new view with the updated definition |

Top Insights Undeployment

If a View Node is removed from a Workspace, and the changes are committed to Git and deployed to a higher-level environment, the corresponding View in the target environment will be dropped.

This is executed in a single stage:

| Stage | Description |
|---|---|
| Delete View | Drops the existing View from the target environment |

Classification

The Coalesce Classification UDN is a versatile node that allows you to create a classification table and a classification model that sorts data into classes, using the built-in Snowflake ML class CLASSIFICATION.

Classification uses machine learning algorithms to sort data into different classes using patterns detected in training data. Binary classification (two classes) and multi-class classification (more than two classes) are supported. Common use cases of classification include customer churn prediction, credit card fraud detection, and spam detection.

Classification Node Configuration

The Classification node has two configuration groups:

Classification Node Properties

| Property | Description |
|---|---|
| Storage Location | Storage Location where the Table will be created |
| Node Type | Name of template used to create node objects |
| Description | A description of the node's purpose |
| Deploy Enabled | If TRUE the node will be deployed/redeployed when changes are detected.<br>If FALSE the node will not be deployed, or will be dropped during redeployment |

Classification Model Input

| Option | Description |
|---|---|
| Model Instance Name | (Required) Name of the classification model to be created |
| Create Model | True/False toggle to determine model creation:<br>- True: Forcefully create the Classification model<br>- False: Refer to an existing Classification model |
| Target Column | (Required) Column containing target values in input data |
| Multi Source | Toggle to add data for analysis |
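These options correspond to Snowflake's CLASSIFICATION class. A binary-classification sketch, assuming a hypothetical training view `churn_training` with a `CHURNED` target column and a scoring table `new_customers`:

```sql
-- Train a classifier on labeled training data
CREATE OR REPLACE SNOWFLAKE.ML.CLASSIFICATION my_classifier(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'churn_training'),
    TARGET_COLNAME => 'CHURNED'
);

-- Score new rows; PREDICT returns a VARIANT with the predicted class and probabilities
SELECT my_classifier!PREDICT(INPUT_DATA => OBJECT_CONSTRUCT(*)) AS prediction
FROM new_customers;
```

Multi-class classification follows the same pattern; the class simply detects more than two distinct values in the target column.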

Classification Deployment

Classification Initial Deployment

When deployed for the first time into an environment the Classification node will execute:

| Stage | Description |
|---|---|
| Create Classification Table | Executes a CREATE OR REPLACE statement to create the Classification Table in the target environment |

Classification Redeployment

After the Classification node has been deployed for the first time into a target environment, subsequent deployments may result in altering the Classification table.

Altering the Classification Tables

The following column or table changes, whether made in isolation or together, will result in an ALTER statement:

  • Change in table name
  • Dropping existing column
  • Alter column data type
  • Adding a new column

The following stages are executed:

| Stage | Description |
|---|---|
| Clone Table | Creates an internal table |
| Rename Table/Alter Column/Delete Column/Add Column/Edit table description | An ALTER TABLE statement is executed to perform the corresponding operation |
| Swap cloned Table | Clone replaces the main table after successful updates |
| Delete Table | Drops the internal table |

Classification Undeployment

If a Classification table is deleted from a Workspace, and that Workspace is committed to Git and the commit is deployed to a higher-level environment, then the Classification Table in the target environment will be dropped.

This is executed in two stages:

| Stage | Description |
|---|---|
| Delete Table | Coalesce internal table is dropped |
| Delete Table | Target table in Snowflake is dropped |

Document AI

The Coalesce Document AI UDN is a node that allows you to develop and deploy a processing pipeline using the already prepared Document AI model build, streams, and tasks. The pipeline will extract information from new inspection documents stored in an internal stage.

Document AI is an advanced AI model for extracting data from documents. It can read both text and images, such as logos or signatures, and is well suited to automatically processing documents like invoices or financial statements. More information can be found in Snowflake's official Introduction to Document AI.

Prerequisite for DocumentAI node type

  • Set up the required objects (database, schema) and privileges to create tables, tasks, and streams
  • Prepare and publish a Document AI model build in Snowflake using the Document AI interface

Usage of DocumentAI node type

  • Set up the required objects and privileges
  • Prepare and publish a Document AI model build in Snowflake using the Document AI interface
  • Provide the Document AI model build information under the config section of the node, along with the task-related information
  • The node creates a pipeline to process documents
  • Data is available in the target table only after new documents are uploaded to the internal stage specified in the config
  • If the node is created with Development Mode ON, no task is created; once files are uploaded, data can be loaded into the target instantly from the documents using the Run option
  • If the node is created with Development Mode OFF, a task is created to process the uploaded files
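The pipeline the node builds amounts to a stream over the stage plus a task that calls the model build's PREDICT method. A sketch of the pattern with Development Mode OFF; all object names (`doc_stage`, `doc_stream`, `doc_task`, `doc_wh`, `extracted_docs`, `inspection_model`) are hypothetical:

```sql
-- Stream over the stage's directory table to detect newly uploaded files
CREATE OR REPLACE STREAM doc_stream ON STAGE doc_stage;

-- Task that extracts data from the new documents into the target table
CREATE OR REPLACE TASK doc_task
  WAREHOUSE = doc_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('doc_stream')
AS
INSERT INTO extracted_docs
SELECT relative_path,
       inspection_model!PREDICT(GET_PRESIGNED_URL(@doc_stage, relative_path), 1) AS extracted_data
FROM doc_stream;

ALTER TASK doc_task RESUME;
```

With Development Mode ON, only the INSERT portion runs as a Run action, without the stream/task wrapper.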

Document AI Node Configuration

The Document AI node has the following configuration groups:

Document AI Node Properties

| Property | Description |
|---|---|
| Storage Location | Storage Location where the stream, table, and task will be created |
| Node Type | Name of template used to create node objects |
| Deploy Enabled | If TRUE the node will be deployed/redeployed when changes are detected.<br>If FALSE the node will not be deployed, or will be dropped during redeployment |

Document AI General Options

| Option | Description |
|---|---|
| Development Mode | True/False toggle that determines whether a task will be created or the SQL executes as DML:<br>- True: Table created and SQL executes as a Run action<br>- False: SQL wrapped in a task with the specified Scheduling Options |
| CREATE AS | Choose target object type:<br>- Table: Permanent table with data retention and fail-safe<br>- Transient Table: Temporary table without data retention |
| Truncate Before | True/False toggle that determines whether the table will be overwritten each time the task executes:<br>- True: Uses INSERT OVERWRITE<br>- False: Uses INSERT to append data |

Document AI Stream Options

| Option | Description |
|---|---|
| Source Object | Directory Table: an object that sits on top of a stage, similar to an external table, and stores metadata about the files in the stage. It doesn't have its own privileges and is used to reference file-level data. Both external (cloud storage) and internal (Snowflake) stages support directory tables. You can add a directory table to a stage when creating it with CREATE STAGE, or modify it later using ALTER STAGE |
| Redeployment Behavior | Options for redeployment:<br>- Create or Replace<br>- Create if Not Exists<br>- Create at Existing Stream |

Document AI Source Data

| Option | Description |
|---|---|
| Coalesce Storage Location of Stage | The Storage Location name in Coalesce where the stage is located |
| Stage Name | The stage name created in Snowflake |

Document AI Model build

| Option | Description |
|---|---|
| Coalesce Storage Location of DocumentAI Model | The Storage Location name in Coalesce where the extraction query of the DocumentAI model is located |
| DocumentAI Model Name | The identifier or name of the model build for extracting information (located under AI & ML - Document AI in Snowflake) |
| DocumentAI Model Build Version | The specific version of the DocumentAI model, helping to track updates or changes (Build Version under AI & ML - Document AI in Snowflake) |

Document AI Scheduling Options

| Option | Description |
|---|---|
| Scheduling Mode | Choose compute type:<br>- Warehouse Task: User-managed warehouse executes tasks<br>- Serverless Task: Uses serverless compute |
| When Source Stream has Data Flag | True/False toggle to check for stream data:<br>- True: Only run the task if the source stream has captured change data<br>- False: Run the task on schedule regardless of whether the source stream has data. If the source is not a stream, set this to False |
| Select Warehouse on which to run task | Visible if Scheduling Mode is set to Warehouse Task. Enter the name of the warehouse you want the task to run on, without quotes |
| Select initial serverless Warehouse size | Visible when Scheduling Mode is set to Serverless Task. Select the initial compute size on which to run the task; Snowflake will adjust the size from there based on the target schedule and task run times |
| Task Schedule | Choose schedule type:<br>- Minutes: Specify an interval in minutes<br>- Cron: Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax |
| Enter task schedule using minutes | Visible when Task Schedule is set to Minutes. Enter a whole number from 1 to 11520, representing the number of minutes between task runs |
| Enter task schedule using CRON | For more reference, visit Cron expressions |

DocumentAI - System Columns

The set of columns containing source data and file metadata.

| System Column | Description |
|---|---|
| FILENAME | Name of the staged document extracted |
| FILE_URL | The location URL of the document |
| FILE_LAST_MODIFIED | The last-modified timestamp of the staged document |
| SIZE | The size of the document extracted |
| EXTRACTED_DATA | The data extracted from the documents |
| DATA_EXTRACT_TIMESTAMP | The load timestamp of the document extraction |

Document AI Deployment

Document AI Deployment Parameters

The DocumentAI node includes an environment parameter that allows you to specify a different warehouse used to run a task in different environments.

The parameter name is targetTaskWarehouse with default value DEV ENVIRONMENT.

{
    "targetTaskWarehouse": "DEV ENVIRONMENT"
}

When set to any value other than DEV ENVIRONMENT, the node will attempt to create the task using a Snowflake warehouse with the specified name.

For example, with the below setting for the parameter in a QA environment, the task will execute using a warehouse named SNOWFLAKE_DOCUMENT_AI_WH.

{
    "targetTaskWarehouse": "SNOWFLAKE_DOCUMENT_AI_WH"
}

Document AI Initial Deployment

| Stage | Description |
|---|---|
| Create Stream | Creates the Stream in the target environment |
| Create Work Table/Transient Table | Creates the table loaded by the task |
| Create Task | Creates the scheduled task |
| Resume Task | Enables task execution |

If a task is part of a DAG of tasks, the DAG needs to include a node type called Task DAG Resume Root. This node resumes the root task once all the dependent tasks have been created as part of a deployment. The task node has no ALTER capabilities; all task-enabled nodes are CREATE OR REPLACE only, though this is subject to change.

Document AI Redeployment

Stream redeployment behavior:

| Redeployment Behavior | Stage Executed |
|---|---|
| Create Stream if not exists | Re-create Stream at existing offset |
| Create or Replace | Create Stream |
| Create at existing stream | Re-create Stream at existing offset |

Table changes execute:

| Stage | Description |
|---|---|
| Rename Table/Alter Column/Delete Column/Add Column/Edit description | Alters the table as needed |

If the materialization type is changed from one type to another (table/transient table), the following stages execute:

| Stage | Description |
|---|---|
| Drop Table/Transient Table | Drops the target table |
| Create Work/Transient Table | Creates the target table |

Task changes:

| Stage | Description |
|---|---|
| Create Task | Creates the scheduled task |
| Resume Task | Resumes the task |

Redeployment with no changes

If the node is redeployed with no changes compared to the previous deployment, no stages are executed.

Document AI Undeployment

When the node is deleted, the following stages execute:

| Stage | Description |
|---|---|
| Drop Stream | Removes the stream |
| Drop Table | Drops the table |
| Drop Current Task | Drops the task |

Code

ML Forecast Code

ML Anomaly Detection Code

LLM Cortex Functions Code

TopInsights Code

Classification Code

Document AI Code

Macros