diff --git a/nql.md b/nql.md
index f84776a..69d47bd 100644
--- a/nql.md
+++ b/nql.md
@@ -186,23 +186,24 @@ Using `EXPLAIN `, a forecast can be generated. This forecast estimates bo

 ### 10.3 CREATING MATERIALIZED VIEWS

-#### 10.4.1 Materialized Views
+#### 10.3.1 Materialized Views

 In databases that support them, a materialized view is a database object that stores the result of a query physically. It provides a way to cache expensive query results and improve query performance by reading from this pre-computed result set, which can be refreshed periodically or on-demand.

 Similarly, in NQL, creating a materialized view creates a new, unique dataset within the Narrative Data Collaboration Platform that can be further queried or actioned on downstream. Creating a materialized view effectively creates a new dataset with a unique name. Such datasets cannot ingest data from other sources. Data purchase costs may or may not be incurred when executing a query that materializes data as a new dataset, depending on the underlying query's access rules.

-#### 10.4.2 Materialized View Syntax
+#### 10.3.2 Materialized View Syntax

 ```sql
 CREATE MATERIALIZED VIEW ""
 [ DISPLAY_NAME = '' ]
 [ DESCRIPTION = '' ]
-[ EXPIRE = { } // Supported syntax: "expire_when > P60", default retains all data.
+[ EXPIRE = { } // Supported syntax: "expire_when > P60", default retains all data. ]
 [ STATUS = { 'active' | 'updating' | 'draft' } ]
 [ TAGS = ( '_nio_materialized_view', '', ... ) ]
 [ WRITE_MODE = { 'append' | 'overwrite' } ]
 [ EXTENDED_STATS = { 'all' | 'none' } ]
-[ PARTITIONED_BY , ]
+[ PARTITIONED_BY , ]
+[ REFRESH_SCHEDULE = { '@hourly' | '@daily' | '@weekly' | '@monthly' | '@once' | cron expression } ]
 AS SELECT FROM
 ```
@@ -257,16 +258,27 @@ The following parameters apply to the dataset that is generated by the `CREATE M
   - Type: expression
   - Default: Narrative sample partition is always present; users can add additional partitions.

+- `REFRESH_SCHEDULE`: Defines how often the materialized view is refreshed.
+  - Allowed Values:
+    - '@hourly'
+    - '@daily'
+    - '@weekly'
+    - '@monthly'
+    - '@once'
+    - cron expressions
+  - Type: enum | cron
+  - Default: '@once'
+
-## 10.5 Specialized Functions in NQL
+### 10.5 Specialized Functions in NQL

 NQL also supports a variety of specialized functions or User-Defined Functions (UDFs) to cater to specific use-cases.

-### 10.5.1 `ADDRESS_HASHES()`
+#### 10.5.1 `ADDRESS_HASHES()`

 The `ADDRESS_HASHES()` function generates libpostal address hashes from an unstructured address string. This is especially useful for conducting fuzzy address comparisons where exact string matching isn't sufficient. By hashing the addresses and then joining two input lists based on these hashes, users can find approximate address matches with high efficiency.

-#### Example:
+##### Example:

 ```sql
 CREATE MATERIALIZED VIEW "address_hashes_sample_v2" AS
@@ -290,6 +302,117 @@ SELECT
   ) as address_hashes,
 ...
 ```
+#### 10.5.2 `country_code_3_to_2()`
+
+The `country_code_3_to_2()` function takes a single column of ISO 3166-1 alpha-3 country codes and converts each value to its ISO 3166-1 alpha-2 equivalent. This is useful for matching the standard output of the `iso_3166_1_country` Rosetta Stone attribute, which is expressed as ISO 3166-1 alpha-2 country codes.
+
+##### Example:
+
+```sql
+CREATE MATERIALIZED VIEW "country_code_sample" AS
+SELECT
+  country_code_3_to_2(my_dataset.country_code) as two_letter_codes
+FROM
+  company_data.my_dataset
+...
+```
+
+### 10.6 Querying an Access Rule Directly
+
+An access rule has two identifiers: an `access_rule_name` and an `access_rule_id`. Access rule names are human-readable and must be created explicitly, while access rule IDs are generated automatically when each access rule is first set up. NQL supports querying access rules directly by `access_rule_name`, not by `access_rule_id`.
+
+NQL supports directly querying both internal access rules (access rules on datasets in the same company seat) and external access rules (access rules on datasets in a different company seat). Querying an access rule is the third method of querying datasets in NQL, in addition to the Rosetta Stone attribute catalog and dataset IDs.
+
+#### 10.6.1 Querying Internal Access Rules
+
+The access rule name is added after the company identifier. When querying data in your own company seat, the access rule name always follows `company_data`.
+
+##### Example
+
+```sql
+SELECT pd.hashed_emails
+FROM company_data.access_rule_for_private_deal pd
+```
+
+#### 10.6.2 Querying External Access Rules
+
+The access rule name is added after the company identifier. When querying data in a different company seat, the access rule name always follows that company's `company_slug`.
+
+##### Example
+
+```sql
+SELECT teams.baseball_teams
+FROM company_slug.access_rule_unique_name_1 teams
+```
+
+### 10.7 Embedded Namespaces in NQL
+
+#### 10.7.1 `_rosetta_stone`
+
+NQL supports attribute querying via the Rosetta Embedded Namespace. The namespace is exposed through `_rosetta_stone`, which provides a direct way to query Rosetta Stone attributes and acts as an attribute reference within a dataset or access rule. `_rosetta_stone` must follow a dataset's `unique_name`, a dataset's `id`, or an access rule's `name`. If the dataset or access rule has no mappings, or the attribute reference is incorrect, the query returns an error.
+
+##### Basic Usage
+
+```sql
+SELECT ds_identifier._rosetta_stone."attribute_name" AS alias_name
+FROM dataset_source
+```
+
+- **`ds_identifier`**: Alias or identifier for the dataset. A dataset can be referenced by its `id` or `unique_name`.
+- **`attribute_name`**: The name of the Rosetta Stone attribute that is being selected.
+- **`alias_name`**: An optional alias for the selected attribute.
+
+##### Example with Single Dataset
+
+```sql
+SELECT ds_123._rosetta_stone."event_timestamp" AS event_time
+FROM company_data.ds_123 AS ds_123
+```
+
+##### Example Joining Multiple Datasets
+
+```sql
+SELECT
+  ds_123._rosetta_stone."attribute_1" AS attribute_from_a,
+  ds_456._rosetta_stone."attribute_2" AS attribute_from_b,
+  ds_123.email,
+  ds_456.username
+FROM
+  company_data.ds_123 AS ds_123
+JOIN
+  company_data.ds_456 AS ds_456
+ON
+  ds_123.user_id = ds_456.user_id
+```
+
+In this example:
+
+- The first Rosetta Stone attribute (**`attribute_1`**) is pulled from dataset **`ds_123`**.
+- The second Rosetta Stone attribute (**`attribute_2`**) is pulled from dataset **`ds_456`**.
+
+##### Example of Nested Properties
+For nested properties, the same dot notation is used within the **`_rosetta_stone`** namespace.
+
+```sql
+SELECT
+  ds_123._rosetta_stone."nested"."attribute" AS nested_attribute
+FROM
+  company_data.ds_123 AS ds_123
+```
+
+##### Example of Filtering with Rosetta Attributes
+
+```sql
+SELECT
+  ds_123._rosetta_stone."unique_id"."value" AS id
+FROM
+  company_data.ds_123 AS ds_123
+WHERE
+  ds_123._rosetta_stone."unique_id"."value" = 123
+```
+
+Here, the **`WHERE`** clause filters on the Rosetta attribute **`unique_id.value`** from dataset **`ds_123`**.
+

 ## 12. Example Queries

@@ -350,6 +473,18 @@ WHERE

 # CHANGE LOG

+## Update 2023-12-26
+
+### Section 10 - CREATING MATERIALIZED VIEWS
+
+- `CREATE MATERIALIZED VIEW` syntax was updated to include `REFRESH_SCHEDULE`, which defines how often the materialized view is refreshed.
+- The UDF section includes a new function: `country_code_3_to_2()`.
+- NQL supports targeting access rules directly.
+  - Internal access rules are targeted using the `company_data` identifier and `access_rule_name`.
+  - External access rules are targeted using `company_slug` and `access_rule_name`.
+- Introduced the Rosetta Embedded Namespace, accessed through `_rosetta_stone`, as a way to query attributes from specific datasets or access rules.
+
+
 ## Update 2023-11-05
 ### Section 2 - Scope
 - Revised to highlight NQL's integration with the Narrative platform.