From 8a01d5e30a116b813aea435b4385a8dbecaaa36c Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 30 Apr 2024 11:10:56 +0100 Subject: [PATCH 01/15] add percentile --- website/snippets/_sl-measures-parameters.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/snippets/_sl-measures-parameters.md b/website/snippets/_sl-measures-parameters.md index e41268a5867..66d66173e96 100644 --- a/website/snippets/_sl-measures-parameters.md +++ b/website/snippets/_sl-measures-parameters.md @@ -2,7 +2,7 @@ | --- | --- | --- | | [`name`](/docs/build/measures#name) | Provide a name for the measure, which must be unique and can't be repeated across all semantic models in your dbt project. | Required | | [`description`](/docs/build/measures#description) | Describes the calculated measure. | Optional | -| [`agg`](/docs/build/measures#description) | dbt supports the following aggregations: `sum`, `max`, `min`, `avg`, `median`, `count_distinct`, and `sum_boolean`. | Required | +| [`agg`](/docs/build/measures#aggregation) | dbt supports the following aggregations: `sum`, `max`, `min`, `avg`, `median`, `count_distinct`, `percentile`, and `sum_boolean`. | Required | | [`expr`](/docs/build/measures#expr) | Either reference an existing column in the table or use a SQL expression to create or derive a new one. | Optional | | [`non_additive_dimension`](/docs/build/measures#non-additive-dimensions) | Non-additive dimensions can be specified for measures that cannot be aggregated over certain dimensions, such as bank account balances, to avoid producing incorrect results. | Optional | | `agg_params` | Specific aggregation properties such as a percentile. 
| Optional | From eb4b396369d7a0b7de75032dc46b5765843ec5c8 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 30 Apr 2024 11:28:09 +0100 Subject: [PATCH 02/15] add table --- website/docs/docs/build/measures.md | 35 ++++++++++++++++++--- website/snippets/_sl-measures-parameters.md | 2 +- 2 files changed, 31 insertions(+), 6 deletions(-) diff --git a/website/docs/docs/build/measures.md b/website/docs/docs/build/measures.md index 49ed524d6b5..ae5673a5c99 100644 --- a/website/docs/docs/build/measures.md +++ b/website/docs/docs/build/measures.md @@ -41,7 +41,7 @@ Measure names must be unique across all semantic models in a project and can not The description describes the calculated measure. It's strongly recommended you create verbose and human-readable descriptions in this field. -### Aggregation +### Aggregation The aggregation determines how the field will be aggregated. For example, a `sum` aggregation type over a granularity of `day` would sum the values across a given day. @@ -56,8 +56,33 @@ Supported aggregations include: | sum_boolean | A sum for a boolean type | | count_distinct | Distinct count of values | | median | Median (p50) calculation across the values | -| percentile | Percentile calculation across the values | +| percentile | Percentile calculation across the values. Add `agg_params` field to specify percentile details. | +**Percentile aggregation example** +If you're using the `percentile` aggregation, you must use the `agg_params` field under it to specify details for the percentile aggregation (such as what percentile to calculate and whether to use discrete or continuous calculations). 
+ +```yaml +name: p99_transaction_value +description: The 99th percentile transaction value +expr: transaction_amount_usd +agg: percentile +agg_params: + percentile: .99 + use_discrete_percentile: False # False calculates the continuous percentile, True calculates the discrete percentile +``` + +**Percentile across supported engine types** + +The following table displays which SQL engines support continuous, discrete, approximate continuous, and approximate discrete percentiles. + +| | Cont. | Disc. | Approx. cont | Approx. disc | +| -- | -- | -- | -- | -- | +|Snowflake | [Y](https://docs.snowflake.com/en/sql-reference/functions/percentile_cont.html) | [Y](https://docs.snowflake.com/en/sql-reference/functions/percentile_disc.html) | [Y](https://docs.snowflake.com/en/sql-reference/functions/approx_percentile.html) (t-digest) | N | +| Bigquery | N (window) | N (window) | [Y](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#approx_quantiles) | N | +| Databricks | [Y](https://docs.databricks.com/sql/language-manual/functions/percentile_cont.html) | [N](https://docs.databricks.com/sql/language-manual/functions/percentile_disc.html) | N | [Y](https://docs.databricks.com/sql/language-manual/functions/approx_percentile.html) | +| Redshift | [Y](https://docs.aws.amazon.com/redshift/latest/dg/r_PERCENTILE_CONT.html) | N (window) | N | [Y](https://docs.aws.amazon.com/redshift/latest/dg/r_APPROXIMATE_PERCENTILE_DISC.html) | +| [Postgres](https://www.postgresql.org/docs/9.4/functions-aggregate.html) | Y | Y | N | N | +| [DuckDB](https://duckdb.org/docs/sql/aggregates.html) | Y | Y | Y (t-digest) | N | ### Expr @@ -123,7 +148,7 @@ semantic_models: description: The average value of transactions expr: transaction_amount_usd agg: average - - name: transactions_amount_usd_valid #Notice here how we use expr to compute the aggregation based on a condition + - name: transactions_amount_usd_valid # Notice here how we use expr to compute the 
aggregation based on a condition description: The total USD value of valid transactions only expr: CASE WHEN is_valid = True then 1 else 0 end agg: sum @@ -137,7 +162,7 @@ semantic_models: agg: percentile agg_params: percentile: .99 - use_discrete_percentile: False #False will calculate the discrete percentile and True will calculate the continuous percentile + use_discrete_percentile: False # False calculates the continuous percentile and True calculates the discrete percentile - name: median_transaction_value description: The median transaction value expr: transaction_amount_usd @@ -147,7 +172,7 @@ semantic_models: dimensions: - name: metric_time type: time - expr: date_trunc('day', ts) #expr refers to underlying column ts + expr: date_trunc('day', ts) # expr refers to underlying column ts type_params: time_granularity: day - name: is_bulk_transaction diff --git a/website/snippets/_sl-measures-parameters.md b/website/snippets/_sl-measures-parameters.md index 66d66173e96..c7f76561009 100644 --- a/website/snippets/_sl-measures-parameters.md +++ b/website/snippets/_sl-measures-parameters.md @@ -5,7 +5,7 @@ | [`agg`](/docs/build/measures#aggregation) | dbt supports the following aggregations: `sum`, `max`, `min`, `avg`, `median`, `count_distinct`, `percentile`, and `sum_boolean`. | Required | | [`expr`](/docs/build/measures#expr) | Either reference an existing column in the table or use a SQL expression to create or derive a new one. | Optional | | [`non_additive_dimension`](/docs/build/measures#non-additive-dimensions) | Non-additive dimensions can be specified for measures that cannot be aggregated over certain dimensions, such as bank account balances, to avoid producing incorrect results. | Optional | -| `agg_params` | Specific aggregation properties such as a percentile. | Optional | +| `agg_params` | Specific aggregation properties, such as a percentile. | Optional | | `agg_time_dimension` | The time field. 
Defaults to the default agg time dimension for the semantic model. | Optional | 1.6 and higher | | `label`* | How the metric appears in project docs and downstream integrations. | Required | | `create_metric`* | You can create a metric directly from a measure with `create_metric: True` and specify its display name with `create_metric_display_name`. | Optional | From ef5e918ac7a5817733a0562e89e9352b62c70e53 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 30 Apr 2024 11:31:38 +0100 Subject: [PATCH 03/15] clarify --- website/docs/docs/build/measures.md | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/website/docs/docs/build/measures.md b/website/docs/docs/build/measures.md index ae5673a5c99..7463e47986c 100644 --- a/website/docs/docs/build/measures.md +++ b/website/docs/docs/build/measures.md @@ -56,9 +56,9 @@ Supported aggregations include: | sum_boolean | A sum for a boolean type | | count_distinct | Distinct count of values | | median | Median (p50) calculation across the values | -| percentile | Percentile calculation across the values. Add `agg_params` field to specify percentile details. | +| percentile | Percentile calculation across the values. | -**Percentile aggregation example** +**Percentile aggregation example**
If you're using the `percentile` aggregation, you must use the `agg_params` field under it to specify details for the percentile aggregation (such as what percentile to calculate and whether to use discrete or continuous calculations). ```yaml @@ -71,18 +71,17 @@ agg_params: use_discrete_percentile: False # False calculates the discrete percentile, True calculates the continuous percentile ``` -**Percentile across supported engine types** - +**Percentile across supported engine types**
The following table displays which SQL engine supports continuous, discrete, approximate, continuous, and approximate discrete percentiles. | | Cont. | Disc. | Approx. cont | Approx. disc | | -- | -- | -- | -- | -- | -|Snowflake | [Y](https://docs.snowflake.com/en/sql-reference/functions/percentile_cont.html) | [Y](https://docs.snowflake.com/en/sql-reference/functions/percentile_disc.html) | [Y](https://docs.snowflake.com/en/sql-reference/functions/approx_percentile.html) (t-digest) | N | -| Bigquery | N (window) | N (window) | [Y](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#approx_quantiles) | N | -| Databricks | [Y](https://docs.databricks.com/sql/language-manual/functions/percentile_cont.html) | [N](https://docs.databricks.com/sql/language-manual/functions/percentile_disc.html) | N | [Y](https://docs.databricks.com/sql/language-manual/functions/approx_percentile.html) | -| Redshift | [Y](https://docs.aws.amazon.com/redshift/latest/dg/r_PERCENTILE_CONT.html) | N (window) | N | [Y](https://docs.aws.amazon.com/redshift/latest/dg/r_APPROXIMATE_PERCENTILE_DISC.html) | -| [Postgres](https://www.postgresql.org/docs/9.4/functions-aggregate.html) | Y | Y | N | N | -| [DuckDB](https://duckdb.org/docs/sql/aggregates.html) | Y | Y | Y (t-digest) | N | +|Snowflake | [Yes](https://docs.snowflake.com/en/sql-reference/functions/percentile_cont.html) | [Yes](https://docs.snowflake.com/en/sql-reference/functions/percentile_disc.html) | [Yes](https://docs.snowflake.com/en/sql-reference/functions/approx_percentile.html) (t-digest) | No | +| Bigquery | No (window) | No (window) | [Yes](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#approx_quantiles) | No | +| Databricks | [Yes](https://docs.databricks.com/sql/language-manual/functions/percentile_cont.html) | [No](https://docs.databricks.com/sql/language-manual/functions/percentile_disc.html) | No | 
[Yes](https://docs.databricks.com/sql/language-manual/functions/approx_percentile.html) | +| Redshift | [Yes](https://docs.aws.amazon.com/redshift/latest/dg/r_PERCENTILE_CONT.html) | No (window) | No | [Yes](https://docs.aws.amazon.com/redshift/latest/dg/r_APPROXIMATE_PERCENTILE_DISC.html) | +| [Postgres](https://www.postgresql.org/docs/9.4/functions-aggregate.html) | Yes | Yes | No | No | +| [DuckDB](https://duckdb.org/docs/sql/aggregates.html) | Yes | Yes | Yes (t-digest) | No | ### Expr From a393ce9d8316b61ae435a64bd95feae8f60497db Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 30 Apr 2024 12:25:25 +0100 Subject: [PATCH 04/15] add add'l context and remove metric_time from example --- website/docs/docs/build/dimensions.md | 43 ++++++++++++++++----------- 1 file changed, 26 insertions(+), 17 deletions(-) diff --git a/website/docs/docs/build/dimensions.md b/website/docs/docs/build/dimensions.md index 2735eef0729..4dd9449afaf 100644 --- a/website/docs/docs/build/dimensions.md +++ b/website/docs/docs/build/dimensions.md @@ -10,18 +10,16 @@ Dimensions is a way to group or filter information based on categories or time. In a data platform, dimensions is part of a larger structure called a semantic model. It's created along with other elements like [entities](/docs/build/entities) and [measures](/docs/build/measures), and used to add more details to your data that can't be easily added up or combined. In SQL, dimensions is typically included in the `group by` clause of your SQL query. - All dimensions require a `name`, `type` and in some cases, an `expr` parameter. The `name` for your dimension must be unique to the semantic model and can not be the same as an existing `entity` or `measure` within that same model. - | Parameter | Description | Type | | --------- | ----------- | ---- | | `name` | Refers to the name of the group that will be visible to the user in downstream tools. 
It can also serve as an alias if the column name or SQL query reference is different and provided in the `expr` parameter.

Dimension names should be unique within a semantic model, but they can be non-unique across different models as MetricFlow uses [joins](/docs/build/join-logic) to identify the right dimension. | Required | -| `type` | Specifies the type of group created in the semantic model. There are three types:

- **Categorical**: Group rows in a table by categories like geography, color, and so on.
- **Time**: Point to a date field in the data platform. Must be of type TIMESTAMP or equivalent in the data platform engine.
- **Slowly-changing dimensions**: Analyze metrics over time and slice them by groups that change over time, like sales trends by a customer's country. | Required | +| `type` | Specifies the type of group created in the semantic model. There are two types:

- **Categorical**: Group rows in a table by categories like geography, color, and so on.
- **Time**: Point to a date field in the data platform. Must be of type TIMESTAMP or equivalent in the data platform engine.
- You can also use time dimensions to specify time spans for [slowly changing dimensions](/docs/build/dimensions#scd-type-ii) tables. | Required | | `type_params` | Specific type params such as if the time is primary or used as a partition | Required | | `description` | A clear description of the dimension | Optional | | `expr` | Defines the underlying column or SQL query for a dimension. If no `expr` is specified, MetricFlow will use the column with the same name as the group. You can use column name itself to input a SQL expression. | Optional | @@ -54,7 +52,7 @@ semantic_models: ... # --- dimensions --- dimensions: - - name: metric_time + - name: order_date type: time expr: date_trunc('day', ts) - name: is_bulk_transaction @@ -82,11 +80,11 @@ semantic_model: This section further explains the dimension definitions, along with examples. Dimensions have the following types: - [Dimensions types](#dimensions-types) - - [Categorical](#categorical) - - [Time](#time) +- [Categorical](#categorical) +- [Time](#time) - [SCD Type II](#scd-type-ii) -### Categorical +## Categorical Categorical is used to group metrics by different categories such as product type, color, or geographical area. They can refer to existing columns in your dbt model or be calculated using a SQL expression with the `expr` parameter. An example of a category dimension is `is_bulk_transaction`, which is a group created by applying a case statement to the underlying column `quantity`. This allows users to group or filter the data based on bulk transactions. @@ -97,7 +95,7 @@ dimensions: expr: case when quantity > 10 then true else false end ``` -### Time +## Time :::tip use datetime data type if using BigQuery @@ -202,30 +200,40 @@ measures: -### SCD Type II +### SCD Type II -:::caution -Currently, there are limitations in supporting SCD's. +:::caution +Currently, there are limitations in supporting SCD's. 
::: -MetricFlow supports joins against dimensions values in a semantic model built on top of an SCD Type II table (slowly changing dimension) Type II table. This is useful when you need a particular metric sliced by a group that changes over time, such as the historical trends of sales by a customer's country. +MetricFlow supports joins against dimensions values in a semantic model built on top of a slowly changing dimension (SCD) Type II table. This is useful when you need a particular metric organized by a group that changes over time, such as the historical trends of sales by a customer's country. + +SCD Type II are groups that change values at a coarser time granularity. This results in a range of valid rows with different dimensions values for a given metric or measure. MetricFlow associates the metric with the first (minimum) available dimensions value within a coarser time window, such as month. By default, MetricFlow uses the group that is valid at the beginning of the time granularity. -As their name suggests SCD Type II are groups that change values at a coarser time granularity. This results in a range of valid rows with different dimensions values for a given metric or measure. MetricFlow associates the metric with the first (minimum) available dimensions value within a coarser time window, such as month. By default, MetricFlow uses the group that is valid at the beginning of the time granularity. +**Basic structure** The following basic structure of an SCD Type II data platform table is supported: | entity_key | dimensions_1 | dimensions_2 | ... | dimensions_x | valid_from | valid_to | |------------|-------------|-------------|-----|-------------|------------|----------| -* `entity_key` (required): An entity_key (or some sort of identifier) must be present +* `entity_key` (required): An entity_key (or some sort of identifier) must be present. 
* `valid_from` (required): A timestamp indicating the start of a changing dimensions value must be present * `valid_to` (required): A timestamp indicating the end of a changing dimensions value must be present -**Note**: The SCD dimensions table must have `valid_to` and `valid_from` columns. +**Implementation** -This is an example of SQL code that shows how a sample metric called `num_events` is joined with versioned dimensions data (stored in a table called `scd_dimensions`) using a primary key made up of the `entity_key` and `timestamp` columns. +Here are some guidelines to follow when implementing SCD Type II tables: + +- The SCD dimensions table must have `valid_to` and `valid_from` columns, which are logical constructs. +- The columns must be specified exactly once per dimension. +- The columns shouldn't be used or specified on the same dimension. +- The columns must cover a non-overlapping period where one row matches each natural key value (meaning they must not overlap and should be distinct). +- We recommend defining the underlying dbt model with [dbt snapshots](/docs/build/snapshots). This supports the SCD Type II table layout and ensures that the table is updated with the latest data. +This is an example of SQL code that shows how a sample metric called `num_events` is joined with versioned dimensions data (stored in a table called `scd_dimensions`) using a primary key made up of the `entity_key` and `timestamp` columns. + ```sql select metric_time, dimensions_1, sum(1) as num_events from events a @@ -237,6 +245,8 @@ on group by 1, 2 ``` +**SCD example** + @@ -251,7 +261,6 @@ This example shows how to create slowly changing dimensions (SCD) using a semant | 333 | 2 | 2020-08-19 | 2021-10-22| | 333 | 3 | 2021-10-22 | 2048-01-01| - The `validity_params` include two important arguments — `is_start` and `is_end`. These specify the columns in the SCD table that mark the start and end dates (or timestamps) for each tier or dimension. 
Additionally, the entity is tagged as `natural` to differentiate it from a `primary` entity. In a `primary` entity, each entity value has one row. In contrast, a `natural` entity has one row for each combination of entity value and its validity period. ```yaml From 12ab06d2682481f723bc8ad79c8f84f1c3d6ade3 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 30 Apr 2024 12:28:49 +0100 Subject: [PATCH 05/15] Update website/docs/docs/build/dimensions.md --- website/docs/docs/build/dimensions.md | 1 - 1 file changed, 1 deletion(-) diff --git a/website/docs/docs/build/dimensions.md b/website/docs/docs/build/dimensions.md index 4dd9449afaf..6d535526d99 100644 --- a/website/docs/docs/build/dimensions.md +++ b/website/docs/docs/build/dimensions.md @@ -208,7 +208,6 @@ Currently, there are limitations in supporting SCD's. MetricFlow supports joins against dimensions values in a semantic model built on top of an SCD Type II table (slowly changing dimension) Type II table. This is useful when you need a particular metric organized by a group that changes over time, such as the historical trends of sales by a customer's country. -SCD Type II are groups that change values at a coarser time granularity. This results in a range of valid rows with different dimensions values for a given metric or measure. MetricFlow associates the metric with the first (minimum) available dimensions value within a coarser time window, such as month. By default, MetricFlow uses the group that is valid at the beginning of the time granularity. 
**Basic structure** From 22a2296e1ac412400daff9ff0bb4368a6a877249 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 30 Apr 2024 12:29:00 +0100 Subject: [PATCH 06/15] Update website/docs/docs/build/dimensions.md --- website/docs/docs/build/dimensions.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/website/docs/docs/build/dimensions.md b/website/docs/docs/build/dimensions.md index 6d535526d99..4ee1cddc144 100644 --- a/website/docs/docs/build/dimensions.md +++ b/website/docs/docs/build/dimensions.md @@ -211,6 +211,8 @@ MetricFlow supports joins against dimensions values in a semantic model built on **Basic structure** +SCD Type II are groups that change values at a coarser time granularity. This results in a range of valid rows with different dimensions values for a given metric or measure. MetricFlow associates the metric with the first (minimum) available dimensions value within a coarser time window, such as month. By default, MetricFlow uses the group that is valid at the beginning of the time granularity. + The following basic structure of an SCD Type II data platform table is supported: | entity_key | dimensions_1 | dimensions_2 | ... 
| dimensions_x | valid_from | valid_to | From 26ab4d674d96129d8de6fb025dc15da0dbde7dad Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 30 Apr 2024 12:46:52 +0100 Subject: [PATCH 07/15] update filter --- website/docs/docs/build/metrics-overview.md | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/website/docs/docs/build/metrics-overview.md b/website/docs/docs/build/metrics-overview.md index 483098cc47d..92a0d67911e 100644 --- a/website/docs/docs/build/metrics-overview.md +++ b/website/docs/docs/build/metrics-overview.md @@ -185,8 +185,8 @@ metrics: type_params: numerator: cancellations denominator: transaction_amount - filter: | - {{ Dimension('customer__country') }} = 'MX' + filter: | + {{ Dimension('customer__country') }} = 'MX' - name: enterprise_cancellation_rate owners: - support@getdbt.com @@ -194,10 +194,10 @@ metrics: type_params: numerator: name: cancellations - filter: {{ Dimension('company__tier' )}} = 'enterprise' + filter: {{ Dimension('company__tier') }} = 'enterprise' denominator: transaction_amount - filter: | - {{ Dimension('customer__country') }} = 'MX' + filter: | + {{ Dimension('customer__country') }} = 'MX' ``` ### Simple metrics @@ -206,9 +206,8 @@ metrics: - `name`— Use this parameter to define the reference name of the metric. The name must be unique amongst metrics and can include lowercase letters, numbers, and underscores. You can use this name to call the metric from the dbt Semantic Layer API. - +**Note:** If you've already defined the measure using the `create_metric: True` parameter, you don't need to create simple metrics. However, if you would like to include a constraint on top of the measure, you will need to create a simple type metric. + ```yaml metrics: - name: cancellations @@ -219,9 +218,9 @@ metrics: measure: name: cancellations_usd # Specify the measure you are creating a proxy for. 
fill_nulls_with: 0 - filter: | - {{ Dimension('order__value')}} > 100 and {{Dimension('user__acquisition')}} - join_to_timespine: true + filter: | + {{ Dimension('order__value')}} > 100 and {{Dimension('user__acquisition')}} is not null + join_to_timespine: true ``` ## Filters From 5e29f2615113e2f61483c36731f17ab28fdaf21b Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 1 May 2024 10:16:13 +0100 Subject: [PATCH 08/15] Update website/docs/docs/build/dimensions.md Co-authored-by: Thomas Lento --- website/docs/docs/build/dimensions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/dimensions.md b/website/docs/docs/build/dimensions.md index 4ee1cddc144..26efb168832 100644 --- a/website/docs/docs/build/dimensions.md +++ b/website/docs/docs/build/dimensions.md @@ -226,7 +226,7 @@ The following basic structure of an SCD Type II data platform table is supported Here are some guidelines to follow when implementing SCD Type II tables: -- The SCD dimensions table must have `valid_to` and `valid_from` columns, which are logical constructs. +- The SCD semantic model must have `valid_to` and `valid_from` time dimensions, which are logical constructs. - The columns must be specified exactly once per dimension. - The columns shouldn't be used or specified on the same dimension. - The columns must cover a non-overlapping period where one row matches each natural key value (meaning they must not overlap and should be distinct). 
From 95588f71f106d91aa6ec8c83cd3c4d1128b52870 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 1 May 2024 10:16:54 +0100 Subject: [PATCH 09/15] Update website/docs/docs/build/dimensions.md Co-authored-by: Thomas Lento --- website/docs/docs/build/dimensions.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/website/docs/docs/build/dimensions.md b/website/docs/docs/build/dimensions.md index 26efb168832..242a6c7550e 100644 --- a/website/docs/docs/build/dimensions.md +++ b/website/docs/docs/build/dimensions.md @@ -227,9 +227,9 @@ The following basic structure of an SCD Type II data platform table is supported Here are some guidelines to follow when implementing SCD Type II tables: - The SCD semantic model must have `valid_to` and `valid_from` time dimensions, which are logical constructs. -- The columns must be specified exactly once per dimension. -- The columns shouldn't be used or specified on the same dimension. -- The columns must cover a non-overlapping period where one row matches each natural key value (meaning they must not overlap and should be distinct). +- The `valid_from` and `valid_to` properties must be specified exactly once per SCD semantic model. +- The `valid_from` and `valid_to` properties shouldn't be used or specified on the same time dimension. +- The `valid_from` and `valid_to` time dimensions must cover a non-overlapping period where one row matches each natural key value (meaning they must not overlap and should be distinct). - We recommend defining the underlying dbt model with [dbt snapshots](/docs/build/snapshots). This supports the SCD Type II table layout and ensures that the table is updated with the latest data. 
From cc4126c1279a6563479f2cad3e1bd0af91cc586c Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 1 May 2024 15:14:53 +0100 Subject: [PATCH 10/15] Update website/docs/docs/build/measures.md Co-authored-by: Ly Nguyen <107218380+nghi-ly@users.noreply.github.com> --- website/docs/docs/build/measures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/measures.md b/website/docs/docs/build/measures.md index 7463e47986c..40b57733dc5 100644 --- a/website/docs/docs/build/measures.md +++ b/website/docs/docs/build/measures.md @@ -58,7 +58,7 @@ Supported aggregations include: | median | Median (p50) calculation across the values | | percentile | Percentile calculation across the values. | -**Percentile aggregation example**
+#### Percentile aggregation example If you're using the `percentile` aggregation, you must use the `agg_params` field under it to specify details for the percentile aggregation (such as what percentile to calculate and whether to use discrete or continuous calculations). ```yaml From f6581f9569508e4486b0a06be57944a2e367d90e Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 1 May 2024 15:15:01 +0100 Subject: [PATCH 11/15] Update website/docs/docs/build/measures.md Co-authored-by: Ly Nguyen <107218380+nghi-ly@users.noreply.github.com> --- website/docs/docs/build/measures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/measures.md b/website/docs/docs/build/measures.md index 40b57733dc5..c565b445a11 100644 --- a/website/docs/docs/build/measures.md +++ b/website/docs/docs/build/measures.md @@ -59,7 +59,7 @@ Supported aggregations include: | percentile | Percentile calculation across the values. | #### Percentile aggregation example -If you're using the `percentile` aggregation, you must use the `agg_params` field under it to specify details for the percentile aggregation (such as what percentile to calculate and whether to use discrete or continuous calculations). +If you're using the `percentile` aggregation, you must use the `agg_params` field to specify details for the percentile aggregation (such as what percentile to calculate and whether to use discrete or continuous calculations). 
```yaml name: p99_transaction_value From 2a722de3f6afb0fb7ed70076d4f0109db2c6214a Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 1 May 2024 15:15:07 +0100 Subject: [PATCH 12/15] Update website/docs/docs/build/measures.md Co-authored-by: Ly Nguyen <107218380+nghi-ly@users.noreply.github.com> --- website/docs/docs/build/measures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/measures.md b/website/docs/docs/build/measures.md index c565b445a11..272dde81973 100644 --- a/website/docs/docs/build/measures.md +++ b/website/docs/docs/build/measures.md @@ -71,7 +71,7 @@ agg_params: use_discrete_percentile: False # False calculates the discrete percentile, True calculates the continuous percentile ``` -**Percentile across supported engine types**
+#### Percentile across supported engine types The following table displays which SQL engine supports continuous, discrete, approximate, continuous, and approximate discrete percentiles. | | Cont. | Disc. | Approx. cont | Approx. disc | From 93f0ad0e2d3995371c2b7aa96b725a86c9b2d1d6 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 1 May 2024 15:15:13 +0100 Subject: [PATCH 13/15] Update website/docs/docs/build/measures.md Co-authored-by: Ly Nguyen <107218380+nghi-ly@users.noreply.github.com> --- website/docs/docs/build/measures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/measures.md b/website/docs/docs/build/measures.md index 272dde81973..24d48d0c27e 100644 --- a/website/docs/docs/build/measures.md +++ b/website/docs/docs/build/measures.md @@ -72,7 +72,7 @@ agg_params: ``` #### Percentile across supported engine types -The following table displays which SQL engine supports continuous, discrete, approximate, continuous, and approximate discrete percentiles. +The following table lists which SQL engines support continuous, discrete, approximate continuous, and approximate discrete percentiles. | | Cont. | Disc. | Approx. cont | Approx. 
disc | | -- | -- | -- | -- | -- | From 30eb97e93a932d4f47bd613089be5da97262d735 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 1 May 2024 15:16:50 +0100 Subject: [PATCH 14/15] Update dimensions.md --- website/docs/docs/build/dimensions.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/website/docs/docs/build/dimensions.md b/website/docs/docs/build/dimensions.md index f3d89d52d2e..ade5690fef2 100644 --- a/website/docs/docs/build/dimensions.md +++ b/website/docs/docs/build/dimensions.md @@ -143,14 +143,14 @@ dimensions: - name: created_at type: time label: "Date of creation" - expr: date_trunc('day', ts_created) #ts_created is the underlying column name from the table + expr: date_trunc('day', ts_created) # ts_created is the underlying column name from the table is_partition: True type_params: time_granularity: day - name: deleted_at type: time label: "Date of deletion" - expr: date_trunc('day', ts_deleted) #ts_deleted is the underlying column name from the table + expr: date_trunc('day', ts_deleted) # ts_deleted is the underlying column name from the table is_partition: True type_params: time_granularity: day @@ -180,14 +180,14 @@ dimensions: - name: created_at type: time label: "Date of creation" - expr: date_trunc('day', ts_created) #ts_created is the underlying column name from the table + expr: date_trunc('day', ts_created) # ts_created is the underlying column name from the table is_partition: True type_params: time_granularity: day - name: deleted_at type: time label: "Date of deletion" - expr: date_trunc('day', ts_deleted) #ts_deleted is the underlying column name from the table + expr: date_trunc('day', ts_deleted) # ts_deleted is the underlying column name from the table is_partition: True type_params: time_granularity: day From 1147e703134ca56a64642b9e10c930eb52e29ed9 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 1 May 2024 
15:18:44 +0100 Subject: [PATCH 15/15] Update measures.md --- website/docs/docs/build/measures.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/website/docs/docs/build/measures.md b/website/docs/docs/build/measures.md index 1a42fc76c16..3c548f20c8c 100644 --- a/website/docs/docs/build/measures.md +++ b/website/docs/docs/build/measures.md @@ -66,7 +66,7 @@ expr: transaction_amount_usd agg: percentile agg_params: percentile: .99 - use_discrete_percentile: False # False calculates the discrete percentile, True calculates the continuous percentile + use_discrete_percentile: False # False calculates the continuous percentile, True calculates the discrete percentile. ``` #### Percentile across supported engine types @@ -159,7 +159,7 @@ semantic_models: agg: percentile agg_params: percentile: .99 - use_discrete_percentile: False # False calculates the discrete percentile and True calculates the continuous percentile + use_discrete_percentile: False # False calculates the continuous percentile, True calculates the discrete percentile. - name: median_transaction_value description: The median transaction value expr: transaction_amount_usd @@ -185,7 +185,7 @@ Some measures cannot be aggregated over certain dimensions, like time, because i To demonstrate the configuration for non-additive measures, consider a subscription table that includes one row per date of the registered user, the user's active subscription plan(s), and the plan's subscription value (revenue) with the following columns: - `date_transaction`: The daily date-spine. -- `user_id`: The ID pertaining to the registered user. +- `user_id`: The ID of the registered user. - `subscription_plan`: A column to indicate the subscription plan ID. - `subscription_value`: A column to indicate the monthly subscription value (revenue) of a particular subscription plan ID.