Merge branch 'current' into mwan-control-plane-blog
mirnawong1 authored Jan 21, 2025
2 parents 5082e6c + 0458e50 · commit 89592f6
Showing 52 changed files with 1,011 additions and 858 deletions.
5 changes: 5 additions & 0 deletions website/docs/docs/build/custom-aliases.md
@@ -157,3 +157,8 @@ If these models should indeed have the same database identifier, you can work ar

By default, dbt will create versioned models with the alias `<model_name>_v<v>`, where `<v>` is that version's unique identifier. You can customize this behavior just like for non-versioned models by configuring a custom `alias` or re-implementing the `generate_alias_name` macro.
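As an illustration, here's a minimal sketch of a `generate_alias_name` override, placed in your project's `macros/` directory. The signature mirrors dbt's built-in macro, but the policy shown (the latest version keeps the plain model name) is just one possible choice, not required behavior:

```sql
-- macros/generate_alias_name.sql
-- Sketch: keep the default <model_name>_v<v> alias for older versions,
-- but let the latest version of a model claim the plain model name.
{% macro generate_alias_name(custom_alias_name=none, node=none) -%}
    {%- if custom_alias_name -%}
        {{ custom_alias_name | trim }}
    {%- elif node.version and node.version != node.latest_version -%}
        {{ node.name ~ '_v' ~ (node.version | replace('.', '_')) }}
    {%- else -%}
        {{ node.name }}
    {%- endif -%}
{%- endmacro %}
```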

## Related docs

- [Customize dbt models database, schema, and alias](/guides/customize-schema-alias?step=1) for a step-by-step guide to customizing all three
- [Custom schema](/docs/build/custom-schemas) to learn how to customize dbt model schemas
- [Custom database](/docs/build/custom-databases) to learn how to customize dbt model databases
6 changes: 6 additions & 0 deletions website/docs/docs/build/custom-databases.md
@@ -98,3 +98,9 @@ See docs on macro `dispatch`: ["Managing different global overrides across packa
### BigQuery

When dbt opens a BigQuery connection, it will do so using the `project_id` defined in your active `profiles.yml` target. This `project_id` will be billed for the queries that are executed in the dbt run, even if some models are configured to be built in other projects.
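For example, a sketch of this setup; the project names and profile details are hypothetical:

```yaml
# profiles.yml (sketch) -- the project set here is the one BigQuery bills,
# regardless of where individual models are built.
my_profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: billing-project   # hypothetical billed project
      dataset: dbt_dev
      threads: 4

# dbt_project.yml (sketch) -- models under marts/ build in reporting-project,
# but their query costs still accrue to billing-project above:
#
# models:
#   my_project:
#     marts:
#       +database: reporting-project   # on BigQuery, database maps to project
```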

## Related docs

- [Customize dbt models database, schema, and alias](/guides/customize-schema-alias?step=1) for a step-by-step guide to customizing all three
- [Custom schema](/docs/build/custom-schemas) to learn how to customize dbt model schemas
- [Custom aliases](/docs/build/custom-aliases) to learn how to customize dbt model alias names
6 changes: 6 additions & 0 deletions website/docs/docs/build/custom-schemas.md
@@ -207,3 +207,9 @@ In the `generate_schema_name` macro examples shown in the [built-in alternative
If your schema names are being generated incorrectly, double-check your target name in the relevant environment.

For more information, consult the [managing environments in dbt Core](/docs/core/dbt-core-environments) guide.
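For example, a minimal sketch of a target-aware `generate_schema_name` override; the target name `prod` is an assumption, so match it to your own environment:

```sql
-- macros/generate_schema_name.sql
-- Sketch: use the custom schema on its own in the "prod" target, and the
-- default <target_schema>_<custom_schema> concatenation everywhere else.
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- elif target.name == 'prod' -%}
        {{ custom_schema_name | trim }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```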

## Related docs

- [Customize dbt models database, schema, and alias](/guides/customize-schema-alias?step=1) for a step-by-step guide to customizing all three
- [Custom database](/docs/build/custom-databases) to learn how to customize dbt model databases
- [Custom aliases](/docs/build/custom-aliases) to learn how to customize dbt model alias names
4 changes: 3 additions & 1 deletion website/docs/docs/build/data-tests.md
@@ -73,7 +73,9 @@ having total_amount < 0

The name of this test is the name of the file: `assert_total_payment_amount_is_positive`.

Note, you won't need to include semicolons (;) at the end of the SQL statement in your singular test files as it can cause your test to fail.
Note:
- Omit semicolons (;) at the end of the SQL statement in your singular test files, as they can cause your test to fail.
- Singular tests placed in the `tests` directory are automatically executed when running `dbt test`. Don't reference singular tests in `model_name.yml`, as they are not treated as generic tests or macros, and doing so will result in an error.

To add a description to a singular test in your project, add a `.yml` file to your `tests` directory, for example, `tests/schema.yml` with the following content:

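A minimal sketch (the description text is illustrative; on dbt versions earlier than 1.8, use the `tests:` key instead of `data_tests:`):

```yaml
version: 2
data_tests:
  - name: assert_total_payment_amount_is_positive
    description: >
      Refunds can reduce a payment, but the total payment amount
      for an order should never drop below zero.
```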
24 changes: 10 additions & 14 deletions website/docs/docs/build/enhance-your-code.md
@@ -7,21 +7,17 @@ pagination_prev: null

<div className="grid--2-col">

<Card
title="Project variables"
body="Learn how to use project variables to provide data to models for compilation."
link="/docs/build/project-variables"
icon="dbt-bit"/>

<Card
title="Environment variables"
body="Learn how you can use environment variables to customize the behavior of a dbt project."
link="/docs/build/environment-variables"
icon="dbt-bit"/>

</div>
<br />
<div className="grid--2-col">
<Card
title="Hooks and operations"
body="Learn how to use hooks to trigger actions and operations to invoke macros."
link="/docs/build/hooks-operations"
icon="dbt-bit"/>

<Card
title="Packages"
@@ -30,9 +26,9 @@ pagination_prev: null
icon="dbt-bit"/>

<Card
title="Hooks and operations"
body="Learn how to use hooks to trigger actions and operations to invoke macros."
link="/docs/build/hooks-operations"
icon="dbt-bit"/>
title="Project variables"
body="Learn how to use project variables to provide data to models for compilation."
link="/docs/build/project-variables"
icon="dbt-bit"/>

</div>
</div>
17 changes: 15 additions & 2 deletions website/docs/docs/build/incremental-microbatch.md
@@ -29,14 +29,27 @@ Microbatch is an incremental strategy designed for large time-series datasets:

- Note, microbatch might not be the best strategy for all use cases. Consider other strategies when you don't have a reliable `event_time` column or when you want more control over the incremental logic. Read more in [How `microbatch` compares to other incremental strategies](#how-microbatch-compares-to-other-incremental-strategies).

### How microbatch works
## How microbatch works

When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure.

Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />.

This is a powerful abstraction that makes it possible for dbt to run batches [separately](#backfills), concurrently, and [retry](#retry) them independently.
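Concretely, here's a minimal sketch of a microbatch model configuration; the model name, upstream ref, and column name are illustrative:

```sql
-- models/sessions.sql (sketch)
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_start',  -- column used to bound each batch
    batch_size='day',            -- one batch per day of data
    begin='2024-01-01',          -- earliest date for an initial full build
    lookback=3                   -- also reprocess the most recent 3 batches
) }}

select * from {{ ref('stg_page_views') }}
```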

### Adapter-specific behavior

dbt's microbatch strategy uses the most efficient mechanism available for "full batch" replacement, which varies by adapter:

- `dbt-postgres`: Uses the `merge` strategy, which performs "update" or "insert" operations.
- `dbt-redshift`: Uses the `delete+insert` strategy, which "inserts" or "replaces."
- `dbt-snowflake`: Uses the `delete+insert` strategy, which "inserts" or "replaces."
- `dbt-bigquery`: Uses the `insert_overwrite` strategy, which "inserts" or "replaces."
- `dbt-spark`: Uses the `insert_overwrite` strategy, which "inserts" or "replaces."
- `dbt-databricks`: Uses the `replace_where` strategy, which "inserts" or "replaces."

Check out the [supported incremental strategies by adapter](/docs/build/incremental-strategy#supported-incremental-strategies-by-adapter) for more info.

## Example

A `sessions` model aggregates and enriches data that comes from two other models:
@@ -170,7 +183,7 @@ customers as (

</Tabs>

dbt will instruct the data platform to take the result of each batch query and insert, update, or replace the contents of the `analytics.sessions` table for the same day of data. To perform this operation, dbt will use the most efficient atomic mechanism for "full batch" replacement that is available on each data platform.
dbt will instruct the data platform to take the result of each batch query and [insert, update, or replace](#adapter-specific-behavior) the contents of the `analytics.sessions` table for the same day of data. To perform this operation, dbt will use the most efficient atomic mechanism for "full batch" replacement that is available on each data platform. For details, see [How microbatch works](#how-microbatch-works).

It does not matter whether the table already contains data for that day. Given the same input data, the resulting table is the same no matter how many times a batch is reprocessed.
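For example, a sketch of a targeted backfill; the flags follow the microbatch docs, and the model name and date range are illustrative:

```bash
# Rebuild only the batches that fall between these timestamps; each batch
# is replaced atomically and independently of the rest of the table.
dbt run --select sessions --event-time-start "2024-09-01" --event-time-end "2024-09-03"
```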

1 change: 1 addition & 0 deletions website/docs/docs/build/incremental-strategy.md
@@ -1,5 +1,6 @@
---
title: "About incremental strategy"
sidebar_label: "About incremental strategy"
description: "Learn about the various ways (strategies) to implement incremental materializations."
id: "incremental-strategy"
---
81 changes: 40 additions & 41 deletions website/docs/docs/build/metricflow-commands.md
@@ -77,7 +77,7 @@ The following table lists the commands compatible with the dbt Cloud IDE and dbt
| [`list dimension-values`](#list-dimension-values) | List dimensions with metrics. |||
| [`list entities`](#list-entities) | Lists all unique entities. |||
| [`list saved-queries`](#list-saved-queries) | Lists available saved queries. Use the `--show-exports` flag to display each export listed under a saved query or `--show-parameters` to show the full query parameters each saved query uses. |||
| [`query`](#query) | Query metrics, saved queries, and dimensions you want to see in the command line interface. Refer to [query examples](#query-examples) to help you get started. |||
| [`query`](#query) | Query metrics, saved queries, and dimensions you want to see in the command line interface. Refer to [query examples](#query-examples) for common patterns, such as querying metrics, using the `where` filter, and adding an `order`. |||
| [`validate`](#validate) | Validates semantic model configurations. |||
| [`export`](#export) | Runs exports for a singular saved query for testing and generating exports in your development environment. You can also use the `--select` flag to specify particular exports from a saved query. |||
| [`export-all`](#export-all) | Runs exports for multiple saved queries at once, saving time and effort. |||
@@ -118,7 +118,7 @@ Use the `mf` prefix before the command name to execute them in dbt Core. For exa
</TabItem>
</Tabs>

### List metrics
## List metrics
This command lists the metrics with their available dimensions:

@@ -132,7 +132,7 @@

```bash
Options:
--help Show this message and exit.
```

### List dimensions
## List dimensions

This command lists all unique dimensions for a metric or multiple metrics. It displays only common dimensions when querying multiple metrics:

@@ -146,7 +146,7 @@

```bash
Options:
--help Show this message and exit.
```

### List dimension-values
## List dimension-values

This command lists all dimension values with the corresponding metric:

@@ -168,7 +168,7 @@

```bash
Options:
--help Show this message and exit.
```

### List entities
## List entities

This command lists all unique entities:

@@ -182,7 +182,7 @@

```bash
Options:
--help Show this message and exit.
```

### List saved queries
## List saved queries

This command lists all available saved queries:

@@ -209,7 +209,7 @@

```bash
The list of available saved queries:
- Export(new_customer_orders, alias=orders, schemas=customer_schema, exportAs=TABLE)
```

### Validate
## Validate

The following command performs validations against the defined semantic model configurations.

@@ -234,7 +234,7 @@

```bash
Options:
--help Show this message and exit.
```
### Health checks
## Health checks
The following command performs a health check against the data platform you provided in the configs.
@@ -244,7 +244,7 @@ Note, in dbt Cloud the `health-checks` command isn't required since it uses dbt

```bash
mf health-checks # In dbt Core
```
### Tutorial
## Tutorial
Follow the dedicated MetricFlow tutorial to help you get started:
<!--dbt sl tutorial # In dbt Cloud-->
@@ -253,7 +253,7 @@

```bash
mf tutorial # In dbt Core
```
### Query
## Query
Create a new query with MetricFlow and execute it against your data platform. The query returns the following result:
@@ -284,10 +284,11 @@ Options:
time of the data (inclusive)
*Not available in dbt Cloud yet

--where TEXT SQL-like where statement provided as a string and wrapped in quotes: --where "condition_statement"
For example, to query a single statement: --where "revenue > 100"
To query multiple statements: --where "revenue > 100 and user_count < 1000"
To add a dimension filter to a where filter, ensure the filter item is part of your model.
--where TEXT SQL-like where statement provided as a string and wrapped in quotes.
All filter items must explicitly reference fields or dimensions that are part of your model.
To query a single statement: --where "{{ Dimension('order_id__revenue') }} > 100"
To query multiple statements: --where "{{ Dimension('order_id__revenue') }} > 100 and {{ Dimension('user_count') }} < 1000"
To add a dimension filter, use the `Dimension()` template wrapper to indicate that the filter item is part of your model.
Refer to the [FAQ](#faqs) for more info on how to do this using a template wrapper.
--limit TEXT Limit the number of rows out using an int or leave
@@ -318,13 +319,18 @@ Options:
```
### Query examples
## Query examples
The following tabs present various types of query examples that you can use to query metrics and dimensions. Select the tab that best suits your needs:
This section provides various query examples that you can use to query metrics and dimensions. The query examples listed are:
<Tabs>
- [Query metrics](#query-metrics)
- [Query dimensions](#query-dimensions)
- [Add `order`/`limit` function](#add-orderlimit)
- [Add `where` clause](#add-where-clause)
- [Filter by time](#filter-by-time)
- [Query saved queries](#query-saved-queries)
<TabItem value="eg1" label="Metrics">
### Query metrics
Use the following example to query multiple metrics by dimension and return the `order_total` and `users_active` metrics by `metric_time`.
@@ -347,9 +353,8 @@ mf query --metrics order_total,users_active --group-by metric_time # In dbt Core
| 2017-06-20 | 712.51 |
| 2017-06-21 | 541.65 |
```
</TabItem>
<TabItem value="eg2" label="Dimensions">
### Query dimensions
You can include multiple dimensions in a query. For example, you can group by the `is_food_order` dimension to confirm if orders were for food or not. Note that when you query a dimension, you need to specify the primary entity for that dimension. In the following example, the primary entity is `order_id`.
@@ -375,9 +380,7 @@ mf query --metrics order_total --group-by order_id__is_food_order # In dbt Core
| 2017-06-19 | True | 448.11 |
```
</TabItem>
<TabItem value="eg3" label="Order/limit">
### Add order/limit
You can add order and limit functions to filter and present the data in a readable format. The following query limits the data set to 10 records and orders them by `metric_time`, descending. Note that the `-` prefix sorts the query in descending order; without it, the query sorts in ascending order.
@@ -405,21 +408,24 @@ mf query --metrics order_total --group-by order_id__is_food_order --limit 10 --o
| 2017-08-29 | False | 333.65 |
| 2017-08-28 | False | 334.73 |
```
</TabItem>
<TabItem value="eg4" label="where clause">
### Add where clause
You can further filter the data set by adding a `where` clause to your query. The following example shows you how to query the `order_total` metric, grouped by `is_food_order` with multiple where statements (orders that are food orders and orders from the week starting on or after Feb 1st, 2024). Note that when you query a dimension, you need to specify the primary entity for that dimension. In the following example, the primary entity is `order_id`.
You can further filter the data set by adding a `where` clause to your query. The following example shows you how to query the `order_total` metric, grouped by `is_food_order` with multiple `where` statements (orders that are food orders and orders from the week starting on or after Feb 1st, 2024).
**Query**
```bash
# In dbt Cloud
dbt sl query --metrics order_total --group-by order_id__is_food_order --where "{{ Dimension('order_id__is_food_order') }} = True and metric_time__week >= '2024-02-01'"
dbt sl query --metrics order_total --group-by order_id__is_food_order --where "{{ Dimension('order_id__is_food_order') }} = True and {{ TimeDimension('metric_time', 'week') }} >= '2024-02-01'"
# In dbt Core
mf query --metrics order_total --group-by order_id__is_food_order --where "{{ Dimension('order_id__is_food_order') }} = True and metric_time__week >= '2024-02-01'"
mf query --metrics order_total --group-by order_id__is_food_order --where "{{ Dimension('order_id__is_food_order') }} = True and {{ TimeDimension('metric_time', 'week') }} >= '2024-02-01'"
```
Notes:
- The dimension type determines the syntax you use: for date or time fields, use `TimeDimension` instead of `Dimension`.
- When you query a dimension, you need to specify its primary entity. In the preceding example, the primary entity is `order_id`.
**Result**
```bash
✔ Success 🦄 - query completed after 1.06 seconds
@@ -437,9 +443,7 @@ mf query --metrics order_total --group-by order_id__is_food_order --where "{{ Di
| 2017-08-22 | True | 401.91 |
```
</TabItem>
<TabItem value="eg5" label=" Filter by time">
### Filter by time
To filter by time, there are dedicated start and end time options. Using these options to filter by time allows MetricFlow to further optimize query performance by pushing down the where filter when appropriate.
@@ -468,9 +472,7 @@ mf query --metrics order_total --group-by order_id__is_food_order --limit 10 --o
| 2017-08-22 | True | 401.91 |
```
</TabItem>
<TabItem value="eg6" label=" Saved queries">
### Query saved queries
You can use this for frequently used queries. Replace `<name>` with the name of your [saved query](/docs/build/saved-queries).
@@ -487,10 +489,7 @@ For example, if you use dbt Cloud and have a saved query named `new_customer_ord
When querying [saved queries](/docs/build/saved-queries), you can use parameters such as `where`, `limit`, `order`, `compile`, and so on. However, keep in mind that you can't access `metric` or `group_by` parameters in this context. This is because they are predetermined and fixed parameters for saved queries, and you can't change them at query time. If you would like to query more metrics or dimensions, you can build the query using the standard format.
:::
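For instance, a sketch that combines a saved query with a query-time filter; the saved-query name reuses the earlier example, and the filter itself is illustrative:

```bash
dbt sl query --saved-query new_customer_orders --where "{{ TimeDimension('metric_time', 'week') }} >= '2024-02-01'"
```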
</TabItem>
</Tabs>
### Additional query examples
## Additional query examples
The following tabs present additional query examples, like exporting to a CSV. Select the tab that best suits your needs:
@@ -559,7 +558,7 @@ mf query --metrics order_total --group-by metric_time,is_food_order --limit 10 -
</TabItem>
</Tabs>
### Time granularity
## Time granularity
Optionally, you can specify the time granularity at which you want your data aggregated by appending two underscores and the unit of granularity to `metric_time`, the global time dimension. Supported granularities are `day`, `week`, `month`, `quarter`, and `year`.
@@ -571,7 +570,7 @@

```bash
dbt sl query --metrics revenue --group-by metric_time__month # In dbt Cloud
mf query --metrics revenue --group-by metric_time__month # In dbt Core
```
### Export
## Export
Run [exports for a specific saved query](/docs/use-dbt-semantic-layer/exports#exports-for-single-saved-query). Use this command to test and generate exports in your development environment. You can also use the `--select` flag to specify particular exports from a saved query. Refer to [exports in development](/docs/use-dbt-semantic-layer/exports#exports-in-development) for more info.
@@ -581,7 +580,7 @@

Export is available in dbt Cloud.

```bash
dbt sl export
```
### Export-all
## Export-all
Run [exports for multiple saved queries](/docs/use-dbt-semantic-layer/exports#exports-for-multiple-saved-queries) at once. This command provides a convenient way to manage and execute exports for several queries simultaneously, saving time and effort. Refer to [exports in development](/docs/use-dbt-semantic-layer/exports#exports-in-development) for more info.
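A minimal sketch, assuming the same dbt Cloud setup as the `export` command above:

```bash
dbt sl export-all
```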