diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index f70fdfbdb31..23098fe867a 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -37,10 +37,176 @@ This order is now in the "shipped" state, but we've lost the information about w -- To configure snapshots in versions 1.8 and earlier, refer to [Configure snapshots in versions 1.8 and earlier](#configure-snapshots-in-versions-18-and-earlier). These versions use an older syntax where snapshots are defined within a snapshot block in a `.sql` file, typically located in your `snapshots` directory. -- Note that defining multiple resources in a single file can significantly slow down parsing and compilation. For faster and more efficient management, consider the updated snapshot YAML syntax, [available now in the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks) or [dbt Core v1.9 and later](/docs/dbt-versions/core). - - For more information on how to migrate from the legacy snapshot configurations to the updated snapshot YAML syntax, refer to [Snapshot configuration migration](/reference/snapshot-configs#snapshot-configuration-migration). +In old versions of dbt Core (v1.8 and earlier), snapshots must be defined in snapshot blocks inside of your [snapshots directory](/reference/project-configs/snapshot-paths). These snapshots do not have native support for environments or deferral, making previewing changes in development difficult. +The modern, environment-aware way to create snapshots is to define them in YAML. This requires dbt Core v1.9 or later, or to be on any [dbt Cloud release track](/docs/dbt-versions/cloud-release-tracks). + +- For more information about configuring snapshots in a `.sql` file, refer to the [Legacy snapshot configurations](/reference/resource-configs/snapshots-jinja-legacy) page. + +The following example shows how to configure a snapshot using the legacy syntax: + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + target_database='analytics', + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +The following table outlines the configurations available for snapshots in versions 1.8 and earlier: + +| Config | Description | Required? | Example | +| ------ | ----------- | --------- | ------- | +| [target_database](/reference/resource-configs/target_database) | The database that dbt should render the snapshot table into | No | analytics | +| [target_schema](/reference/resource-configs/target_schema) | The schema that dbt should render the snapshot table into | Yes | snapshots | +| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. One of `timestamp` or `check` | Yes | timestamp | +| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | +| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | +| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | +| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | + +- A number of other configurations are also supported (like `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). +- Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. +- Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. +- Before v1.9, `target_schema` (required) and `target_database` (optional) set a fixed schema or database for snapshots, making it hard to separate dev and prod environments. In v1.9, `target_schema` became optional, allowing environment-aware snapshots. By default, snapshots now use `generate_schema_name` or `generate_database_name`, but developers can still specify a custom location using [schema](/reference/resource-configs/schema) and [database](/reference/resource-configs/database), consistent with other resource types. + +### Configuration example + +To add a snapshot to your project: + +1. Create a file in your `snapshots` directory with a `.sql` file extension. For example, `snapshots/orders.sql` +2. Use a `snapshot` block to define the start and end of a snapshot: + + + +```sql +{% snapshot orders_snapshot %} + +{% endsnapshot %} +``` + + + +3. Write a `select` statement within the snapshot block (tips for writing a good snapshot query are below). This select statement defines the results that you want to snapshot over time. You can use `sources` and `refs` here. + + + +```sql +{% snapshot orders_snapshot %} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +4. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this below. + +5. Add configurations to your snapshot using a `config` block (more details below). You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). + + + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + target_database='analytics', + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. %}`) will change how dbt names this table. + + + + + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + schema='snapshots', + unique_key='id', + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro. + + + +``` +$ dbt snapshot +Running with dbt=1.8.0 + +15:07:36 | Concurrency: 8 threads (target='dev') +15:07:36 | +15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] +15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] +15:07:36 | +15:07:36 | Finished running 1 snapshots in 0.68s. + +Completed successfully + +Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 +``` + +7. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described earlier. + +8. Run the `dbt snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. + +9. Select from the `snapshot` in downstream models using the `ref` function. + + + +```sql +select * from {{ ref('orders_snapshot') }} +``` + + + +10. Snapshots are only useful if you run them frequently — schedule the `snapshot` command to run regularly. @@ -85,14 +251,14 @@ The following table outlines the configurations available for snapshots: | [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names) | Customize the names of the snapshot meta fields | No | dictionary | | [hard_deletes](/reference/resource-configs/hard-deletes) | Specify how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`.| No | string | - -- In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build. Developers can still set a custom location with [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types. +- In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build. +- Developers can still set a custom location with [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types. - A number of other configurations are also supported (for example, `tags` and `post-hook`). For the complete list, refer to [Snapshot configurations](/reference/snapshot-configs). - You can configure snapshots from both the `dbt_project.yml` file and a `config` block. For more information, refer to the [configuration docs](/reference/snapshot-configs). ### Add a snapshot to your project -To add a snapshot to your project follow these steps. For users on versions 1.8 and earlier, refer to [Configure snapshots in versions 1.8 and earlier](#configure-snapshots-in-versions-18-and-earlier). +To add a snapshot to your project follow these steps. For users on versions 1.8 and earlier, refer to [Legacy snapshot configurations](/reference/resource-configs/snapshots-jinja-legacy). 1. Create a YAML file in your `snapshots` directory: `snapshots/orders_snapshot.yml` and add your configuration details. You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). @@ -561,189 +727,6 @@ All of these column names can be customized using the `snapshot_meta_column_name -## Configure snapshots in versions 1.8 and earlier - - - -For information about configuring snapshots in dbt versions 1.8 and earlier, select **1.8** from the documentation version picker, and it will appear in this section. - -To configure snapshots in versions 1.9 and later, refer to [Configuring snapshots](#configuring-snapshots). The latest versions use an updated snapshot configuration syntax that optimizes performance. - - - - - -- In dbt versions 1.8 and earlier, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory). You'll also need to configure your snapshot to tell dbt how to detect record changes. -- The earlier dbt versions use an older syntax that allows for defining multiple resources in a single file. This syntax can significantly slow down parsing and compilation. -- For faster and more efficient management, consider [choosing the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks) or the [latest version of dbt Core](/docs/dbt-versions/core), which introduces an updated snapshot configuration syntax that optimizes performance. - - For more information on how to migrate from the legacy snapshot configurations to the updated snapshot YAML syntax, refer to [Snapshot configuration migration](/reference/snapshot-configs#snapshot-configuration-migration). - -The following example shows how to configure a snapshot: - - - -```sql -{% snapshot orders_snapshot %} - -{{ - config( - target_database='analytics', - target_schema='snapshots', - unique_key='id', - - strategy='timestamp', - updated_at='updated_at', - ) -}} - -select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} -``` - - - -The following table outlines the configurations available for snapshots in versions 1.8 and earlier: - -| Config | Description | Required? | Example | -| ------ | ----------- | --------- | ------- | -| [target_database](/reference/resource-configs/target_database) | The database that dbt should render the snapshot table into | No | analytics | -| [target_schema](/reference/resource-configs/target_schema) | The schema that dbt should render the snapshot table into | Yes | snapshots | -| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. One of `timestamp` or `check` | Yes | timestamp | -| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | -| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | -| [updated_at](/reference/resource-configs/updated_at) | A column in your snapshot query results that indicates when each record was last updated, used in the `timestamp` strategy. May support ISO date strings and unix epoch integers, depending on the data platform you use. | Only if using the `timestamp` strategy | updated_at | -| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | - -- A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). -- Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. -- Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. - -### Configuration example - -To add a snapshot to your project: - -1. Create a file in your `snapshots` directory with a `.sql` file extension, e.g. `snapshots/orders.sql` -2. Use a `snapshot` block to define the start and end of a snapshot: - - - -```sql -{% snapshot orders_snapshot %} - -{% endsnapshot %} -``` - - - -3. Write a `select` statement within the snapshot block (tips for writing a good snapshot query are below). This select statement defines the results that you want to snapshot over time. You can use `sources` and `refs` here. - - - -```sql -{% snapshot orders_snapshot %} - -select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} -``` - - - -4. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this below. - -5. Add configurations to your snapshot using a `config` block (more details below). You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). - - - - - -```sql -{% snapshot orders_snapshot %} - -{{ - config( - target_database='analytics', - target_schema='snapshots', - unique_key='id', - - strategy='timestamp', - updated_at='updated_at', - ) -}} - -select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} -``` - - - -6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. %}`) will change how dbt names this table. - - - - - - - -```sql -{% snapshot orders_snapshot %} - -{{ - config( - schema='snapshots', - unique_key='id', - strategy='timestamp', - updated_at='updated_at', - ) -}} - -select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} -``` - - - -6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro. - - - -``` -$ dbt snapshot -Running with dbt=1.8.0 - -15:07:36 | Concurrency: 8 threads (target='dev') -15:07:36 | -15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] -15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] -15:07:36 | -15:07:36 | Finished running 1 snapshots in 0.68s. - -Completed successfully - -Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 -``` - -7. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described earlier. - -8. Run the `dbt snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. - -9. Select from the `snapshot` in downstream models using the `ref` function. - - - -```sql -select * from {{ ref('orders_snapshot') }} -``` - - - -10. Snapshots are only useful if you run them frequently — schedule the `snapshot` command to run regularly. - - - ## FAQs diff --git a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md new file mode 100644 index 00000000000..ec43c01c1fe --- /dev/null +++ b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md @@ -0,0 +1,591 @@ +--- +title: Legacy snapshot configurations +description: Read about how to configure snapshots using legacy jinja blocks +sidebar: "Legacy configuration" +--- + +# Legacy snapshot configuration + + + +Use legacy SQL-based snapshot configurations with Jinja blocks in any dbt version. dbt v1.9 introduced YAML-based configs for better readability and environment awareness. + + + +There are situations where you want to use the legacy syntax for [snapshots](/docs/build/snapshots) in any dbt version or release track. This page details how you can use the legacy SQL-based configurations if you need to. + +In dbt v1.9, this syntax was replaced with a [YAML-based configuration](/reference/snapshot-configs#configuring-snapshots) in [dbt Cloud's "Latest" release track](/docs/dbt-versions/cloud-release-tracks). The benefits of YAML-based configurations are that the snapshots are environment aware, meaning you don't have to specify `schema` or `database`, and the syntax is more concise. + +For new snapshots, we recommend using these latest YAML-based configs. If you'd like to move to the YAML-based configuration for existing snapshots, you can [migrate over](/reference/snapshot-configs#snapshot-configuration-migration). + +When would you want to use the SQL-based syntax and YAML-based syntax? + +- SQL-based syntax: + - Defined in `.sql` files within a snapshot Jinja block, typically located in your `snapshots` directory. Available in all versions. + - Useful for existing snapshots already using this syntax. + - Suitable for performing very light transformations (but creating a separate ephemeral model for transformations is recommended for better maintainability). + +- YAML-based syntax: + - Defined in `whatever_name.yml` or in the `snapshots` or `models` directory you prefer. Available in dbt Cloud's "Latest" release track and dbt v1.9 and later. + - Ideal for new snapshots or existing snapshots that need to be [migrated](/reference/snapshot-configs#snapshot-configuration-migration). + - Create transformations separate from the snapshot file by creating an ephemeral model and referencing it in the snapshot using the `relation` field. + +## Snapshot configurations + +Although you can use the more performant YAML-based configuration, you might still want to use the legacy configuration to define your snapshots if it suits your needs. + +Snapshots can be configured in two main ways: +- Using [snapshot-specific configurations](#snapshot-specific-configurations) +- Or using [general configurations](#general-configuration) + +These configurations allow you to control how dbt detects changes in your data and where snapshots are stored. Both types of configurations can coexist in your project in the same `config` block (or from your `dbt_project.yml` file or `properties.yaml` file). + +One of the most important configs you can decide is [strategies](#snapshot-strategies), which tells dbt how to detect modified rows. + +### Snapshot specific configurations +Snapshot-specific configurations are applicable to only one dbt resource type rather than multiple resource types. You can define these settings within the resource’s file using the `{{ config() }}` macro (as well as in the project file (`dbt_project.yml`) or a property file (`models/properties.yml` for models, similarly for other resources)). + + + +```sql +{ % snapshot orders_snapshot %} + +{{ config( + [target_schema](/reference/resource-configs/target_schema)="", + [target_database](/reference/resource-configs/target_database)="", + [unique_key](/reference/resource-configs/unique_key)="", + [strategy](/reference/resource-configs/strategy)="timestamp" | "check", + [updated_at](/reference/resource-configs/updated_at)="", + [check_cols](/reference/resource-configs/check_cols)=[""] | "all" + [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) : true | false +) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + +### General configuration +Use general configurations for broader operational settings applicable across multiple resource types. Like resource-specific configurations, these can also be set in the project file, property files, or within resource-specific files using a config block. + + + +```sql +{{ config( + [enabled](/reference/resource-configs/enabled)=true | false, + [tags](/reference/resource-configs/tags)="" | [""], + [alias](/reference/resource-configs/alias)="", + [pre_hook](/reference/resource-configs/pre-hook-post-hook)="" | [""], + [post_hook](/reference/resource-configs/pre-hook-post-hook)="" | [""] + [persist_docs](/reference/resource-configs/persist_docs)={} + [grants](/reference/resource-configs/grants)={} +) }} +``` + + +### Snapshot strategies +Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built-in to dbt that require the `strategy` parameter: + +- [Timestamp](/reference/resource-configs/snapshots-jinja-legacy?strategy=timestamp#snapshot-strategies) — Uses an `updated_at` column to determine if a row has changed. +- [Check](/reference/resource-configs/snapshots-jinja-legacy?strategy=check#snapshot-strategies) — Compares a list of columns between their current and historical values to determine if a row has changed. Uses the `check_cols` parameter. + + + + +The timestamp strategy uses an `updated_at` field to determine if a row has changed. If the configured `updated_at` column for a row is more recent than the last time the snapshot ran, then dbt will invalidate the old record and record the new one. If the timestamps are unchanged, then dbt will not take any action. + +#### Example + + + +```sql +{% snapshot orders_snapshot_timestamp %} + + {{ + config( + target_schema='snapshots', + strategy='timestamp', + unique_key='id', + updated_at='updated_at', + ) + }} + + select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + + + +The check strategy is useful for tables which do not have a reliable `updated_at` column. It requires the `check_cols` parameter, which is a list of columns within the results of your snapshot query to check for changes. Alternatively, use all columns using the all value (however this may be less performant). + +#### Example + + + +```sql +{% snapshot orders_snapshot_check %} + + {{ + config( + strategy='check', + unique_key='id', + check_cols=['status', 'is_cancelled'], + ) + }} + + select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + +#### Examples + + + + +```sql +{% snapshot orders_snapshot_check %} + + {{ + config( + strategy='check', + unique_key='id', + check_cols=['status', 'is_cancelled'], + ) + }} + + select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + + + + + +```sql +{% snapshot orders_snapshot_check %} + + {{ + config( + strategy='check', + unique_key='id', + check_cols='all', + ) + }} + + select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + + + +## Configuration reference + +Configure your snapshot to tell dbt how to detect record changes. Snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory or any other directory). + +The following table outlines the configurations available for snapshots: + + + +| Config | Description | Required? | Example | +| ------ | ----------- | --------- | ------- | +| [target_database](/reference/resource-configs/target_database) | The database that dbt should render the snapshot table into | No | analytics | +| [target_schema](/reference/resource-configs/target_schema) | The schema that dbt should render the snapshot table into | Yes | snapshots | +| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. One of `timestamp` or `check` | Yes | timestamp | +| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | +| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | +| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | +| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | + +- A number of other configurations are also supported (like, `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). +- Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. +- Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. + + + + + +| Config | Description | Required? | Example | +| ------ | ----------- | --------- | ------- | +| [database](/reference/resource-configs/database) |Specify a custom database for the snapshot | No | analytics | +| [schema](/reference/resource-configs/schema) | Specify a custom schema for the snapshot | No | snapshots | +| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. One of `timestamp` or `check` | Yes | timestamp | +| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | +| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | +| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | +| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | + +- A number of other configurations are also supported (like, `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). +- Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. +- Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. +- Before v1.9, `target_schema` (required) and `target_database` (optional) set a fixed schema or database for snapshots, making it hard to separate dev and prod environments. In v1.9, `target_schema` became optional, allowing environment-aware snapshots. By default, snapshots now use `generate_schema_name` or `generate_database_name`, but developers can still specify a custom location using [schema](/reference/resource-configs/schema) and [database](/reference/resource-configs/database), consistent with other resource types. + + + + +## Add snapshot to a project + +To add a snapshot to your project: + +1. Create a file in your `snapshots` directory with a `.sql` file extension. For example,`snapshots/orders.sql` +2. Use a `snapshot` block to define the start and end of a snapshot: + + + +```sql +{% snapshot orders_snapshot %} + +{% endsnapshot %} +``` + + + +3. Write a `select` statement within the snapshot block (tips for writing a good snapshot query are below). This select statement defines the results that you want to snapshot over time. You can use `sources` or `refs` here. + + + +```sql +{% snapshot orders_snapshot %} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +4. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this in the next step. + +5. Add configurations to your snapshot using a `config` block. You can also [configure your snapshot from your `dbt_project.yml` file](/reference/snapshot-configs). + + + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + target_database='analytics', + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at', + ) +The following table outlines the configurations available for snapshots: + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + + + + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + database='analytics', + schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at', + ) +The following table outlines the configurations available for snapshots: + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + + + +6. Run the `dbt snapshot` [command](/reference/commands/snapshot). For our example, a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. %}`) will change how dbt names this table. + +``` +dbt snapshot +Running with dbt=1.8.0 + +15:07:36 | Concurrency: 8 threads (target='dev') +15:07:36 | +15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] +15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] +15:07:36 | +15:07:36 | Finished running 1 snapshots in 0.68s. + +Completed successfully + +Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 +``` + +1. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](/docs/build/snapshots#snapshot-meta-fields) as described earlier. + +2. Run the `dbt snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. + +3. Select from the `snapshot` in downstream models using the `ref` function. + + + +```sql +select * from {{ ref('orders_snapshot') }} +``` + + + +10. Snapshots are only useful if you run them frequently — schedule the `snapshot` command to run regularly. + + +## Examples + +This section outlines some examples of how to apply configurations to snapshots using the legacy method. + + + +Use config blocks if you need to apply a configuration to one snapshot only. + + + +```sql +{% snapshot orders_snapshot %} + {{ + config( + unique_key='id', + strategy='timestamp', + updated_at='updated_at' + ) + }} + -- Pro-Tip: Use sources in snapshots! + select * from {{ source('jaffle_shop', 'orders') }} +{% endsnapshot %} +``` + + + + + +The `updated_at` parameter is required if using the timestamp strategy. The `updated_at` parameter is a column within the results of your snapshot query that represents when the record row was last updated. + + + +```sql +{{ config( + strategy="timestamp", + updated_at="column_name" +) }} +``` + + +#### Examples + +- #### Using a column name `updated_at`: + + + + + ```sql + {% snapshot orders_snapshot %} + + {{ + config( + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at' + ) + }} + + select * from {{ source('jaffle_shop', 'orders') }} + + {% endsnapshot %} + ``` + + + + + + + ```sql + {% snapshot orders_snapshot %} + + {{ + config( + schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at' + ) + }} + + select * from {{ source('jaffle_shop', 'orders') }} + + {% endsnapshot %} + ``` + + + +- #### Coalescing two columns to create a reliable `updated_at` column: + + Consider a data source that only has an `updated_at` column filled in when a record is updated (so a `null` value indicates that the record hasn't been updated after it was created). + + Since the `updated_at` configuration only takes a column name, rather than an expression, you should update your snapshot query to include the coalesced column. + + + + + ```sql + {% snapshot orders_snapshot %} + + {{ + config( + schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at_for_snapshot' + ) + }} + + select + *, + coalesce(updated_at, created_at) as updated_at_for_snapshot + + from {{ source('jaffle_shop', 'orders') }} + + {% endsnapshot %} + ``` + + + + + + + ```sql + {% snapshot orders_snapshot %} + + {{ + config( + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at_for_snapshot' + ) + }} + + select + *, + coalesce(updated_at, created_at) as updated_at_for_snapshot + + from {{ source('jaffle_shop', 'orders') }} + + {% endsnapshot %} + ``` + + + + + + + +The `unique_key` is a column name or expression that is unique for the inputs of a snapshot. dbt uses [`unique_key`](/reference/resource-configs/unique_key) to match records between a result set and an existing snapshot, so that changes can be captured correctly. + + + +```sql +{{ config( + unique_key="column_name" +) }} +``` + + +#### Examples + +- Using an `id` column as a unique key + + + + ```sql + {{ + config( + unique_key="id" + ) + }} + ``` + + + You can also write this in YAML. This might be a good idea if multiple snapshots share the same `unique_key` (though we prefer to apply this configuration in a config block, as above). + +- #### Using a combination of two columns as a unique key + + This configuration accepts a valid column expression. As such, you can concatenate two columns together as a unique key if required. It's a good idea to use a separator (like, '-') to ensure uniqueness. + + + + ```sql + {% snapshot transaction_items_snapshot %} + + {{ + config( + unique_key="transaction_id||'-'||line_item_id", + ... + ) + }} + + select + transaction_id||'-'||line_item_id as id, + * + from {{ source('erp', 'transactions') }} + + {% endsnapshot %} + ``` + + + + Though, it's probably a better idea to construct this column in your query and use that as the `unique_key`: + + + + ```sql + {% snapshot transaction_items_snapshot %} + + {{ + config( + unique_key="id", + ... + ) + }} + + select + transaction_id || '-' || line_item_id as id, + * + from {{ source('erp', 'transactions') }} + + {% endsnapshot %} + ``` + + diff --git a/website/sidebars.js b/website/sidebars.js index 964ab97431a..d7293ba0501 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -1047,6 +1047,7 @@ const sidebarSettings = { items: [ "reference/snapshot-properties", "reference/snapshot-configs", + "reference/resource-configs/snapshots-jinja-legacy", "reference/resource-configs/check_cols", "reference/resource-configs/dbt_valid_to_current", "reference/resource-configs/hard-deletes",