From c205a9508ab997eac67e1408e7c13cd553bead2c Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Mon, 6 Jan 2025 15:15:18 +0000 Subject: [PATCH 01/18] add new legacy page --- .../snapshots-jinja-legacy.md | 491 ++++++++++++++++++ website/sidebars.js | 1 + 2 files changed, 492 insertions(+) create mode 100644 website/docs/reference/resource-configs/snapshots-jinja-legacy.md diff --git a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md new file mode 100644 index 00000000000..b8c930e9a21 --- /dev/null +++ b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md @@ -0,0 +1,491 @@ +--- +title: Legacy snapshot jinja block +description: Read about legacy snapshot jinja blocks and how to migrate to the updated syntax +sidebar: Legacy snapshot jinja block +--- + +For dbt versions 1.8 and earlier, you were able to configure snapshots using jinja blocks in your .sql files. Configuring snapshots using jinja blocks is considered legacy syntax and was replaced with a YAML-based configuration in dbt Cloud Versionless and dbt v1.9 for faster and more efficient management. + +This page details how to use those legacy configurations and provides a path to migrate to the more efficient YAML configuration introduced in dbt Cloud Versionless and dbt v1.9. + +## Snapshot configuration + +Although there's a more performant method, you may still want to use the legacy way to define your snapshots if it suits your needs. This page will list out the types of jinja blocks suitable for snapshots and how to migrate from the legacy way to the updated method: + +- List out each header so there's a high-level overview of what's covered + +### Resource-specific configurations +Resource-specific configurations are applicable to only one dbt resource type rather than multiple resource types. 
You can define these settings in the project file (`dbt_project.yml`), a property file (`models/properties.yml` for models, similarly for other resources), or within the resource’s file using the `{{ config() }}` macro. + + + +```sql +{% snapshot orders_snapshot %} + +{{ config( + target_schema="", + target_database="", + unique_key="", + strategy="timestamp" | "check", + updated_at="", + check_cols=[""] | "all", + invalidate_hard_deletes=true | false +) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + +### General configuration +Use general configurations for broader operational settings applicable across multiple resource types. Like resource-specific configurations, these can also be set in the project file, property files, or within resource-specific files using a config block. + +```sql + +{{ config( + enabled=true | false, + tags="" | [""], + alias="", + pre_hook="" | [""], + post_hook="" | [""], + persist_docs={}, + grants={} +) }} +``` + +#### Apply configurations to one snapshot only +Use config blocks if you need to apply a configuration to one snapshot only. + + + +```sql +{% snapshot orders_snapshot %} + {{ + config( + unique_key='id', + strategy='timestamp', + updated_at='updated_at' + ) + }} + -- Pro-Tip: Use sources in snapshots! + select * from {{ source('jaffle_shop', 'orders') }} +{% endsnapshot %} +``` + + +## Snapshot strategies +Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built in to dbt, selected with the `strategy` parameter: + +- Timestamp: Uses an `updated_at` column to determine if a row has changed. +- Check: Compares a list of columns between their current and historical values to determine if a row has changed. Uses the `check_cols` parameter. + +```sql +{% snapshot snapshot_name %} + +{{ config( + strategy="timestamp", + updated_at="column_name" +) }} + +select ... 
+ +{% endsnapshot %} +``` + +```sql +{% snapshot snapshot_name %} + +{{ config( + strategy="check", + check_cols=[column_name] | "all" +) }} + +{% endsnapshot %} +``` + + +### Timestamp +The timestamp strategy uses an `updated_at` field to determine if a row has changed. If the configured `updated_at` column for a row is more recent than the last time the snapshot ran, then dbt will invalidate the old record and record the new one. If the timestamps are unchanged, then dbt will not take any action. + +#### Example + + + + + +```sql +{% snapshot orders_snapshot_timestamp %} + + {{ + config( + target_schema='snapshots', + strategy='timestamp', + unique_key='id', + updated_at='updated_at', + ) + }} + + select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + + +### Check +The check strategy is useful for tables which do not have a reliable `updated_at` column. It requires the `check_cols` parameter, which is a list of columns within the results of your snapshot query to check for changes. Alternatively, check all columns by setting `check_cols` to `all` (however, this may be less performant). + + + +```sql +{% snapshot orders_snapshot_check %} + + {{ + config( + strategy='check', + unique_key='id', + check_cols=['status', 'is_cancelled'], + ) + }} + + select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + +#### Examples + + + + +```sql +{% snapshot orders_snapshot_check %} + + {{ + config( + strategy='check', + unique_key='id', + check_cols=['status', 'is_cancelled'], + ) + }} + + select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + + + + + +```sql +{% snapshot orders_snapshot_check %} + + {{ + config( + strategy='check', + unique_key='id', + check_cols='all', + ) + }} + + select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +## updated_at + +The `updated_at` parameter is required if using the timestamp strategy. 
The `updated_at` parameter is a column within the results of your snapshot query that represents when the record row was last updated. + + + +```sql +{{ config( + strategy="timestamp", + updated_at="column_name" +) }} +``` + + + +#### Examples + + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at' + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + + +Consider a data source that only has an updated_at column filled in when a record is updated (so a null value indicates that the record hasn't been updated after it was created). + +Since the updated_at configuration only takes a column name, rather than an expression, you should update your snapshot query to include the coalesced column. + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at_for_snapshot' + ) +}} + +select + *, + coalesce(updated_at, created_at) as updated_at_for_snapshot + +from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +## unique_key +A column name or expression that is unique for the inputs of a snapshot. dbt uses [`unique_key`](/reference/resource-configs/unique_key) to match records between a result set and an existing snapshot, so that changes can be captured correctly. + + + +```sql +{{ config( + unique_key="column_name" +) }} +``` + + +#### Examples + + + + + +```sql +{{ + config( + unique_key="id" + ) +}} +``` + + + +You can also write this in yaml. This might be a good idea if multiple snapshots share the same `unique_key` (though we prefer to apply this configuration in a config block, as above). + + + + +This configuration accepts a valid column expression. As such, you can concatenate two columns together as a unique key if required. It's a good idea to use a separator (e.g. '-') to ensure uniqueness. 
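To see why the separator matters, here is a small sketch using made-up literal values in place of `transaction_id` and `line_item_id` (they are not taken from the `erp` source). It assumes a warehouse that supports the `||` string concatenation operator.

```sql
-- Illustrative only: literals stand in for transaction_id and line_item_id.
select
    '1' || '23' as key_a_no_separator,            -- '123'
    '12' || '3' as key_b_no_separator,            -- '123', collides with key_a
    '1' || '-' || '23' as key_a_with_separator,   -- '1-23'
    '12' || '-' || '3' as key_b_with_separator    -- '12-3'
```

Without the separator, two different column pairs produce the same key, so dbt could not tell the records apart.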
+ + + +```sql +{% snapshot transaction_items_snapshot %} + + {{ + config( + unique_key="transaction_id||'-'||line_item_id", + ... + ) + }} + +select + transaction_id||'-'||line_item_id as id, + * +from {{ source('erp', 'transactions') }} + +{% endsnapshot %} +``` + + +Though, it's probably a better idea to construct this column in your query and use that as the `unique_key`: + + + +```sql +{% snapshot transaction_items_snapshot %} + + {{ + config( + unique_key="id", + ... + ) + }} + +select + transaction_id || '-' || line_item_id as id, + * +from {{ source('erp', 'transactions') }} + +{% endsnapshot %} +``` + + + +## invalidate_hard_deletes + +Opt-in feature to enable invalidating hard deleted records while snapshotting the query. + +By default the feature is disabled. + + + +```sql +{% snapshot orders_snapshot %} + + {{ + config( + target_schema='snapshots', + strategy='timestamp', + unique_key='id', + updated_at='updated_at', + invalidate_hard_deletes=True, + ) + }} + + select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + +## Pre hook and post hook +A SQL statement (or list of SQL statements) to be run before or after a model, seed, or snapshot is built. + +Pre- and post-hooks can also call macros that return SQL statements. + +In this example, we use the | symbol to separate two different formatting options for SQL statements in pre-hooks and post-hooks. The first option (without brackets) accepts a single SQL statement as a string, while the second (with brackets) accepts multiple SQL statements as an array of strings. Replace `SQL-statement` with your SQL. + +```sql +{% snapshot snapshot_name %} +{{ config( + pre_hook="SQL-statement" | ["SQL-statement"], + post_hook="SQL-statement" | ["SQL-statement"], +) }} + +select ... + +{% endsnapshot %} +``` +## Migrate from legacy to update + +This page outlines the steps you need to follow to convert legacy jinja block snapshot configurations into the updated YAML-based configuration format. 
+ +### Why use the updated YAML spec? + +- Performance: YAML-based configurations are processed faster by dbt, leading to improved performance, especially during parsing and compilation. +- Maintainability: Centralizing configuration in YAML makes it easier to manage and update snapshot settings without editing the SQL logic directly. +- Consistency: YAML configuration aligns snapshot definitions with other dbt resources, such as models and seeds, leading to a more consistent project structure. + + +Note: In versions prior to v1.9, the target_schema (required) and target_database (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, target_schema became optional, allowing snapshots to be environment-aware. By default, without target_schema or target_database defined, snapshots now use the generate_schema_name or generate_database_name macros to determine where to build. Developers can still set a custom location with schema and database configs, consistent with other resource types. + +### How to migrate + +1. Move any configurations currently written within the jinja block (like unique_key, strategy, updated_at, and so on) into the YAML file. The configurations are structured similarly to how you would define a model in dbt_project.yml. + +Here's an example conversion: + +```yaml +# snapshots.yml +snapshots: + - name: orders_snapshot + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at + ``` +Note: The unique_key, strategy, and updated_at fields must match the settings previously defined in your jinja block. + +2. Before removing the old jinja block, run the dbt snapshot command using the new YAML configuration to confirm that the snapshot behaves as expected. 
+ - Verify that the data is processed correctly (e.g., no data loss or incorrect records). + - Make sure the performance is either the same or improved compared to the old configuration. + - After running the new snapshot, inspect the snapshot tables in your data warehouse to confirm the new snapshot records match the old data. + +3. Once you’ve confirmed that the new YAML configuration works properly, safely remove the old snapshot jinja block from your .sql file. This keeps your codebase clean and fully migrated to the new method. + +4. If your snapshots require more complex transformations, consider using an ephemeral model to handle the transformations before referencing it in the snapshot. An ephemeral model can encapsulate transformations and simplify the snapshot query itself. + +Example of using an ephemeral model: + +``` +yaml +# models/ephemeral/orders_ephemeral.sql +{{ + config(materialized='ephemeral') +}} +select * from {{ source('jaffle_shop', 'orders') }} + +# snapshots.yml +snapshots: + - name: orders_snapshot + relation: ref('orders_ephemeral') + target_schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at + ``` + +## Full migration example +Here’s a complete example of migrating from a legacy jinja block snapshot to a YAML-based snapshot configuration: + +```yaml + +# Legacy method (jinja block) + +snapshots/orders_snapshot.sql +{% snapshot orders_snapshot %} +{{ + config( + target_schema='snapshots', + unique_key='id', + strategy='timestamp', + updated_at='updated_at', + ) +}} +select * from {{ source('jaffle_shop', 'orders') }} +{% endsnapshot %} + +# New method (YAML configuration) + +snapshots.yml +snapshots: + - name: orders_snapshot + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at + ``` + +By following these steps, you can smoothly transition from legacy jinja-based snapshots to the modern, more efficient YAML-based configurations. 
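One way to carry out the inspection described in step 2 is a comparison query. The sketch below is illustrative only. It assumes the YAML-configured snapshot was built into a separate, hypothetical `analytics.snapshots_yaml_test` schema while the legacy table lives in `analytics.snapshots`, that both use the timestamp strategy with `id` as the `unique_key`, and that your warehouse supports `full outer join`. It compares only current records (`dbt_valid_to is null`), since a freshly built snapshot has no history yet.

```sql
-- Hypothetical relation names; replace them with your own.
with legacy_current as (
    select id, dbt_valid_from
    from analytics.snapshots.orders_snapshot
    where dbt_valid_to is null
),

yaml_current as (
    select id, dbt_valid_from
    from analytics.snapshots_yaml_test.orders_snapshot
    where dbt_valid_to is null
)

-- Return ids whose current record is missing on one side
-- or starts at a different time.
select
    coalesce(legacy_current.id, yaml_current.id) as id,
    legacy_current.dbt_valid_from as legacy_valid_from,
    yaml_current.dbt_valid_from as yaml_valid_from
from legacy_current
full outer join yaml_current
    on legacy_current.id = yaml_current.id
where legacy_current.id is null
   or yaml_current.id is null
   or legacy_current.dbt_valid_from <> yaml_current.dbt_valid_from
```

An empty result suggests the current records agree between the two tables. Any rows returned are worth checking before removing the jinja block.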
diff --git a/website/sidebars.js b/website/sidebars.js index 3a8f560c297..23501bc57f4 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -989,6 +989,7 @@ const sidebarSettings = { "reference/resource-configs/target_database", "reference/resource-configs/target_schema", "reference/resource-configs/updated_at", + "reference/resource-configs/snapshots-jinja-legacy", ], }, { From 2d2a4288898aa07acfdcd56376bea027c845d6b5 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 14 Jan 2025 13:12:25 +0000 Subject: [PATCH 02/18] format legacy page --- .../snapshots-jinja-legacy.md | 461 ++++++++++-------- website/sidebars.js | 2 +- 2 files changed, 257 insertions(+), 206 deletions(-) diff --git a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md index b8c930e9a21..e2cc0a53db2 100644 --- a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md +++ b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md @@ -4,13 +4,97 @@ description: Read about legacy snapshot jinja blocks and how to migrate to the u sidebar: Legacy snapshot jinja block --- -For dbt versions 1.8 and earlier, you were able to configure snapshots using jinja blocks in your .sql files. Configuring snapshots using jinja blocks is considered legacy syntax and was replaced with a YAML-based configuration in dbt Cloud Versionless and dbt v1.9 for faster and more efficient management. +From dbt versions 1.8 and earlier, you were able to configure snapshots using jinja blocks in your `.sql` files. Configuring snapshots this way is considered legacy syntax. It was replaced with a YAML-based configuration in [dbt Cloud's "Latest" release track](/docs/dbt-versions/cloud-release-tracks) and dbt v1.9 for faster and more efficient management. 
-This page details how to use those legacy configurations and provides a path to migrate to the more efficient YAML configuration introduced in dbt Cloud Versionless and dbt v1.9. +This page details how to use the legacy SQL-based configurations and provides a path to migrate to the more efficient YAML configuration. For new snapshots, we recommend using these latest YAML-based configs. If applying them to existing snapshots, you'll need to [migrate over](/reference/snapshot-configs#snapshot-configuration-migration). -## Snapshot configuration +The following table outlines the differences between the legacy SQL-based syntax and the updated YAML-based syntax. -Although there's a more performant method, you may still want to use the legacy way to define your snapshots if it suits your needs. This page will list out the types of jinja blocks suitable for snapshots and how to migrate from the legacy way to the updated method: +| Snapshot syntax | Description | Example | +| --------------- | ----------- | ------- | +| [SQL-based](#sql-based-snapshot-syntax) | Legacy syntax for defining snapshots in `.sql` files within a snapshot jinja block. Available in dbt v1.8 and earlier. |`{% snapshot orders_snapshot %}` in a `.sql` file using `{{ config() }}` | +| [YAML-based](#yaml-based-snapshot-syntax) | Updated syntax for defining snapshot configurations in YAML files. Found in `snapshots.yml`. Available in dbt Cloud's "Latest" release track and dbt v1.9 and later. |`snapshots.yml`| + +### SQL-based snapshot syntax +Legacy syntax for defining snapshots in `.sql` files within a snapshot Jinja block, typically located in your `snapshots` directory. + +#### Use cases + +- Defining multiple resources in a single file, though it can significantly slow down parsing and compilation. +- Useful for existing snapshots already using this syntax. 
+- Suitable for performing very light transformations (but creating a separate ephemeral model for transformations is recommended for better maintainability). + +#### Example + +In this example, we created a snapshot in the `snapshots` directory and used the `{{ config() }}` block within the jinja block to define the snapshot configuration. + + + +```sql +{% snapshot orders_snapshot %} + +{{ config( + target_database='analytics', + target_schema='snapshots', + unique_key='id', + strategy='timestamp', + updated_at='updated_at' +) }} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + +### YAML-based snapshot syntax +Updated syntax for defining snapshot configurations in YAML files. + +#### Use cases + +- More performant and easier to manage. +- Ideal for new snapshots or existing snapshots that need to be [migrated](/reference/snapshot-configs#snapshot-configuration-migration). +- Create transformations separate from the snapshot file by creating an ephemeral model and referencing it in the snapshot using the `relation` field. + +#### Example + +In this example, we created a snapshot in the `snapshots` directory (and separately an ephemeral model in the `models` directory). We then used the [`ref` function](/reference/dbt-jinja-functions/ref) to reference the ephemeral model in the `snapshots.yml` file in the `relation` field. + + + +```yaml +snapshots: + - name: orders_snapshot + relation: ref('orders_ephemeral') + config: + unique_key: id + strategy: timestamp + updated_at: updated_at +``` + + +In this second example, we used the `relation` field to reference the source table using the [`source` function](/reference/dbt-jinja-functions/source). 
+ + + +```yaml +snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + database: analytics + unique_key: id + strategy: timestamp + updated_at: updated_at + dbt_valid_to_current: "to_date('9999-12-31')" + +``` + + +## Legacy snapshot configuration + +Although there's a more performant method, you may still want to use the legacy way to define your snapshots if it suits your needs. This page will list out the types of configurations suitable for snapshots and how to migrate from the legacy way to the updated method: - List out each header so there's a high-level overview of what's covered @@ -55,26 +139,6 @@ Use general configurations for broader operational settings applicable across mu ) }} ``` -#### Apply configurations to one snapshot only -Use config blocks if you need to apply a configuration to one snapshot only. - - - -```sql -{% snapshot orders_snapshot %} - {{ - config( - unique_key='id', - strategy='timestamp', - updated_at='updated_at' - ) - }} - -- Pro-Tip: Use sources in snapshots! - select * from {{ source('jaffle_shop', 'orders') }} -{% endsnapshot %} -``` - - ## Snapshot strategies Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built-in to dbt that require the `strategy` parameter: @@ -105,7 +169,6 @@ select ... {% endsnapshot %} ``` - ### Timestamp The timestamp strategy uses an `updated_at` field to determine if a row has changed. If the configured `updated_at` column for a row is more recent than the last time the snapshot ran, then dbt will invalidate the old record and record the new one. If the timestamps are unchanged, then dbt will not take any action. @@ -202,7 +265,31 @@ The check strategy is useful for tables which do not have a reliable `updated_at -## updated_at +## Examples + +This section outlines some examples of how to apply configurations to snapshots using the legacy method. 
+ +### Apply configurations to one snapshot only +Use config blocks if you need to apply a configuration to one snapshot only. + + + +```sql +{% snapshot orders_snapshot %} + {{ + config( + unique_key='id', + strategy='timestamp', + updated_at='updated_at' + ) + }} + -- Pro-Tip: Use sources in snapshots! + select * from {{ source('jaffle_shop', 'orders') }} +{% endsnapshot %} +``` + + +### Using the updated_at parameter The `updated_at` parameter is required if using the timestamp strategy. The `updated_at` parameter is a column within the results of your snapshot query that represents when the record row was last updated. @@ -216,65 +303,61 @@ The `updated_at` parameter is required if using the timestamp strategy. The `upd ``` +- #### Using a column name `updated_at`: + -#### Examples - + ```sql + {% snapshot orders_snapshot %} - + {{ + config( + target_schema='snapshots', + unique_key='id', -```sql -{% snapshot orders_snapshot %} + strategy='timestamp', + updated_at='updated_at' + ) + }} -{{ - config( - target_schema='snapshots', - unique_key='id', + select * from {{ source('jaffle_shop', 'orders') }} - strategy='timestamp', - updated_at='updated_at' - ) -}} + {% endsnapshot %} + ``` + -select * from {{ source('jaffle_shop', 'orders') }} +- #### Coalescing two columns to create a reliable `updated_at` column: + + Consider a data source that only has an `updated_at` column filled in when a record is updated (so a `null` value indicates that the record hasn't been updated after it was created). + + Since the `updated_at` configuration only takes a column name, rather than an expression, you should update your snapshot query to include the coalesced column. -{% endsnapshot %} -``` - - + - -Consider a data source that only has an updated_at column filled in when a record is updated (so a null value indicates that the record hasn't been updated after it was created). 
+ ```sql + {% snapshot orders_snapshot %} -Since the updated_at configuration only takes a column name, rather than an expression, you should update your snapshot query to include the coalesced column. + {{ + config( + target_schema='snapshots', + unique_key='id', - + strategy='timestamp', + updated_at='updated_at_for_snapshot' + ) + }} -```sql -{% snapshot orders_snapshot %} + select + *, + coalesce(updated_at, created_at) as updated_at_for_snapshot -{{ - config( - target_schema='snapshots', - unique_key='id', + from {{ source('jaffle_shop', 'orders') }} - strategy='timestamp', - updated_at='updated_at_for_snapshot' - ) -}} - -select - *, - coalesce(updated_at, created_at) as updated_at_for_snapshot - -from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} -``` - - + {% endsnapshot %} + ``` + -## unique_key -A column name or expression that is unique for the inputs of a snapshot. dbt uses [`unique_key`](/reference/resource-configs/unique_key) to match records between a result set and an existing snapshot, so that changes can be captured correctly. +### Using the unique_key parameter +The `unique_key` is a column name or expression that is unique for the inputs of a snapshot. dbt uses [`unique_key`](/reference/resource-configs/unique_key) to match records between a result set and an existing snapshot, so that changes can be captured correctly. @@ -287,122 +370,75 @@ A column name or expression that is unique for the inputs of a snapshot. dbt use #### Examples - - - - -```sql -{{ - config( - unique_key="id" - ) -}} -``` - - - -You can also write this in yaml. This might be a good idea if multiple snapshots share the same `unique_key` (though we prefer to apply this configuration in a config block, as above). +- Using an `id` column as a unique key + - + ```sql + {{ + config( + unique_key="id" + ) + }} + ``` + -This configuration accepts a valid column expression. As such, you can concatenate two columns together as a unique key if required. 
It's a good idea to use a separator (e.g. '-') to ensure uniqueness. + You can also write this in YAML. This might be a good idea if multiple snapshots share the same `unique_key` (though we prefer to apply this configuration in a config block, as above). - - -```sql -{% snapshot transaction_items_snapshot %} - - {{ - config( - unique_key="transaction_id||'-'||line_item_id", - ... - ) - }} - -select - transaction_id||'-'||line_item_id as id, - * -from {{ source('erp', 'transactions') }} - -{% endsnapshot %} -``` - - -Though, it's probably a better idea to construct this column in your query and use that as the `unique_key`: - - - -```sql -{% snapshot transaction_items_snapshot %} - - {{ - config( - unique_key="id", - ... - ) - }} - -select - transaction_id || '-' || line_item_id as id, - * -from {{ source('erp', 'transactions') }} - -{% endsnapshot %} -``` - - +- #### Using a combination of two columns as a unique key -## invalidate_hard_deletes + This configuration accepts a valid column expression. As such, you can concatenate two columns together as a unique key if required. It's a good idea to use a separator (e.g. '-') to ensure uniqueness. -Opt-in feature to enable invalidating hard deleted records while snapshotting the query. + -By default the feature is disabled. + ```sql + {% snapshot transaction_items_snapshot %} - + {{ + config( + unique_key="transaction_id||'-'||line_item_id", + ... 
+ ) + }} -```sql -{% snapshot orders_snapshot %} + select + transaction_id||'-'||line_item_id as id, + * + from {{ source('erp', 'transactions') }} - {{ - config( - target_schema='snapshots', - strategy='timestamp', - unique_key='id', - updated_at='updated_at', - invalidate_hard_deletes=True, - ) - }} + {% endsnapshot %} + ``` - select * from {{ source('jaffle_shop', 'orders') }} + -{% endsnapshot %} -``` - + Though, it's probably a better idea to construct this column in your query and use that as the `unique_key`: -## Pre hook and post hook -A SQL statement (or list of SQL statements) to be run before or after a model, seed, or snapshot is built. + -Pre- and post-hooks can also call macros that return SQL statements. + ```sql + {% snapshot transaction_items_snapshot %} -In this example, we use the | symbol to separate two different formatting options for SQL statements in pre-hooks and post-hooks. The first option (without brackets) accepts a single SQL statement as a string, while the second (with brackets) accepts multiple SQL statements as an array of strings. Replace `SQL-statement` with your SQL. + {{ + config( + unique_key="id", + ... + ) + }} -```sql -{% snapshot snapshot_name %} -{{ config( - pre_hook="SQL-statement" | ["SQL-statement"], - post_hook="SQL-statement" | ["SQL-statement"], -) }} + select + transaction_id || '-' || line_item_id as id, + * + from {{ source('erp', 'transactions') }} -select ... + {% endsnapshot %} + ``` + -{% endsnapshot %} -``` ## Migrate from legacy to update This page outlines the steps you need to follow to convert legacy jinja block snapshot configurations into the updated YAML-based configuration format. -### Why use the updated YAML spec? +Why use the updated YAML spec? - Performance: YAML-based configurations are processed faster by dbt, leading to improved performance, especially during parsing and compilation. 
- Maintainability: Centralizing configuration in YAML makes it easier to manage and update snapshot settings without editing the SQL logic directly. @@ -413,58 +449,68 @@ Note: In versions prior to v1.9, the target_schema (required) and target_databas ### How to migrate -1. Move any configurations currently written within the jinja block (like unique_key, strategy, updated_at, and so on) into the YAML file. The configurations are structured similarly to how you would define a model in dbt_project.yml. +1. Move any configurations currently written within the jinja block (like unique_key, strategy, updated_at, and so on) into the YAML file. + + The configurations are structured similarly to how you would define a model in `dbt_project.yml`. -Here's an example conversion: + Here's an example conversion: -```yaml -# snapshots.yml -snapshots: - - name: orders_snapshot - schema: snapshots - unique_key: id - strategy: timestamp - updated_at: updated_at - ``` -Note: The unique_key, strategy, and updated_at fields must match the settings previously defined in your jinja block. + + ```yaml + snapshots: + - name: orders_snapshot + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at + ``` + + +Note: The `unique_key`, `strategy`, and `updated_at` fields must match the settings previously defined in your jinja block. 2. Before removing the old jinja block, run the dbt snapshot command using the new YAML configuration to confirm that the snapshot behaves as expected. - - Verify that the data is processed correctly (e.g., no data loss or incorrect records). + - Verify that the data is processed correctly (for example, no data loss or incorrect records). - Make sure the performance is either the same or improved compared to the old configuration. - After running the new snapshot, inspect the snapshot tables in your data warehouse to confirm the new snapshot records match the old data. -3. 
Once you’ve confirmed that the new YAML configuration works properly, safely remove the old snapshot jinja block from your .sql file. This keeps your codebase clean and fully migrated to the new method. +3. Once you’ve confirmed that the new YAML configuration works properly, safely remove the old snapshot jinja block from your `.sql` file. This keeps your codebase clean and fully migrated to the new method. 4. If your snapshots require more complex transformations, consider using an ephemeral model to handle the transformations before referencing it in the snapshot. An ephemeral model can encapsulate transformations and simplify the snapshot query itself. -Example of using an ephemeral model: + Example of using an ephemeral model: -``` -yaml -# models/ephemeral/orders_ephemeral.sql -{{ - config(materialized='ephemeral') -}} -select * from {{ source('jaffle_shop', 'orders') }} + -# snapshots.yml -snapshots: - - name: orders_snapshot - relation: ref('orders_ephemeral') - target_schema: snapshots - unique_key: id - strategy: timestamp - updated_at: updated_at - ``` + ```sql + {{ + config(materialized='ephemeral') + }} + select * from {{ source('jaffle_shop', 'orders') }} + ``` + + + Example of the snapshot YAML configuration referencing the ephemeral model: + + + + ```yaml + snapshots: + - name: orders_snapshot + relation: ref('orders_ephemeral') + target_schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at + ``` + -## Full migration example +### Full migration example Here’s a complete example of migrating from a legacy jinja block snapshot to a YAML-based snapshot configuration: -```yaml - -# Legacy method (jinja block) +#### Legacy method (jinja block) + -snapshots/orders_snapshot.sql +```sql {% snapshot orders_snapshot %} {{ config( @@ -476,16 +522,21 @@ snapshots/orders_snapshot.sql }} select * from {{ source('jaffle_shop', 'orders') }} {% endsnapshot %} +``` + + +#### Recommended method (YAML configuration) -# New method (YAML 
configuration) + -snapshots.yml +```yaml snapshots: - name: orders_snapshot schema: snapshots unique_key: id strategy: timestamp updated_at: updated_at - ``` - + ``` + + By following these steps, you can smoothly transition from legacy jinja-based snapshots to the modern, more efficient YAML-based configurations. diff --git a/website/sidebars.js b/website/sidebars.js index 23501bc57f4..9d8f0e1e75e 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -979,6 +979,7 @@ const sidebarSettings = { items: [ "reference/snapshot-properties", "reference/snapshot-configs", + "reference/resource-configs/snapshots-jinja-legacy", "reference/resource-configs/check_cols", "reference/resource-configs/dbt_valid_to_current", "reference/resource-configs/hard-deletes", @@ -989,7 +990,6 @@ const sidebarSettings = { "reference/resource-configs/target_database", "reference/resource-configs/target_schema", "reference/resource-configs/updated_at", - "reference/resource-configs/snapshots-jinja-legacy", ], }, { From 9889082e9d99dd246ea2bab33d04473d2ed41583 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 21 Jan 2025 15:50:19 +0000 Subject: [PATCH 03/18] refine legacy page --- website/docs/docs/build/snapshots.md | 361 +++++++++--------- .../snapshots-jinja-legacy.md | 278 ++++++++++---- 2 files changed, 366 insertions(+), 273 deletions(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index f72f1eb75de..f465a2ec55c 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -37,10 +37,174 @@ This order is now in the "shipped" state, but we've lost the information about w -- To configure snapshots in versions 1.8 and earlier, refer to [Configure snapshots in versions 1.8 and earlier](#configure-snapshots-in-versions-18-and-earlier). These versions use an older syntax where snapshots are defined within a snapshot block in a `.sql` file, typically located in your `snapshots` directory. 
-- Note that defining multiple resources in a single file can significantly slow down parsing and compilation. For faster and more efficient management, consider the updated snapshot YAML syntax, [available now in the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks) or [dbt Core v1.9 and later](/docs/dbt-versions/core). - - For more information on how to migrate from the legacy snapshot configurations to the updated snapshot YAML syntax, refer to [Snapshot configuration migration](/reference/snapshot-configs#snapshot-configuration-migration). +In dbt versions 1.8 and earlier, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory or any other directory). This is considered a legacy syntax that allows for defining multiple resources in a single file and can slow down parsing and compilation. You'll also need to configure your snapshot to tell dbt how to detect record changes. +- For more information about configuring snapshots in dbt versions 1.8 and earlier, examples, and [migration steps](/reference/resource-configs/snapshots-jinja-legacy#migrate-legacy-snapshot-configs) — refer to the [Legacy snapshot configurations](/reference/resource-configs/snapshots-jinja-legacy) page. +- For faster and more efficient management, consider [choosing the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks) or the [latest version of dbt Core](/docs/dbt-versions/core), which introduces an updated snapshot configuration syntax that optimizes performance. 
+ +The following example shows how to configure a snapshot using the legacy syntax: + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + target_database='analytics', + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +The following table outlines the configurations available for snapshots in versions 1.8 and earlier: + +| Config | Description | Required? | Example | +| ------ | ----------- | --------- | ------- | +| [target_database](/reference/resource-configs/target_database) | The database that dbt should render the snapshot table into | No | analytics | +| [target_schema](/reference/resource-configs/target_schema) | The schema that dbt should render the snapshot table into | Yes | snapshots | +| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. One of `timestamp` or `check` | Yes | timestamp | +| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | +| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | +| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | +| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | + +- A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). +- Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. 
+- Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. + +### Configuration example + +To add a snapshot to your project: + +1. Create a file in your `snapshots` directory with a `.sql` file extension, e.g. `snapshots/orders.sql` +2. Use a `snapshot` block to define the start and end of a snapshot: + + + +```sql +{% snapshot orders_snapshot %} + +{% endsnapshot %} +``` + + + +3. Write a `select` statement within the snapshot block (tips for writing a good snapshot query are below). This select statement defines the results that you want to snapshot over time. You can use `sources` and `refs` here. + + + +```sql +{% snapshot orders_snapshot %} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +4. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this below. + +5. Add configurations to your snapshot using a `config` block (more details below). You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). + + + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + target_database='analytics', + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. 
%}`) will change how dbt names this table. + + + + + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + schema='snapshots', + unique_key='id', + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro. + + + +``` +$ dbt snapshot +Running with dbt=1.8.0 + +15:07:36 | Concurrency: 8 threads (target='dev') +15:07:36 | +15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] +15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] +15:07:36 | +15:07:36 | Finished running 1 snapshots in 0.68s. + +Completed successfully + +Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 +``` + +7. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described earlier. + +8. Run the `dbt snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. + +9. Select from the `snapshot` in downstream models using the `ref` function. + + + +```sql +select * from {{ ref('orders_snapshot') }} +``` + + + +10. Snapshots are only useful if you run them frequently — schedule the `snapshot` command to run regularly. @@ -85,14 +249,15 @@ The following table outlines the configurations available for snapshots: | [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names) | Customize the names of the snapshot meta fields | No | dictionary | | [hard_deletes](/reference/resource-configs/hard-deletes) | Specify how to handle deleted rows from the source. 
Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`.| No | string | - -- In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build. Developers can still set a custom location with [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types. +- In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. +- In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build. +- Developers can still set a custom location with [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types. - A number of other configurations are also supported (for example, `tags` and `post-hook`). For the complete list, refer to [Snapshot configurations](/reference/snapshot-configs). 
- You can configure snapshots from both the `dbt_project.yml` file and a `config` block. For more information, refer to the [configuration docs](/reference/snapshot-configs). ### Add a snapshot to your project -To add a snapshot to your project follow these steps. For users on versions 1.8 and earlier, refer to [Configure snapshots in versions 1.8 and earlier](#configure-snapshots-in-versions-18-and-earlier). +To add a snapshot to your project follow these steps. For users on versions 1.8 and earlier, refer to [Legacy snapshot configurations](/reference/resource-configs/snapshots-jinja-legacy). 1. Create a YAML file in your `snapshots` directory: `snapshots/orders_snapshot.yml` and add your configuration details. You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). @@ -535,190 +700,6 @@ Snapshot results with `hard_deletes='new_record'`: -## Configure snapshots in versions 1.8 and earlier - - - -For information about configuring snapshots in dbt versions 1.8 and earlier, select **1.8** from the documentation version picker, and it will appear in this section. - -To configure snapshots in versions 1.9 and later, refer to [Configuring snapshots](#configuring-snapshots). The latest versions use an updated snapshot configuration syntax that optimizes performance. - - - - - -- In dbt versions 1.8 and earlier, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory). You'll also need to configure your snapshot to tell dbt how to detect record changes. -- The earlier dbt versions use an older syntax that allows for defining multiple resources in a single file. This syntax can significantly slow down parsing and compilation. 
-- For faster and more efficient management, consider [choosing the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks) or the [latest version of dbt Core](/docs/dbt-versions/core), which introduces an updated snapshot configuration syntax that optimizes performance. - - For more information on how to migrate from the legacy snapshot configurations to the updated snapshot YAML syntax, refer to [Snapshot configuration migration](/reference/snapshot-configs#snapshot-configuration-migration). - -The following example shows how to configure a snapshot: - - - -```sql -{% snapshot orders_snapshot %} - -{{ - config( - target_database='analytics', - target_schema='snapshots', - unique_key='id', - - strategy='timestamp', - updated_at='updated_at', - ) -}} - -select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} -``` - - - -The following table outlines the configurations available for snapshots in versions 1.8 and earlier: - -| Config | Description | Required? | Example | -| ------ | ----------- | --------- | ------- | -| [target_database](/reference/resource-configs/target_database) | The database that dbt should render the snapshot table into | No | analytics | -| [target_schema](/reference/resource-configs/target_schema) | The schema that dbt should render the snapshot table into | Yes | snapshots | -| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. 
One of `timestamp` or `check` | Yes | timestamp | -| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | -| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | -| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | -| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | - -- A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). -- Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. -- Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. - -### Configuration example - -To add a snapshot to your project: - -1. Create a file in your `snapshots` directory with a `.sql` file extension, e.g. `snapshots/orders.sql` -2. Use a `snapshot` block to define the start and end of a snapshot: - - - -```sql -{% snapshot orders_snapshot %} - -{% endsnapshot %} -``` - - - -3. Write a `select` statement within the snapshot block (tips for writing a good snapshot query are below). This select statement defines the results that you want to snapshot over time. You can use `sources` and `refs` here. - - - -```sql -{% snapshot orders_snapshot %} - -select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} -``` - - - -4. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. 
For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this below. - -5. Add configurations to your snapshot using a `config` block (more details below). You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). - - - - - -```sql -{% snapshot orders_snapshot %} - -{{ - config( - target_database='analytics', - target_schema='snapshots', - unique_key='id', - - strategy='timestamp', - updated_at='updated_at', - ) -}} - -select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} -``` - - - -6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. %}`) will change how dbt names this table. - - - - - - - -```sql -{% snapshot orders_snapshot %} - -{{ - config( - schema='snapshots', - unique_key='id', - strategy='timestamp', - updated_at='updated_at', - ) -}} - -select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} -``` - - - -6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro. - - - -``` -$ dbt snapshot -Running with dbt=1.8.0 - -15:07:36 | Concurrency: 8 threads (target='dev') -15:07:36 | -15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] -15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] -15:07:36 | -15:07:36 | Finished running 1 snapshots in 0.68s. - -Completed successfully - -Done. 
PASS=2 ERROR=0 SKIP=0 TOTAL=1 -``` - -7. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described earlier. - -8. Run the `dbt snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. - -9. Select from the `snapshot` in downstream models using the `ref` function. - - - -```sql -select * from {{ ref('orders_snapshot') }} -``` - - - -10. Snapshots are only useful if you run them frequently — schedule the `snapshot` command to run regularly. - - - - ## FAQs diff --git a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md index e2cc0a53db2..16b6d143554 100644 --- a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md +++ b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md @@ -1,22 +1,20 @@ --- -title: Legacy snapshot jinja block +title: Legacy snapshot configurations description: Read about legacy snapshot jinja blocks and how to migrate to the updated syntax -sidebar: Legacy snapshot jinja block +sidebar: Legacy snapshot configurations --- -From dbt versions 1.8 and earlier, you were able to configure snapshots using jinja blocks in your `.sql` files. Configuring snapshots this way is considered legacy syntax. It was replaced with a YAML-based configuration in [dbt Cloud's "Latest" release track](/docs/dbt-versions/cloud-release-tracks) and dbt v1.9 for faster and more efficient management. +# Legacy snapshot configuration + +From dbt versions 1.8 and earlier, you were able to configure [snapshots](/docs/build/snapshots) using jinja blocks in your `.sql` files. 
This is considered legacy syntax and was replaced with a [YAML-based configuration](/reference/snapshot-configs#configuring-snapshots) in [dbt Cloud's "Latest" release track](/docs/dbt-versions/cloud-release-tracks) and dbt v1.9 for faster and more efficient management. This page details how to use the legacy SQL-based configurations and provides a path to migrate to the more efficient YAML configuration. For new snapshots, we recommend using these latest YAML-based configs. If applying them to existing snapshots, you'll need to [migrate over](/reference/snapshot-configs#snapshot-configuration-migration). -The following table outlines the differences between the legacy SQL-based syntax and the updated YAML-based syntax. +The following outlines the differences between the legacy SQL-based syntax and the updated YAML-based syntax: -| Snapshot syntax | Description | Example | -| --------------- | ----------- | ------- | -| [SQL-based](#sql-based-snapshot-syntax) | Legacy syntax for defining snapshots in `.sql` files within a snapshot jinja block. Available in dbt v1.8 and earlier. |`{% snapshot orders_snapshot %}` in a `.sql` file using `{{ config() }}` | -| [YAML-based](#yaml-based-snapshot-syntax) | Updated syntax for defining snapshot configurations in YAML files. Found in `snapshots.yml`. Available in dbt Cloud's "Latest" release track and dbt v1.9 and later. |`snapshots.yml`| + -### SQL-based snapshot syntax -Legacy syntax for defining snapshots in `.sql` files within a snapshot Jinja block, typically located in your `snapshots` directory. +Legacy syntax for defining snapshots in `.sql` files within a snapshot Jinja block, typically located in your `snapshots` directory. Available in dbt v1.8 and earlier. #### Use cases @@ -46,9 +44,11 @@ select * from {{ source('jaffle_shop', 'orders') }} {% endsnapshot %} ``` + -### YAML-based snapshot syntax -Updated syntax for defining snapshot configurations in YAML files. 
+ + +Updated syntax for defining snapshot configurations in YAML files. Found in `snapshots.yml`. Available in dbt Cloud's "Latest" release track and dbt v1.9 and later. #### Use cases @@ -91,15 +91,20 @@ snapshots: ``` + -## Legacy snapshot configuration +## Snapshot configurations -Although there's a more performant method, you may still want to use the legacy way to define your snapshots if it suits your needs. This page will list out the types of configurations suitable for snapshots and how to migrate from the legacy way to the updated method: +Although you can use the more performant YAML-based configuration, you might still want to use the legacy configuration to define your snapshots if it suits your needs. -- List out each header so there's a high-level overview of what's covered +Snapshots can be configured in two main ways: +- Using [snapshot-specific configurations](#snapshot-specific-configurations) +- Or using [general configurations](#general-configuration) -### Resource-specific configurations -Resource-specific configurations are applicable to only one dbt resource type rather than multiple resource types. You can define these settings in the project file (`dbt_project.yml`), a property file (`models/properties.yml` for models, similarly for other resources), or within the resource’s file using the `{{ config() }}` macro. +These configurations allow you to control how dbt detects changes in your data and where snapshots are stored. Both types of configurations can coexist in your project in the same `config` block (or from your `dbt_project.yml` file or `properties.yaml` file). You can also configure snapshots using [strategies](#snapshot-strategies), which define how dbt knows if a row has changed. + +### Snapshot specific configurations +Snapshot-specific configurations are applicable to only one dbt resource type rather than multiple resource types. 
You can define these settings within the resource’s file using the `{{ config() }}` macro (as well as in the project file (`dbt_project.yml`) or a property file (`models/properties.yml` for models, similarly for other resources)). @@ -107,13 +112,13 @@ Resource-specific configurations are applicable to only one dbt resource type ra { % snapshot orders_snapshot %} {{ config( - target_schema="", - target_database="", - unique_key="", - strategy="timestamp" | "check", - updated_at="", - check_cols=[""] | "all" - invalidate_hard_deletes : true | false + [target_schema](/reference/resource-configs/target_schema)="", + [target_database](/reference/resource-configs/target_database)="", + [unique_key](/reference/resource-configs/unique_key)="", + [strategy](/reference/resource-configs/strategy)="timestamp" | "check", + [updated_at](/reference/resource-configs/updated_at)="", + [check_cols](/reference/resource-configs/check_cols)=[""] | "all" + [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) : true | false ) }} @@ -126,56 +131,34 @@ select * from {{ source('jaffle_shop', 'orders') }} ### General configuration Use general configurations for broader operational settings applicable across multiple resource types. Like resource-specific configurations, these can also be set in the project file, property files, or within resource-specific files using a config block. -```sql - -{{ config( - enabled=true | false, - tags="" | [""], - alias="", - pre_hook="" | [""], - post_hook="" | [""] - persist_docs={} - grants={} -) }} -``` - -## Snapshot strategies -Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built-in to dbt that require the `strategy` parameter: - -- Timestamp — Uses an updated_at column to determine if a row has changed. -- Check — Compares a list of columns between their current and historical values to determine if a row has changed. Uses the `check_cols` parameter.
+ ```sql -{% snapshot [snapshot_name](snapshot_name) %} - {{ config( - strategy="timestamp", - updated_at="column_name" + [enabled](/reference/resource-configs/enabled)=true | false, + [tags](/reference/resource-configs/tags)="" | [""], + [alias](/reference/resource-configs/alias)="", + [pre_hook](/reference/resource-configs/pre-hook-post-hook)="" | [""], + [post_hook](/reference/resource-configs/pre-hook-post-hook)="" | [""] + [persist_docs](/reference/resource-configs/persist_docs)={} + [grants](/reference/resource-configs/grants)={} ) }} - -select ... - -{% endsnapshot %} ``` + -```sql -{% snapshot [snapshot_name](snapshot_name) %} +### Snapshot strategies +Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built into dbt that require the `strategy` parameter: -{{ config( - strategy="check", - check_cols=[column_name] | "all" -) }} +- [Timestamp](/reference/resource-configs/snapshots-jinja-legacy?strategy=timestamp#snapshot-strategies) — Uses an `updated_at` column to determine if a row has changed. +- [Check](/reference/resource-configs/snapshots-jinja-legacy?strategy=check#snapshot-strategies) — Compares a list of columns between their current and historical values to determine if a row has changed. Uses the `check_cols` parameter. -{% endsnapshot %} -``` + + -### Timestamp The timestamp strategy uses an `updated_at` field to determine if a row has changed. If the configured `updated_at` column for a row is more recent than the last time the snapshot ran, then dbt will invalidate the old record and record the new one. If the timestamps are unchanged, then dbt will not take any action. #### Example - - ```sql @@ -195,12 +178,14 @@ The timestamp strategy uses an `updated_at` field to determine if a row has chan {% endsnapshot %} ``` + - + -### Check The check strategy is useful for tables which do not have a reliable `updated_at` column.
It requires the `check_cols` parameter, which is a list of columns within the results of your snapshot query to check for changes. Alternatively, use all columns using the all value (however this may be less performant). +#### Example + ```sql @@ -264,12 +249,128 @@ The check strategy is useful for tables which do not have a reliable `updated_at ``` + + + +## Configure snapshots + +In dbt versions 1.8 and earlier, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory or any other directory). You'll also need to configure your snapshot to tell dbt how to detect record changes. + +The following table outlines the configurations available for snapshots in versions 1.8 and earlier: + +| Config | Description | Required? | Example | +| ------ | ----------- | --------- | ------- | +| [target_database](/reference/resource-configs/target_database) | The database that dbt should render the snapshot table into | No | analytics | +| [target_schema](/reference/resource-configs/target_schema) | The schema that dbt should render the snapshot table into | Yes | snapshots | +| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. One of `timestamp` or `check` | Yes | timestamp | +| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | +| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | +| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | +| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | + +- A number of other configurations are also supported (e.g. 
`tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). +- Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. +- Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. + +## Add snapshot to a project + +To add a snapshot to your project: + +1. Create a file in your `snapshots` directory with a `.sql` file extension, e.g. `snapshots/orders.sql` +2. Use a `snapshot` block to define the start and end of a snapshot: + + + +```sql +{% snapshot orders_snapshot %} + +{% endsnapshot %} +``` + + + +3. Write a `select` statement within the snapshot block (tips for writing a good snapshot query are below). This select statement defines the results that you want to snapshot over time. You can use `sources` or `refs` here. + + + +```sql +{% snapshot orders_snapshot %} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +4. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this in the next step. + +5. Add configurations to your snapshot using a `config` block. You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + target_database='analytics', + target_schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + +6. 
Run the `dbt snapshot` [command](/reference/commands/snapshot). For our example, a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. %}`) will change how dbt names this table. + +``` +dbt snapshot +Running with dbt=1.8.0 + +15:07:36 | Concurrency: 8 threads (target='dev') +15:07:36 | +15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] +15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] +15:07:36 | +15:07:36 | Finished running 1 snapshots in 0.68s. + +Completed successfully + +Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 +``` + +7. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described earlier. + +8. Run the `dbt snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. + +9. Select from the `snapshot` in downstream models using the `ref` function. + + + +```sql +select * from {{ ref('orders_snapshot') }} +``` + + + +10. Snapshots are only useful if you run them frequently — schedule the `snapshot` command to run regularly. + ## Examples This section outlines some examples of how to apply configurations to snapshots using the legacy method. -### Apply configurations to one snapshot only + + Use config blocks if you need to apply a configuration to one snapshot only. @@ -288,8 +389,9 @@ Use config blocks if you need to apply a configuration to one snapshot only. {% endsnapshot %} ``` + -### Using the updated_at parameter + The `updated_at` parameter is required if using the timestamp strategy. The `updated_at` parameter is a column within the results of your snapshot query that represents when the record row was last updated. 
@@ -303,6 +405,8 @@ The `updated_at` parameter is required if using the timestamp strategy. The `upd ``` +#### Examples + - #### Using a column name `updated_at`: @@ -355,8 +459,10 @@ The `updated_at` parameter is required if using the timestamp strategy. The `upd {% endsnapshot %} ``` + + + -### Using the unique_key parameter The `unique_key` is a column name or expression that is unique for the inputs of a snapshot. dbt uses [`unique_key`](/reference/resource-configs/unique_key) to match records between a result set and an existing snapshot, so that changes can be captured correctly. @@ -433,27 +539,33 @@ The `unique_key` is a column name or expression that is unique for the inputs of {% endsnapshot %} ``` + -## Migrate from legacy to update +## Migrate legacy snapshot configs -This page outlines the steps you need to follow to convert legacy jinja block snapshot configurations into the updaetd YAML-based configuration format. +This section outlines the steps you need to follow to migrate legacy jinja block snapshot configurations into the updated YAML-based configuration format. Why use the updated YAML spec? -- Performance: YAML-based configurations are processed faster by dbt, leading to improved performance, especially during parsing and compilation. -- Maintainability: Centralizing configuration in YAML makes it easier to manage and update snapshot settings without editing the SQL logic directly. -- Consistency: YAML configuration aligns snapshot definitions with other dbt resources, such as models and seeds, leading to a more consistent project structure. - +- YAML-based configurations are processed faster by dbt, leading to improved performance, especially during parsing and compilation. +- Centralizing configuration in YAML makes it easier to manage and update snapshot settings without editing the SQL logic directly. 
+- YAML configuration aligns snapshot definitions with other dbt resources, such as models and seeds, leading to a more consistent project structure. -Note: In versions prior to v1.9, the target_schema (required) and target_database (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, target_schema became optional, allowing snapshots to be environment-aware. By default, without target_schema or target_database defined, snapshots now use the generate_schema_name or generate_database_name macros to determine where to build. Developers can still set a custom location with schema and database configs, consistent with other resource types. +#### Considerations +- In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. +- In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. +- By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build. +- Developers can still set a custom location with schema and database configs, consistent with other resource types. ### How to migrate +The latest YAML-based configuration syntax is best suited for new snapshots. If you're migrating existing snapshots, consider the following steps: -1. Move any configurations currently written within the jinja block (like unique_key, strategy, updated_at, and so on) into the YAML file. 
- - The configurations are structured similarly to how you would define a model in `dbt_project.yml.` +1. Migrate the previous snapshot to the new table schema and values. + - Create a backup copy of your snapshots. + - Use `alter` statements as needed (or a script to apply `alter` statements) to ensure table consistency. +2. Convert any configurations currently written within the jinja block (like `unique_key`, `strategy`, `updated_at`, and so on) into the YAML file structure, one at a time and testing as you go. - Here's an example conversion: + The configurations are structured similarly to how you would define a model in `dbt_project.yml.`: ```yaml @@ -466,16 +578,16 @@ Note: In versions prior to v1.9, the target_schema (required) and target_databas ``` -Note: The `unique_key`, strategy, and `updated_at` fields must match the settings previously defined in your jinja block. + Note: The `unique_key`, strategy, and `updated_at` fields must match the settings previously defined in your jinja block. -2. Before removing the old jinja block, run the dbt snapshot command using the new YAML configuration to confirm that the snapshot behaves as expected. - - Verify that the data is processed correctly (for example,no data loss or incorrect records). - - Make surethe performance is either the same or improved compared to the old configuration. +3. Before removing the old jinja block, run the `dbt snapshot` command using the new YAML configuration to confirm that the snapshot behaves as expected. + - Verify that the data is processed correctly (for example, no data loss or incorrect records). + - Make sure the performance is either the same or improved compared to the old configuration. - After running the new snapshot, inspect the snapshot tables in your data warehouse to confirm the new snapshot records match the old data. -3. Once you’ve confirmed that the new YAML configuration works properly, safely remove the old snapshot jinja block from your `.sql` file. 
This keeps your codebase clean and fully migrated to the new method. +4. Once you’ve confirmed that the new YAML configuration works properly, safely remove the old snapshot jinja block from your `.sql` file. This keeps your codebase clean and fully migrated to the new method. -4. If your snapshots require more complex transformations, consider using an ephemeral model to handle the transformations before referencing it in the snapshot. An ephemeral model can encapsulate transformations and simplify the snapshot query itself. +5. If your snapshots require more complex transformations, consider using an ephemeral model to handle the transformations before referencing it in the snapshot. An ephemeral model can encapsulate transformations and simplify the snapshot query itself. Example of using an ephemeral model: From 5d66d62a1eb6facd0048c7985a4d92a72cfafb47 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 21 Jan 2025 15:58:39 +0000 Subject: [PATCH 04/18] updates to snapshot legacy page --- website/docs/docs/build/snapshots.md | 2 +- .../docs/reference/resource-configs/snapshots-jinja-legacy.md | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index f465a2ec55c..a2083020109 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -40,7 +40,7 @@ This order is now in the "shipped" state, but we've lost the information about w In dbt versions 1.8 and earlier, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory or any other directory). This is considered a legacy syntax that allows for defining multiple resources in a single file and can slow down parsing and compilation. You'll also need to configure your snapshot to tell dbt how to detect record changes. 
- For more information about configuring snapshots in dbt versions 1.8 and earlier, examples, and [migration steps](/reference/resource-configs/snapshots-jinja-legacy#migrate-legacy-snapshot-configs) — refer to the [Legacy snapshot configurations](/reference/resource-configs/snapshots-jinja-legacy) page. -- For faster and more efficient management, consider [choosing the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks) or the [latest version of dbt Core](/docs/dbt-versions/core), which introduces an updated snapshot configuration syntax that optimizes performance. +- For faster and more efficient management, consider [choosing the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks) or the [latest version of dbt Core](/docs/dbt-versions/core), which uses a more ergonomic snapshot configuration syntax that also speeds up parsing and compilation. The following example shows how to configure a snapshot using the legacy syntax: diff --git a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md index 16b6d143554..25a14edb131 100644 --- a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md +++ b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md @@ -8,7 +8,9 @@ sidebar: Legacy snapshot configurations From dbt versions 1.8 and earlier, you were able to configure [snapshots](/docs/build/snapshots) using jinja blocks in your `.sql` files. This is considered legacy syntax and was replaced with a [YAML-based configuration](/reference/snapshot-configs#configuring-snapshots) in [dbt Cloud's "Latest" release track](/docs/dbt-versions/cloud-release-tracks) and dbt v1.9 for faster and more efficient management. -This page details how to use the legacy SQL-based configurations and provides a path to migrate to the more efficient YAML configuration. For new snapshots, we recommend using these latest YAML-based configs. 
If applying them to existing snapshots, you'll need to [migrate over](/reference/snapshot-configs#snapshot-configuration-migration). +However, there are situations where you might still need to use the legacy syntax for snapshots in any dbt version or release track. This page details how you can use the legacy SQL-based configurations and provides a path to migrate to the more efficient YAML configuration. + +For new snapshots, we recommend using these latest YAML-based configs. If applying them to existing snapshots, you'll need to [migrate over](/reference/snapshot-configs#snapshot-configuration-migration). The following outlines the differences between the legacy SQL-based syntax and the updated YAML-based syntax: From 1c081a986777145b189b079b1640e6703f89e138 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 22 Jan 2025 12:47:20 +0000 Subject: [PATCH 05/18] Update website/docs/docs/build/snapshots.md --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 951954a08f3..1775ad73c5f 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -79,7 +79,7 @@ The following table outlines the configurations available for snapshots in versi | [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | | [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | -- A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). 
+- A number of other configurations are also supported (like `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). - Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. - Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. From 47e0f67a0d3acad670ba1db3471d515e9d675b20 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 22 Jan 2025 12:47:47 +0000 Subject: [PATCH 06/18] Update website/docs/docs/build/snapshots.md --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 1775ad73c5f..9f2355c4881 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -87,7 +87,7 @@ The following table outlines the configurations available for snapshots in versi To add a snapshot to your project: -1. Create a file in your `snapshots` directory with a `.sql` file extension, e.g. `snapshots/orders.sql` +1. Create a file in your `snapshots` directory with a `.sql` file extension. For example, `snapshots/orders.sql` 2. 
Use a `snapshot` block to define the start and end of a snapshot: From 1085f4988e3cf2b88859b0adf0e325b667735926 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 22 Jan 2025 12:48:05 +0000 Subject: [PATCH 07/18] Update website/docs/reference/resource-configs/snapshots-jinja-legacy.md --- .../docs/reference/resource-configs/snapshots-jinja-legacy.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md index 25a14edb131..00a85515778 100644 --- a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md +++ b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md @@ -270,7 +270,7 @@ The following table outlines the configurations available for snapshots in versi | [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | | [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | -- A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). +- A number of other configurations are also supported (like `tags` and `post-hook`). Check out the full list [here](/reference/snapshot-configs). - Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. - Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively.
From 58f6f262d5dc5b2835abcb30e57d648970183571 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 22 Jan 2025 12:48:31 +0000 Subject: [PATCH 08/18] Update website/docs/reference/resource-configs/snapshots-jinja-legacy.md --- .../docs/reference/resource-configs/snapshots-jinja-legacy.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md index 00a85515778..420b95cb97e 100644 --- a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md +++ b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md @@ -495,7 +495,7 @@ The `unique_key` is a column name or expression that is unique for the inputs of - #### Using a combination of two columns as a unique key - This configuration accepts a valid column expression. As such, you can concatenate two columns together as a unique key if required. It's a good idea to use a separator (e.g. '-') to ensure uniqueness. + This configuration accepts a valid column expression. As such, you can concatenate two columns together as a unique key if required. It's a good idea to use a separator (like '-') to ensure uniqueness.
From 4dd74f0a5725dae0b4b277777c0b376882d59819 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Wed, 22 Jan 2025 12:48:52 +0000 Subject: [PATCH 09/18] Update website/docs/reference/resource-configs/snapshots-jinja-legacy.md --- .../docs/reference/resource-configs/snapshots-jinja-legacy.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md index 420b95cb97e..054b9497834 100644 --- a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md +++ b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md @@ -278,7 +278,7 @@ The following table outlines the configurations available for snapshots in versi To add a snapshot to your project: -1. Create a file in your `snapshots` directory with a `.sql` file extension, e.g. `snapshots/orders.sql` +1. Create a file in your `snapshots` directory with a `.sql` file extension. For example, `snapshots/orders.sql` 2. Use a `snapshot` block to define the start and end of a snapshot: From 4b2499a715d1e5b4f90b32c083990049884fbc5f Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Wed, 19 Feb 2025 16:56:01 +0000 Subject: [PATCH 10/18] update --- .../reference/resource-configs/snapshots-jinja-legacy.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md index 054b9497834..a1d6bb16e6f 100644 --- a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md +++ b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md @@ -6,11 +6,11 @@ sidebar: Legacy snapshot configurations # Legacy snapshot configuration -From dbt versions 1.8 and earlier, you were able to configure [snapshots](/docs/build/snapshots) using jinja blocks in your `.sql` files.
This is considered legacy syntax and was replaced with a [YAML-based configuration](/reference/snapshot-configs#configuring-snapshots) in [dbt Cloud's "Latest" release track](/docs/dbt-versions/cloud-release-tracks) and dbt v1.9 for faster and more efficient management. +You can configure [snapshots](/docs/build/snapshots) using jinja blocks in your `.sql` files, which is considered legacy syntax. In dbt v1.9, this syntax was replaced with a [YAML-based configuration](/reference/snapshot-configs#configuring-snapshots) in [dbt Cloud's "Latest" release track](/docs/dbt-versions/cloud-release-tracks). -However, there are situations where you might still need to use the legacy syntax for snapshots in any dbt version or release track. This page details how you can use the legacy SQL-based configurations and provides a path to migrate to the more efficient YAML configuration. +However, there are situations where you might still need to use the legacy syntax for snapshots in any dbt version or release track. This page details how you can use the legacy SQL-based configurations if you need to. -For new snapshots, we recommend using these latest YAML-based configs. If applying them to existing snapshots, you'll need to [migrate over](/reference/snapshot-configs#snapshot-configuration-migration). +For new snapshots, we recommend using these latest YAML-based configs. If you'd like to move to the YAML-based configuration for existing snapshots, you can [migrate over](/reference/snapshot-configs#snapshot-configuration-migration). 
The following outlines the differences between the legacy SQL-based syntax and the updated YAML-based syntax: From f293e47c9149812f041c9b0fcc14e1850894f221 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Mon, 3 Mar 2025 13:28:12 +0000 Subject: [PATCH 11/18] joels feedback --- website/docs/docs/build/snapshots.md | 6 +- .../snapshots-jinja-legacy.md | 333 +++++++----------- 2 files changed, 137 insertions(+), 202 deletions(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 91a5ac815f2..cceee34905a 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -37,10 +37,10 @@ This order is now in the "shipped" state, but we've lost the information about w -In dbt versions 1.8 and earlier, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory or any other directory). This is considered a legacy syntax that allows for defining multiple resources in a single file and can slow down parsing and compilation. You'll also need to configure your snapshot to tell dbt how to detect record changes. +Define snapshots within a snapshot block in a `.sql` file (typically in your `snapshots` directory or any other directory). This is considered a legacy syntax that allows for defining multiple resources in a single file and can slow down parsing and compilation. You'll also need to configure your snapshot to tell dbt how to detect record changes. -- For more information about configuring snapshots in dbt versions 1.8 and earlier, examples, and [migration steps](/reference/resource-configs/snapshots-jinja-legacy#migrate-legacy-snapshot-configs) — refer to the [Legacy snapshot configurations](/reference/resource-configs/snapshots-jinja-legacy) page. 
-- For faster and more efficient management, consider [choosing the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks) or the [latest version of dbt Core](/docs/dbt-versions/core), which uses a more ergonomic snapshot configuration syntax that also speeds up parsing and compilation. +- For more information about configuring snapshots in a `.sql` file, refer to the [Legacy snapshot configurations](/reference/resource-configs/snapshots-jinja-legacy) page. +- For faster and more efficient management, consider [choosing the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks) or the [latest version of dbt Core](/docs/dbt-versions/core), which uses a more ergonomic snapshot configuration syntax. The following example shows how to configure a snapshot using the legacy syntax: diff --git a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md index a1d6bb16e6f..6b4e774a8f9 100644 --- a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md +++ b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md @@ -1,99 +1,34 @@ --- title: Legacy snapshot configurations -description: Read about legacy snapshot jinja blocks and how to migrate to the updated syntax -sidebar: Legacy snapshot configurations +description: Read about how to configure snapshots using legacy jinja blocks +sidebar: Legacy configuration --- # Legacy snapshot configuration -You can configure [snapshots](/docs/build/snapshots) using jinja blocks in your `.sql` files, which is considered legacy syntax. In dbt v1.9, this syntax was replaced with a [YAML-based configuration](/reference/snapshot-configs#configuring-snapshots) in [dbt Cloud's "Latest" release track](/docs/dbt-versions/cloud-release-tracks). + -However, there are situations where you might still need to use the legacy syntax for snapshots in any dbt version or release track. 
This page details how you can use the legacy SQL-based configurations if you need to. +Configure snapshots using jinja blocks in your `.sql` files, which is available now and considered legacy syntax. -For new snapshots, we recommend using these latest YAML-based configs. If you'd like to move to the YAML-based configuration for existing snapshots, you can [migrate over](/reference/snapshot-configs#snapshot-configuration-migration). - -The following outlines the differences between the legacy SQL-based syntax and the updated YAML-based syntax: - - - -Legacy syntax for defining snapshots in `.sql` files within a snapshot Jinja block, typically located in your `snapshots` directory. Available in dbt v1.8 and earlier. - -#### Use cases - -- Defining multiple resources in a single file, though it can significantly slow down parsing and compilation. -- Useful for existing snapshots already using this syntax. -- Suitable for performing very light transformations (but creating a separate ephemeral model for transformations is recommended for better maintainability). - -#### Example - -In this example, we created a snapshot in the `snapshots` directory and used the `{{ config() }}` block within the jinja block to define the snapshot configuration. - - - -```sql -{% snapshot orders_snapshot %} - -{{ config( - target_database='analytics', - target_schema='snapshots', - unique_key='id', - strategy='timestamp', - updated_at='updated_at' -) }} - -select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} -``` - - - - + -Updated syntax for defining snapshot configurations in YAML files. Found in `snapshots.yml`. Available in dbt Cloud's "Latest" release track and dbt v1.9 and later. +There are situations where you want to use the legacy syntax for [snapshots](/docs/build/snapshots) in any dbt version or release track. This page details how you can use the legacy SQL-based configurations if you need to. -#### Use cases - -- More performant and easier to manage. 
-- Ideal for new snapshots or existing snapshots that need to be [migrated](/reference/snapshot-configs#snapshot-configuration-migration). -- Create transformations separate from the snapshot file by creating an ephemeral model and referencing it in the snapshot using the `relation` field. - -#### Example - -In this example, we created a snapshot in the `snapshots` directory (and separately an ephemeral model in the `models` directory). We then used the [`ref` function](/reference/dbt-jinja-functions/ref) to reference the ephemeral model in the `snapshots.yml` file in the `relation` field. - - - -```yaml -snapshots: - - name: orders_snapshot - relation: ref('orders_ephemeral') - config: - unique_key: id - strategy: timestamp - updated_at: updated_at -``` - +In dbt v1.9, this syntax was replaced with a [YAML-based configuration](/reference/snapshot-configs#configuring-snapshots) in [dbt Cloud's "Latest" release track](/docs/dbt-versions/cloud-release-tracks). The benefits of YAML-based configurations are that the snapshots are environment aware, meaning you don't have to specify `schema` or `database`, and the syntax is more concise. -In this second example, we used the `relation` field to reference the source table using the [`source` function](/reference/dbt-jinja-functions/source). +For new snapshots, we recommend using these latest YAML-based configs. If you'd like to move to the YAML-based configuration for existing snapshots, you can [migrate over](/reference/snapshot-configs#snapshot-configuration-migration). - +When would you want to use the SQL-based syntax and YAML-based syntax? -```yaml -snapshots: - - name: orders_snapshot - relation: source('jaffle_shop', 'orders') - config: - schema: snapshots - database: analytics - unique_key: id - strategy: timestamp - updated_at: updated_at - dbt_valid_to_current: "to_date('9999-12-31')" +- SQL-based syntax: + - Defined in `.sql` files within a snapshot Jinja block, typically located in your `snapshots` directory. 
Available in all versions. + - Useful for existing snapshots already using this syntax. + - Suitable for performing very light transformations (but creating a separate ephemeral model for transformations is recommended for better maintainability). -``` - - +- YAML-based syntax: + - Defined in `whatever_name.yml` or in the `snapshots` or `models` directory you prefer. Available in dbt Cloud's "Latest" release track and dbt v1.9 and later. + - Ideal for new snapshots or existing snapshots that need to be [migrated](/reference/snapshot-configs#snapshot-configuration-migration). + - Create transformations separate from the snapshot file by creating an ephemeral model and referencing it in the snapshot using the `relation` field. ## Snapshot configurations @@ -103,7 +38,9 @@ Snapshots can be configured in two main ways: - Using [snapshot-specific configurations](#snapshot-specific-configurations) - Or using [general configurations](#general-configuration) -These configurations allow you to control how dbt detects changes in your data and where snapshots are stored. Both types of configurations can coexist in your project in the same `config` block (or from your `dbt_project.yml` file or `properties.yaml` file). You can also configure snapshots using [strategies](#snapshot-strategies), which define how dbt knows if a row has changed. +These configurations allow you to control how dbt detects changes in your data and where snapshots are stored. Both types of configurations can coexist in your project in the same `config` block (or from your `dbt_project.yml` file or `properties.yaml` file). + +One of the most important configs you can decide is [strategies](#snapshot-strategies), which tells dbt how to detect modified rows. ### Snapshot specific configurations Snapshot-specific configurations are applicable to only one dbt resource type rather than multiple resource types. 
You can define these settings within the resource’s file using the `{{ config() }}` macro (as well as in the project file (`dbt_project.yml`) or a property file (`models/properties.yml` for models, similarly for other resources)). @@ -254,11 +191,13 @@ The check strategy is useful for tables which do not have a reliable `updated_at -## Configure snapshots +## Configuration reference + +Configure your snapshot to tell dbt how to detect record changes. Snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory or any other directory). -In dbt versions 1.8 and earlier, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory or any other directory). You'll also need to configure your snapshot to tell dbt how to detect record changes. +The following table outlines the configurations available for snapshots: -The following table outlines the configurations available for snapshots in versions 1.8 and earlier: + | Config | Description | Required? | Example | | ------ | ----------- | --------- | ------- | @@ -274,6 +213,28 @@ The following table outlines the configurations available for snapshots in versi - Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. - Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. + + + + +| Config | Description | Required? | Example | +| ------ | ----------- | --------- | ------- | +| [database](/reference/resource-configs/database) |Specify a custom database for the snapshot | No | analytics | +| [schema](/reference/resource-configs/schema) | Specify a custom schema for the snapshot | No | snapshots | +| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. 
One of `timestamp` or `check` | Yes | timestamp | +| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | +| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | +| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | +| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard-deleted records in the source and set `dbt_valid_to` to the current time if the record no longer exists | No | True | + +- A number of other configurations are also supported (like `tags` and `post-hook`). Check out the full list [here](/reference/snapshot-configs). +- Snapshots can be configured from both your `dbt_project.yml` file and a `config` block. Check out the [configuration docs](/reference/snapshot-configs) for more information. +- Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. +- Before v1.9, `target_schema` (required) and `target_database` (optional) set a fixed schema or database for snapshots, making it hard to separate dev and prod environments. In v1.9, `target_schema` became optional, allowing environment-aware snapshots. By default, snapshots now use `generate_schema_name` or `generate_database_name`, but developers can still specify a custom location using [schema](/reference/resource-configs/schema) and [database](/reference/resource-configs/database), consistent with other resource types. + + + + ## Add snapshot to a project To add a snapshot to your project: @@ -309,6 +270,8 @@ select * from {{ source('jaffle_shop', 'orders') }} 5. Add configurations to your snapshot using a `config` block.
You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). + + ```sql @@ -323,7 +286,7 @@ select * from {{ source('jaffle_shop', 'orders') }} strategy='timestamp', updated_at='updated_at', ) }} select * from {{ source('jaffle_shop', 'orders') }} @@ -331,6 +294,34 @@ select * from {{ source('jaffle_shop', 'orders') }} ``` + + + + + + +```sql +{% snapshot orders_snapshot %} + +{{ + config( + database='analytics', + schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} + +{% endsnapshot %} +``` + + + + 6. Run the `dbt snapshot` [command](/reference/commands/snapshot). For our example, a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. %}`) will change how dbt names this table. @@ -410,6 +401,8 @@ The `updated_at` parameter is required if using the timestamp strategy. The `upd #### Examples - #### Using a column name `updated_at`: + + ```sql @@ -430,6 +423,30 @@ The `updated_at` parameter is required if using the timestamp strategy. The `upd {% endsnapshot %} ``` + + + + + + ```sql + {% snapshot orders_snapshot %} + + {{ + config( + schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at' + ) + }} + + select * from {{ source('jaffle_shop', 'orders') }} + + {% endsnapshot %} + ``` + + - #### Coalescing two columns to create a reliable `updated_at` column: @@ -437,6 +454,34 @@ The `updated_at` parameter is required if using the timestamp strategy.
The `upd Since the `updated_at` configuration only takes a column name, rather than an expression, you should update your snapshot query to include the coalesced column. + + + + ```sql + {% snapshot orders_snapshot %} + + {{ + config( + schema='snapshots', + unique_key='id', + + strategy='timestamp', + updated_at='updated_at_for_snapshot' + ) + }} + + select + *, + coalesce(updated_at, created_at) as updated_at_for_snapshot + + from {{ source('jaffle_shop', 'orders') }} + + {% endsnapshot %} + ``` + + + + ```sql @@ -461,6 +506,8 @@ The `updated_at` parameter is required if using the timestamp strategy. The `upd {% endsnapshot %} ``` + + @@ -542,115 +589,3 @@ The `unique_key` is a column name or expression that is unique for the inputs of ``` - -## Migrate legacy snapshot configs - -This section outlines the steps you need to follow to migrate legacy jinja block snapshot configurations into the updated YAML-based configuration format. - -Why use the updated YAML spec? - -- YAML-based configurations are processed faster by dbt, leading to improved performance, especially during parsing and compilation. -- Centralizing configuration in YAML makes it easier to manage and update snapshot settings without editing the SQL logic directly. -- YAML configuration aligns snapshot definitions with other dbt resources, such as models and seeds, leading to a more consistent project structure. - -#### Considerations -- In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. -- In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. 
-- By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build. -- Developers can still set a custom location with schema and database configs, consistent with other resource types. - -### How to migrate -The latest YAML-based configuration syntax is best suited for new snapshots. If you're migrating existing snapshots, consider the following steps: - -1. Migrate the previous snapshot to the new table schema and values. - - Create a backup copy of your snapshots. - - Use `alter` statements as needed (or a script to apply `alter` statements) to ensure table consistency. -2. Convert any configurations currently written within the jinja block (like `unique_key`, `strategy`, `updated_at`, and so on) into the YAML file structure, one at a time and testing as you go. - - The configurations are structured similarly to how you would define a model in `dbt_project.yml.`: - - - ```yaml - snapshots: - - name: orders_snapshot - schema: snapshots - unique_key: id - strategy: timestamp - updated_at: updated_at - ``` - - - Note: The `unique_key`, strategy, and `updated_at` fields must match the settings previously defined in your jinja block. - -3. Before removing the old jinja block, run the `dbt snapshot` command using the new YAML configuration to confirm that the snapshot behaves as expected. - - Verify that the data is processed correctly (for example, no data loss or incorrect records). - - Make sure the performance is either the same or improved compared to the old configuration. - - After running the new snapshot, inspect the snapshot tables in your data warehouse to confirm the new snapshot records match the old data. - -4. Once you’ve confirmed that the new YAML configuration works properly, safely remove the old snapshot jinja block from your `.sql` file. This keeps your codebase clean and fully migrated to the new method. - -5. 
If your snapshots require more complex transformations, consider using an ephemeral model to handle the transformations before referencing it in the snapshot. An ephemeral model can encapsulate transformations and simplify the snapshot query itself. - - Example of using an ephemeral model: - - - - ```sql - {{ - config(materialized='ephemeral') - }} - select * from {{ source('jaffle_shop', 'orders') }} - ``` - - - Example of the snapshot YAML configuration referencing the ephemeral model: - - - - ```yaml - snapshots: - - name: orders_snapshot - relation: ref('orders_ephemeral') - target_schema: snapshots - unique_key: id - strategy: timestamp - updated_at: updated_at - ``` - - -### Full migration example -Here’s a complete example of migrating from a legacy jinja block snapshot to a YAML-based snapshot configuration: - -#### Legacy method (jinja block) - - -```sql -{% snapshot orders_snapshot %} -{{ - config( - target_schema='snapshots', - unique_key='id', - strategy='timestamp', - updated_at='updated_at', - ) -}} -select * from {{ source('jaffle_shop', 'orders') }} -{% endsnapshot %} -``` - - -#### Recommended method (YAML configuration) - - - -```yaml -snapshots: - - name: orders_snapshot - schema: snapshots - unique_key: id - strategy: timestamp - updated_at: updated_at - ``` - - -By following these steps, you can smoothly transition from legacy jinja-based snapshots to the modern, more efficient YAML-based configurations. 
From 536cdee74a135f9f3831d9f29875da80ddbec08b Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Mon, 3 Mar 2025 16:52:23 +0000 Subject: [PATCH 12/18] remove dup --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index cceee34905a..3907ee1f414 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -82,6 +82,7 @@ The following table outlines the configurations available for snapshots in versi - A number of other configurations are also supported (like `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). - Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. - Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. +- Before v1.9, `target_schema` (required) and `target_database` (optional) set a fixed schema or database for snapshots, making it hard to separate dev and prod environments. In v1.9, `target_schema` became optional, allowing environment-aware snapshots. By default, snapshots now use `generate_schema_name` or `generate_database_name`, but developers can still specify a custom location using [schema](/reference/resource-configs/schema) and [database](/reference/resource-configs/database), consistent with other resource types. ### Configuration example @@ -249,7 +250,6 @@ The following table outlines the configurations available for snapshots: | [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names) | Customize the names of the snapshot meta fields | No | dictionary | | [hard_deletes](/reference/resource-configs/hard-deletes) | Specify how to handle deleted rows from the source. 
Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`.| No | string | -- In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. - In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build. - Developers can still set a custom location with [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types. - A number of other configurations are also supported (for example, `tags` and `post-hook`). For the complete list, refer to [Snapshot configurations](/reference/snapshot-configs). From b435d0d514972f3673ce70ee0025b45403819c4f Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 4 Mar 2025 09:49:50 +0000 Subject: [PATCH 13/18] Update website/docs/docs/build/snapshots.md Co-authored-by: Joel Labes --- website/docs/docs/build/snapshots.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 3907ee1f414..a5987a6fadb 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -37,7 +37,9 @@ This order is now in the "shipped" state, but we've lost the information about w -Define snapshots within a snapshot block in a `.sql` file (typically in your `snapshots` directory or any other directory). 
This is considered a legacy syntax that allows for defining multiple resources in a single file and can slow down parsing and compilation. You'll also need to configure your snapshot to tell dbt how to detect record changes. +In old versions of dbt Core (v1.8 and earlier), snapshots must be defined in snapshot blocks inside of your [snapshots directory](/reference/project-configs/snapshot-paths). These snapshots do not have native support for environments or deferral, making previewing changes in development difficult. + +The modern, environment-aware way to create snapshots is to define them in YAML. This requires dbt Core v1.9 or later, or to be on any [dbt Cloud release track](/docs/dbt-versions/cloud-release-tracks). - For more information about configuring snapshots in a `.sql` file, refer to the [Legacy snapshot configurations](/reference/resource-configs/snapshots-jinja-legacy) page. - For faster and more efficient management, consider [choosing the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks) or the [latest version of dbt Core](/docs/dbt-versions/core), which uses a more ergonomic snapshot configuration syntax. From 53a4eb2d326c926733660853c7e8afb7226c15ae Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 4 Mar 2025 09:50:08 +0000 Subject: [PATCH 14/18] Update website/docs/docs/build/snapshots.md Co-authored-by: Joel Labes --- website/docs/docs/build/snapshots.md | 1 - 1 file changed, 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index a5987a6fadb..daa3f0f573b 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -42,7 +42,6 @@ In old versions of dbt Core (v1.8 and earlier), snapshots must be defined in sna The modern, environment-aware way to create snapshots is to define them in YAML. 
This requires dbt Core v1.9 or later, or to be on any [dbt Cloud release track](/docs/dbt-versions/cloud-release-tracks). - For more information about configuring snapshots in a `.sql` file, refer to the [Legacy snapshot configurations](/reference/resource-configs/snapshots-jinja-legacy) page. -- For faster and more efficient management, consider [choosing the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks) or the [latest version of dbt Core](/docs/dbt-versions/core), which uses a more ergonomic snapshot configuration syntax. The following example shows how to configure a snapshot using the legacy syntax: From 3508a2ca9dcfe716827b72cb866d85514b2a190f Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 4 Mar 2025 09:50:34 +0000 Subject: [PATCH 15/18] Update website/docs/docs/build/snapshots.md Co-authored-by: nataliefiann <120089939+nataliefiann@users.noreply.github.com> --- website/docs/docs/build/snapshots.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index daa3f0f573b..23098fe867a 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -145,7 +145,7 @@ select * from {{ source('jaffle_shop', 'orders') }} -6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. %}`) will change how dbt names this table. +6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. 
%}`) will change how dbt names this table. From a468ff8f15ebcd8996be071d2653b1c4ad7d38ed Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 4 Mar 2025 09:59:31 +0000 Subject: [PATCH 16/18] fold natalie joel feedback --- .../resource-configs/snapshots-jinja-legacy.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md index 6b4e774a8f9..d56d9e4486d 100644 --- a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md +++ b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md @@ -8,7 +8,7 @@ sidebar: Legacy configuration -Configure snapshots using jinja blocks in your `.sql` files, which is available now and considered legacy syntax. +Configure snapshots using Jinja blocks in your `.sql` files, which is available now and considered legacy syntax. @@ -57,7 +57,7 @@ Snapshot-specific configurations are applicable to only one dbt resource type ra [strategy](/reference/resource-configs/strategy)="timestamp" | "check", [updated_at](/reference/resource-configs/updated_at)="", [check_cols](/reference/resource-configs/check_cols)=[""] | "all" - [invalidate_hard_deletes](/reference/resource-configs/check_cols) : true | false + [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) : true | false ) }} @@ -74,7 +74,7 @@ Use general configurations for broader operational settings applicable across mu ```sql {{ config( - [enabled](/reference/resource-configs/check_cols)=true | false, + [enabled](/reference/resource-configs/enabled)=true | false, [tags](/reference/resource-configs/tags)="" | [""], [alias](/reference/resource-configs/alias)="", [pre_hook](/reference/resource-configs/pre-hook-post-hook)="" | [""], @@ -268,7 +268,7 @@ select * from {{ source('jaffle_shop', 'orders') }} 4. 
Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this in the next step. -5. Add configurations to your snapshot using a `config` block. You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). +5. Add configurations to your snapshot using a `config` block. You can also [configure your snapshot from your `dbt_project.yml` file](/reference/snapshot-configs). @@ -341,11 +341,11 @@ Completed successfully Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 ``` -7. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described earlier. +1. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](/docs/build/snapshots#snapshot-meta-fields) as described earlier. -8. Run the `dbt snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. +2. Run the `dbt snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. -9. Select from the `snapshot` in downstream models using the `ref` function. +3. Select from the `snapshot` in downstream models using the `ref` function. 
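Reviewer note on the last step of the list above: a downstream model can select from the snapshot with `ref`, as in the minimal sketch below. It assumes the `orders_snapshot` snapshot from the examples in this patch and the default `dbt_valid_to` meta field. The model file name `orders_current.sql` is hypothetical, and the null check assumes the default behavior where current records have no `dbt_valid_to` value.

```sql
{# models/orders_current.sql: hypothetical downstream model, not part of this patch #}
{# Keeps only the current version of each order, assuming dbt_valid_to is null for live rows #}
select *
from {{ ref('orders_snapshot') }}
where dbt_valid_to is null
```

Dropping the `where` clause would return the full change history instead of only the current rows.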
From dc9f7d7404d64257352f7d8fe20079fb6b3d2a80 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 4 Mar 2025 10:00:27 +0000 Subject: [PATCH 17/18] add quotes --- .../docs/reference/resource-configs/snapshots-jinja-legacy.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md index d56d9e4486d..ca3f9730c0b 100644 --- a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md +++ b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md @@ -1,7 +1,7 @@ --- title: Legacy snapshot configurations description: Read about how to configure snapshots using legacy jinja blocks -sidebar: Legacy configuration +sidebar: "Legacy configuration" --- # Legacy snapshot configuration From e9e8431755e93ccf31ed59453e4562e8c8530d4e Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 4 Mar 2025 10:23:46 +0000 Subject: [PATCH 18/18] Update website/docs/reference/resource-configs/snapshots-jinja-legacy.md --- .../docs/reference/resource-configs/snapshots-jinja-legacy.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md index ca3f9730c0b..ec43c01c1fe 100644 --- a/website/docs/reference/resource-configs/snapshots-jinja-legacy.md +++ b/website/docs/reference/resource-configs/snapshots-jinja-legacy.md @@ -8,7 +8,7 @@ sidebar: "Legacy configuration" -Configure snapshots using Jinja blocks in your `.sql` files, which is available now and considered legacy syntax. +Use legacy SQL-based snapshot configurations with Jinja blocks in any dbt version. dbt v1.9 introduced YAML-based configs for better readability and environment awareness.