Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug/databricks sql incremental #49

Merged
merged 12 commits into from
Jul 31, 2024
4 changes: 3 additions & 1 deletion .buildkite/hooks/pre-command
Original file line number Diff line number Diff line change
Expand Up @@ -22,4 +22,6 @@ export CI_SNOWFLAKE_DBT_WAREHOUSE=$(gcloud secrets versions access latest --secr
export CI_DATABRICKS_DBT_HOST=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_HOST" --project="dbt-package-testing-363917")
export CI_DATABRICKS_DBT_HTTP_PATH=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_HTTP_PATH" --project="dbt-package-testing-363917")
export CI_DATABRICKS_DBT_TOKEN=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_TOKEN" --project="dbt-package-testing-363917")
export CI_DATABRICKS_DBT_CATALOG=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_CATALOG" --project="dbt-package-testing-363917")
export CI_DATABRICKS_DBT_CATALOG=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_CATALOG" --project="dbt-package-testing-363917")
export CI_DATABRICKS_SQL_DBT_HTTP_PATH=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_SQL_DBT_HTTP_PATH" --project="dbt-package-testing-363917")
export CI_DATABRICKS_SQL_DBT_TOKEN=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_SQL_DBT_TOKEN" --project="dbt-package-testing-363917")
17 changes: 16 additions & 1 deletion .buildkite/pipeline.yml
Original file line number Diff line number Diff line change
Expand Up @@ -71,4 +71,19 @@ steps:
- "CI_DATABRICKS_DBT_TOKEN"
- "CI_DATABRICKS_DBT_CATALOG"
commands: |
bash .buildkite/scripts/run_models.sh databricks
bash .buildkite/scripts/run_models.sh databricks

- label: ":databricks: :database: Run Tests - Databricks SQL Warehouse"
key: "run_dbt_databricks_sql"
plugins:
- docker#v3.13.0:
image: "python:3.8"
shell: [ "/bin/bash", "-e", "-c" ]
environment:
- "BASH_ENV=/tmp/.bashrc"
- "CI_DATABRICKS_DBT_HOST"
- "CI_DATABRICKS_SQL_DBT_HTTP_PATH"
- "CI_DATABRICKS_SQL_DBT_TOKEN"
- "CI_DATABRICKS_DBT_CATALOG"
commands: |
bash .buildkite/scripts/run_models.sh databricks-sql
10 changes: 10 additions & 0 deletions .buildkite/scripts/run_models.sh
Original file line number Diff line number Diff line change
Expand Up @@ -16,9 +16,19 @@ db=$1
echo `pwd`
cd integration_tests
dbt deps
if [ "$db" = "databricks-sql" ]; then
dbt seed --vars '{mixpanel_schema: mixpanel_sqlw_tests}' --target "$db" --full-refresh
dbt compile --vars '{mixpanel_schema: mixpanel_sqlw_tests}' --target "$db"
dbt run --vars '{mixpanel_schema: mixpanel_sqlw_tests}' --target "$db" --full-refresh
dbt test --vars '{mixpanel_schema: mixpanel_sqlw_tests}' --target "$db"
dbt run --vars '{mixpanel_schema: mixpanel_sqlw_tests}' --target "$db"
dbt test --vars '{mixpanel_schema: mixpanel_sqlw_tests}' --target "$db"
else
dbt seed --target "$db" --full-refresh
dbt compile --target "$db"
dbt run --target "$db" --full-refresh
dbt test --target "$db"
dbt run --target "$db"
dbt test --target "$db"
fi
dbt run-operation fivetran_utils.drop_schemas_automation --target "$db"
23 changes: 23 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,26 @@
# dbt_mixpanel v0.10.0

[PR #49](https://github.com/fivetran/dbt_mixpanel/pull/49) includes the following updates:

## 🚨 Breaking Changes 🚨
> ⚠️ Since the following changes result in the table format changing, we recommend running a `--full-refresh` after upgrading to this version to avoid possible incremental failures.

- For Databricks All-Purpose clusters, incremental models will now be materialized using the delta table format (previously parquet).
- Delta tables are generally more performant than parquet and are also more widely available for Databricks users. This will also prevent compilation issues on customers' managed tables.

- For Databricks SQL Warehouses, incremental materialization will not be used due to the incompatibility of the `insert_overwrite` strategy.

## Under the Hood
- The `is_incremental_compatible` macro has been added and will return `true` if the target warehouse supports our chosen incremental strategy.
- This update was applied as there have been other Databricks runtimes discovered (ie. an endpoint and external runtime) which do not support the `insert_overwrite` incremental strategy used.
- Added integration testing for Databricks SQL Warehouse.
- Added consistency tests for models:
- `mixpanel__daily_events`
- `mixpanel__event`
- `mixpanel__monthly_events`
- `mixpanel__sessions`
- Updated logic for macro `mixpanel_lookback` to align with logic used in similar macros in other packages.

# dbt_mixpanel v0.9.0
[PR #41](https://github.com/fivetran/dbt_mixpanel/pull/41) includes the following updates:

Expand Down
10 changes: 6 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,22 +56,22 @@ dispatch:
```

### Database Incremental Strategies
Many of the end models in this package are materialized incrementally, so we have configured our models to work with the different strategies available to each supported warehouse.
Many of the models in this package are materialized incrementally, so we have configured our models to work with the different strategies available to each supported warehouse.

For **BigQuery** and **Databricks** destinations, we have chosen `insert_overwrite` as the default strategy, which benefits from the partitioning capability.
For **BigQuery** and **Databricks All Purpose Cluster runtime** destinations, we have chosen `insert_overwrite` as the default strategy, which benefits from the partitioning capability.
> For Databricks SQL Warehouse destinations, models are materialized as tables without support for incremental runs.

For **Snowflake**, **Redshift**, and **Postgres** databases, we have chosen `delete+insert` as the default strategy.

> Regardless of strategy, we recommend that users periodically run a `--full-refresh` to ensure a high level of data quality.

## Step 2: Install the package
Include the following mixpanel package version in your `packages.yml` file:
> TIP: Check [dbt Hub](https://hub.getdbt.com/) for the latest installation instructions or [read the dbt docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages.

```yaml
packages:
- package: fivetran/mixpanel
version: [">=0.9.0", "<0.10.0"] # we recommend using ranges to capture non-breaking changes automatically
version: [">=0.10.0", "<0.11.0"] # we recommend using ranges to capture non-breaking changes automatically
```

## Step 3: Define database and schema variables
Expand All @@ -84,6 +84,7 @@ vars:
```

## (Optional) Step 4: Additional configurations
<details open><summary>Collapse/expand details</summary>

## Macros
### analyze_funnel [(source)](https://github.com/fivetran/dbt_mixpanel/blob/master/macros/analyze_funnel.sql)
Expand Down Expand Up @@ -239,6 +240,7 @@ Events are considered duplicates and consolidated by the package if they contain
* calendar date of occurrence (event timestamps are set in the timezone the Mixpanel project is configured to)

This is performed in line with Mixpanel's internal de-duplication process, in which events are de-duped at the end of each day. This means that if an event was triggered during an offline session at 11:59 PM and _resent_ when the user came online at 12:01 AM, these records would _not_ be de-duplicated. This is the case in both Mixpanel and the Mixpanel dbt package.
</details>

## (Optional) Step 5: Orchestrate your models with Fivetran Transformations for dbt Core™
<details><summary>Expand for details</summary>
Expand Down
2 changes: 1 addition & 1 deletion dbt_project.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
config-version: 2
name: 'mixpanel'
version: '0.9.0'
version: '0.10.0'
require-dbt-version: [">=1.3.0", "<2.0.0"]
models:
mixpanel:
Expand Down
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

Loading