Employ base tables to resolve unioning-null issue (#25)
* added to hubspot

* add tmp models to jira and zendesk

* try this out in bk

* forgot one zendesk model

* bk

* revert and document

* typo

* docs

* readme upgrade

* owner update

* levon's stuff

* coalesces

* include records without comments in jira and zendesk as well

* license

* Update CHANGELOG.md

Co-authored-by: Joe Markiewicz <[email protected]>

* Apply suggestions from code review

Co-authored-by: Avinash Kunnath <[email protected]>

* avinash feedback

---------

Co-authored-by: Joe Markiewicz <[email protected]>
Co-authored-by: Avinash Kunnath <[email protected]>
3 people authored Mar 3, 2025
1 parent 59d8592 commit 11a445c
Showing 50 changed files with 579 additions and 327 deletions.
29 changes: 29 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,32 @@
# dbt_unified_rag v0.1.0-a7
This release introduces the following updates that **require a full refresh**.

## Bug Fixes
- Fixed an issue in which [unioned](https://github.com/fivetran/dbt_unified_rag?tab=readme-ov-file#union-multiple-connections) source connections were producing null models. ([#25](https://github.com/fivetran/dbt_unified_rag/pull/25))
  - The solution required adding a base staging model layer. Each staging model now has a `*_base` counterpart in which we run our `union_data` macro, and the staging model inspects the columns of that `*_base` model (rather than a single `source()`) when filling in expected columns. This layering lets our unioning and column-filling macros work together, which ensures the models do not fail if you are missing an expected column.
  - For each connector type, this adds:
    - **10 more models if HubSpot is enabled**
    - **5 more models if Jira is enabled**
    - **3 more models if Zendesk is enabled**
- Updated `stg_rag_hubspot__owner` to correctly find columns from the owner source. Previously, this erroneously looked at the columns from the HubSpot `contact` table. ([#23](https://github.com/fivetran/dbt_unified_rag/pull/23))

## Feature Updates
- Adjusted joins so that records without any comments are persisted in each document model ([#25](https://github.com/fivetran/dbt_unified_rag/pull/25)). This may increase the volume of data in each model:
  - `rag_hubspot__document`: HubSpot deals without comments are now included.
  - `rag_jira__document`: Jira issues without comments are now included.
  - `rag_zendesk__document`: Zendesk tickets without comments are now included.
  - `rag__unified_document`: Includes all of the above.
- For each record without any comments, the `most_recent_chunk_update` and `update_date` fields will reflect the deal/issue/ticket creation date. The `chunk_index` and `chunk_tokens_approximate` fields will be `0`. ([#25](https://github.com/fivetran/dbt_unified_rag/pull/25))

## Under the Hood
- Added the `created_on` field to the following intermediate models to support the above inclusion of comment-less document records. ([#25](https://github.com/fivetran/dbt_unified_rag/pull/25))
  - `int_rag_hubspot__deal_document`
  - `int_rag_jira__issue_document`
  - `int_rag_zendesk__ticket_document`

## Contributors
- [@levonkorganyan](https://github.com/JustMaris) ([#23](https://github.com/fivetran/dbt_unified_rag/pull/23))

# dbt_unified_rag v0.1.0-a6

## Bug Fixes (requires `--full-refresh`)
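To picture what the new base layer does when several connections are unioned, here is a minimal Python model of unioning one table across connections whose column sets differ. It is a simplified sketch, assuming the unioning step behaves like `dbt_utils.union_relations`: it takes the superset of columns, fills gaps with nulls, and records the origin of each row in `_dbt_source_relation`. It is not the package's code. `Relation`, `union_relations_sketch`, and the connection names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Relation:
    """A table as the warehouse sees it: name, column list, and rows."""
    name: str
    columns: List[str]
    rows: List[Dict[str, Any]] = field(default_factory=list)


def union_relations_sketch(relations: List[Relation]) -> List[Dict[str, Any]]:
    """Union relations whose column sets may differ (hypothetical simplification).

    Output columns are the superset of all input columns in first-seen order.
    A row from a relation lacking a column gets None (SQL null) there, and
    `_dbt_source_relation` records which relation the row came from.
    """
    all_columns: List[str] = []
    for relation in relations:
        for column in relation.columns:
            if column not in all_columns:
                all_columns.append(column)

    unioned: List[Dict[str, Any]] = []
    for relation in relations:
        for row in relation.rows:
            filled = {column: row.get(column) for column in all_columns}
            filled["_dbt_source_relation"] = relation.name
            unioned.append(filled)
    return unioned


# Two hypothetical HubSpot connections; only the EU one syncs `domain`.
us = Relation("rag_hubspot_us.company", ["company_id", "name"],
              [{"company_id": 1, "name": "Acme"}])
eu = Relation("rag_hubspot_eu.company", ["company_id", "name", "domain"],
              [{"company_id": 2, "name": "Beta", "domain": "beta.example"}])
result = union_relations_sketch([us, eu])
```

Because the unioned output already carries every column (null where a connection lacks one), a downstream column-filling step that inspects this output sees the full column list.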
2 changes: 1 addition & 1 deletion LICENSE
@@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

-Copyright [yyyy] [name of copyright owner]
+Copyright © 2025 Fivetran Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
8 changes: 4 additions & 4 deletions README.md
@@ -33,9 +33,9 @@ Each Quickstart transformation job run materializes the following model counts f

| **Connector** | **Model Count** |
| ------------- | --------------- |
-| HubSpot | 11 |
-| Jira | 6 |
-| Zendesk | 4 |
+| HubSpot | 21 |
+| Jira | 11 |
+| Zendesk | 7 |
| (Combined) | 1 |

<!--section-end-->
@@ -58,7 +58,7 @@ Include the following package_display_name package version in your `packages.yml
```yml
packages:
- package: fivetran/unified_rag
-version: 0.1.0-a6
+version: 0.1.0-a7
```
### Step 3: Define database and schema variables
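The new README model counts follow from the CHANGELOG entry: each connector's previous count plus the number of `*_base` models this commit adds for it. A quick arithmetic check, using only numbers from this commit:

```python
# Model counts per connector before this commit (old README rows).
previous_counts = {"HubSpot": 11, "Jira": 6, "Zendesk": 4}
# Base models added per connector (CHANGELOG: 10 / 5 / 3 more models).
added_base_models = {"HubSpot": 10, "Jira": 5, "Zendesk": 3}

new_counts = {name: previous_counts[name] + added_base_models[name] for name in previous_counts}
print(new_counts)  # {'HubSpot': 21, 'Jira': 11, 'Zendesk': 7}
```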
2 changes: 1 addition & 1 deletion docs/catalog.json

274 changes: 211 additions & 63 deletions docs/index.html

2 changes: 1 addition & 1 deletion docs/manifest.json

10 changes: 5 additions & 5 deletions integration_tests/ci/sample.profiles.yml
@@ -16,13 +16,13 @@ integration_tests:
pass: "{{ env_var('CI_REDSHIFT_DBT_PASS') }}"
dbname: "{{ env_var('CI_REDSHIFT_DBT_DBNAME') }}"
port: 5439
-schema: rag_integration_tests_5
+schema: rag_integration_tests_04
threads: 8
bigquery:
type: bigquery
method: service-account-json
project: 'dbt-package-testing'
-schema: rag_integration_tests_5
+schema: rag_integration_tests_04
threads: 8
keyfile_json: "{{ env_var('GCLOUD_SERVICE_KEY') | as_native }}"
snowflake:
@@ -33,7 +33,7 @@ integration_tests:
role: "{{ env_var('CI_SNOWFLAKE_DBT_ROLE') }}"
database: "{{ env_var('CI_SNOWFLAKE_DBT_DATABASE') }}"
warehouse: "{{ env_var('CI_SNOWFLAKE_DBT_WAREHOUSE') }}"
-schema: rag_integration_tests_5
+schema: rag_integration_tests_04
threads: 8
postgres:
type: postgres
@@ -42,13 +42,13 @@ integration_tests:
pass: "{{ env_var('CI_POSTGRES_DBT_PASS') }}"
dbname: "{{ env_var('CI_POSTGRES_DBT_DBNAME') }}"
port: 5432
-schema: rag_integration_tests_5
+schema: rag_integration_tests_04
threads: 8
databricks:
catalog: "{{ env_var('CI_DATABRICKS_DBT_CATALOG') }}"
host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
http_path: "{{ env_var('CI_DATABRICKS_DBT_HTTP_PATH') }}"
-schema: rag_integration_tests_5
+schema: rag_integration_tests_04
threads: 2
token: "{{ env_var('CI_DATABRICKS_DBT_TOKEN') }}"
type: databricks
4 changes: 2 additions & 2 deletions integration_tests/dbt_project.yml
@@ -9,8 +9,8 @@ models:
+schema: "unified_rag_{{ var('directed_schema','dev') }}"

vars:
-consistency_test_exclude_fields: ['title'] # for now
+consistency_test_exclude_fields: []

rag_hubspot_schema: "rag_integration_tests_04"
rag_zendesk_schema: "rag_integration_tests_04"
rag_jira_schema: "rag_integration_tests_04"
@@ -96,6 +96,7 @@ engagement_markdown as (
title,
source_relation,
url_reference,
+created_on,
cast( {{ dbt.concat([
"'Deal Name : '", "title", "'\\n\\n'",
"'Created By : '", "contact_name", "' ('", "created_by", "')\\n'",
1 change: 1 addition & 0 deletions models/intermediate/jira/int_rag_jira__issue_document.sql
@@ -69,6 +69,7 @@ final as (
title,
source_relation,
url_reference,
+created_on,
{{ dbt.concat([
"'# issue : '", "title", "'\\n\\n'",
"'Created By : '", "user_name", "' ('", "created_by", "')\\n'",
@@ -32,6 +32,7 @@ with tickets as (
title,
source_relation,
url_reference,
+created_on,
{{ dbt.concat([
"'# Ticket : '", "title", "'\\n\\n'",
"'Created By : '", "user_name", "' ('", "created_by", "')\\n'",
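The `created_on` column added to the intermediate document models is what lets comment-less documents survive the join. The following Python sketch shows that left-join-plus-coalesce behavior, assuming one output row per deal, issue, or ticket. The real models do this in SQL. `build_document_rows`, the 4-characters-per-token estimate, and the sample data are hypothetical.

```python
from datetime import date
from typing import Any, Dict, List, Optional


def build_document_rows(
    documents: List[Dict[str, Any]], comments: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Left-join documents to comments so documents without comments are kept.

    documents: dicts with 'document_id' and 'created_on'
    comments:  dicts with 'document_id', 'created_at', and 'body'
    """
    rows = []
    for doc in documents:
        # Left join: no matching comments yields an empty list, not a dropped row.
        doc_comments = [c for c in comments if c["document_id"] == doc["document_id"]]
        latest: Optional[date] = max((c["created_at"] for c in doc_comments), default=None)
        # Hypothetical token estimate: roughly 4 characters per token.
        tokens: Optional[int] = (
            sum(len(c["body"]) for c in doc_comments) // 4 if doc_comments else None
        )
        rows.append({
            "document_id": doc["document_id"],
            # coalesce(latest comment time, created_on)
            "most_recent_chunk_update": latest if latest is not None else doc["created_on"],
            # coalesce(token estimate, 0)
            "chunk_tokens_approximate": tokens if tokens is not None else 0,
        })
    return rows


docs = [
    {"document_id": 1, "created_on": date(2025, 1, 1)},
    {"document_id": 2, "created_on": date(2025, 2, 1)},  # no comments
]
comments = [{"document_id": 1, "created_at": date(2025, 1, 5), "body": "abcdefgh"}]
rows = build_document_rows(docs, comments)
```

Document 2 is kept. Its update date falls back to its creation date and its token count is `0`, matching the CHANGELOG description of comment-less records.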
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='company',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_company',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
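Each `*_base` model delegates source resolution to `fivetran_utils.union_data`. The Python sketch below models the order in which we understand that macro picks its inputs:

1. A list of union schemas, if set.
2. Otherwise, a list of union databases.
3. Otherwise, a single relation from the database and schema variables (or their defaults).

When unioning, relations that do not exist are skipped. This is a simplified reading under those assumptions, not the macro's code. `resolve_relations`, the `existing` parameter, and the example values are hypothetical; the variable names are the ones used in this commit.

```python
from typing import Any, Dict, List, Optional, Set


def resolve_relations(
    table_identifier: str,
    variables: Dict[str, Any],
    database_variable: str,
    schema_variable: str,
    default_database: str,
    default_schema: str,
    union_schema_variable: str,
    union_database_variable: str,
    existing: Optional[Set[str]] = None,
) -> List[str]:
    """Return the database.schema.table names a base model would read (simplified)."""
    database = variables.get(database_variable, default_database)
    schema = variables.get(schema_variable, default_schema)

    if variables.get(union_schema_variable):
        # One candidate per listed schema, all in the same database.
        candidates = [f"{database}.{s}.{table_identifier}" for s in variables[union_schema_variable]]
    elif variables.get(union_database_variable):
        # One candidate per listed database, all in the same schema.
        candidates = [f"{db}.{schema}.{table_identifier}" for db in variables[union_database_variable]]
    else:
        # No unioning configured: read the single configured (or default) location.
        return [f"{database}.{schema}.{table_identifier}"]

    # When unioning, candidates missing from the warehouse are skipped.
    if existing is not None:
        candidates = [c for c in candidates if c in existing]
    return candidates


kwargs = dict(
    table_identifier="company",
    database_variable="rag_hubspot_database",
    schema_variable="rag_hubspot_schema",
    default_database="analytics",  # hypothetical stand-in for target.database
    default_schema="rag_hubspot",
    union_schema_variable="rag_hubspot_union_schemas",
    union_database_variable="rag_hubspot_union_databases",
)
single = resolve_relations(variables={}, **kwargs)
unioned = resolve_relations(
    variables={"rag_hubspot_union_schemas": ["hubspot_us", "hubspot_eu"]}, **kwargs
)
```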
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='contact',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_contact',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
14 changes: 14 additions & 0 deletions models/staging/hubspot_staging/base/stg_rag_hubspot__deal_base.sql
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='deal',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_deal',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='engagement',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_engagement',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='engagement_company',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_engagement_company',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='engagement_contact',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_engagement_contact',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='engagement_deal',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_engagement_deal',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='engagement_email',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_engagement_email',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='engagement_note',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_engagement_note',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='owner',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_owner',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
16 changes: 3 additions & 13 deletions models/staging/hubspot_staging/stg_rag_hubspot__company.sql
@@ -2,26 +2,16 @@

with base as (

-{{
-fivetran_utils.union_data(
-table_identifier='company',
-database_variable='rag_hubspot_database',
-schema_variable='rag_hubspot_schema',
-default_database=target.database,
-default_schema='rag_hubspot',
-default_variable='hubspot_company',
-union_schema_variable='rag_hubspot_union_schemas',
-union_database_variable='rag_hubspot_union_databases'
-)
-}}
+select *
+from {{ ref('stg_rag_hubspot__company_base') }}
),

fields as (

select
{{
fivetran_utils.fill_staging_columns(
-source_columns=adapter.get_columns_in_relation(source('rag_hubspot','company')),
+source_columns=adapter.get_columns_in_relation(ref('stg_rag_hubspot__company_base')),
staging_columns=get_hubspot_company_columns()
)
}}
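The functional change in this staging model is which relation `fill_staging_columns` inspects. The Python sketch below shows why that matters. It assumes the macro selects each expected staging column when the inspected relation has it, and otherwise emits `cast(null as <type>)`.

One plausible source of the old null models (our assumption, not stated in the commit) is this: with unioned connections, `source('rag_hubspot','company')` may not match a real table, so the inspection returns no columns and every staging column becomes null. Inspecting the `*_base` model instead sees the unioned columns. `fill_staging_columns_sketch` and the column lists are hypothetical.

```python
from typing import Dict, List


def fill_staging_columns_sketch(
    source_columns: List[str], staging_columns: List[Dict[str, str]]
) -> List[str]:
    """Build select expressions: the real column if present, else a typed null."""
    present = {column.lower() for column in source_columns}
    expressions: List[str] = []
    for column in staging_columns:
        name, datatype = column["name"], column["datatype"]
        if name.lower() in present:
            expressions.append(name)
        else:
            expressions.append(f"cast(null as {datatype}) as {name}")
    return expressions


staging = [
    {"name": "company_id", "datatype": "bigint"},
    {"name": "name", "datatype": "varchar"},
]
# Inspected relation reports no columns: everything is filled with nulls.
before = fill_staging_columns_sketch([], staging)
# Inspecting the unioned base model finds the real columns.
after = fill_staging_columns_sketch(["company_id", "name", "_dbt_source_relation"], staging)
```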
18 changes: 4 additions & 14 deletions models/staging/hubspot_staging/stg_rag_hubspot__contact.sql
@@ -1,27 +1,17 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

with base as (

-{{
-fivetran_utils.union_data(
-table_identifier='contact',
-database_variable='rag_hubspot_database',
-schema_variable='rag_hubspot_schema',
-default_database=target.database,
-default_schema='rag_hubspot',
-default_variable='hubspot_contact',
-union_schema_variable='rag_hubspot_union_schemas',
-union_database_variable='rag_hubspot_union_databases'
-)
-}}

+select *
+from {{ ref('stg_rag_hubspot__contact_base') }}
),

fields as (

select
{{
fivetran_utils.fill_staging_columns(
-source_columns=adapter.get_columns_in_relation(source('rag_hubspot','contact')),
+source_columns=adapter.get_columns_in_relation(ref('stg_rag_hubspot__contact_base')),
staging_columns=get_hubspot_contact_columns()
)
}}
16 changes: 3 additions & 13 deletions models/staging/hubspot_staging/stg_rag_hubspot__deal.sql
@@ -2,26 +2,16 @@

with base as (

-{{
-fivetran_utils.union_data(
-table_identifier='deal',
-database_variable='rag_hubspot_database',
-schema_variable='rag_hubspot_schema',
-default_database=target.database,
-default_schema='rag_hubspot',
-default_variable='hubspot_deal',
-union_schema_variable='rag_hubspot_union_schemas',
-union_database_variable='rag_hubspot_union_databases'
-)
-}}
+select *
+from {{ ref('stg_rag_hubspot__deal_base') }}
),

fields as (

select
{{
fivetran_utils.fill_staging_columns(
-source_columns=adapter.get_columns_in_relation(source('rag_hubspot','deal')),
+source_columns=adapter.get_columns_in_relation(ref('stg_rag_hubspot__deal_base')),
staging_columns=get_hubspot_deal_columns()
)
}}