
Feature/add union data #133

Open · wants to merge 13 commits into base: main
43 changes: 11 additions & 32 deletions .github/PULL_REQUEST_TEMPLATE/maintainer_pull_request_template.md
@@ -4,48 +4,27 @@
**This PR will result in the following new package version:**
<!--- Please add details around your decision for breaking vs non-breaking version upgrade. If this is a breaking change, were backwards-compatible options explored? -->

**Please detail what change(s) this PR introduces and any additional information that should be known during the review of this PR:**
**Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:**
<!--- Copy/paste the CHANGELOG for this version below. -->

## PR Checklist
### Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
- [ ] dbt compile
- [ ] dbt run --full-refresh
- [ ] dbt run
- [ ] dbt test
- [ ] dbt run --vars (if applicable)
- [ ] dbt run --full-refresh && dbt test
- [ ] dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:
- [ ] The appropriate issue has been linked and tagged
- [ ] You are assigned to the corresponding issue and this PR
- [ ] The appropriate issue has been linked, tagged, and properly assigned
- [ ] All necessary documentation and version upgrades have been applied
<!--- Be sure to update the package version in the dbt_project.yml, integration_tests/dbt_project.yml, and README if necessary. -->
- [ ] docs were regenerated (unless this PR does not include any code or yml updates)
- [ ] BuildKite integration tests are passing
- [ ] Detailed validation steps have been provided below

### Detailed Validation
Please acknowledge that the following validation checks have been performed prior to marking this PR as "ready for review":
- [ ] You have validated these changes and are confident this PR will address the respective Issue/Feature.
- [ ] You are reasonably confident these changes will not impact any other components of this package or any dependent packages.
- [ ] You have provided details below around the validation steps performed to gain confidence in these changes.
Please share any and all of your validation steps:
<!--- Provide the steps you took to validate your changes below. -->

### Standard Updates
Please acknowledge that your PR contains the following standard updates:
- Package versioning has been appropriately indexed in the following locations:
- [ ] indexed within dbt_project.yml
- [ ] indexed within integration_tests/dbt_project.yml
- [ ] CHANGELOG has individual entries for each respective change in this PR
<!--- If there is a parallel upstream change, remember to reference the corresponding CHANGELOG as an individual entry. -->
- [ ] README updates have been applied (if applicable)
<!--- Remember to check the following README locations for common updates. -->
<!--- Suggested install range (needed for breaking changes) -->
<!--- Dependency matrix is appropriately updated (if applicable) -->
<!--- New variable documentation (if applicable) -->
- [ ] DECISIONLOG updates have been updated (if applicable)
- [ ] Appropriate yml documentation has been added (if applicable)

### dbt Docs
Please acknowledge that after the above were all completed the below were applied to your branch:
- [ ] docs were regenerated (unless this PR does not include any code or yml updates)

### If you had to summarize this PR in an emoji, which would it be?
<!--- For a complete list of markdown compatible emojis check out this git repo (https://gist.github.com/rxaviers/7360908) -->
:dancer:
35 changes: 0 additions & 35 deletions .github/workflows/check_docs.yml

This file was deleted.

15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,18 @@
# dbt_hubspot v0.16.0

## 🎉 Feature Update 🎉
- This release supports running the package on multiple Hubspot sources at once! See the [README](https://github.com/fivetran/dbt_hubspot?tab=readme-ov-file#step-3-define-database-and-schema-variables) for details on how to leverage this feature ([PR #133](https://github.com/fivetran/dbt_hubspot/pull/133)).
Contributor

I think it is also very relevant to include here that customers will now also see a new `source_relation` field in their end models.

Additionally, we should flag that customers using the ticket models will need to run a full refresh to capture the new schema change.

Collaborator (Author)


added


> Please note: For users running the `hubspot__daily_ticket_history` model, this is a **🚨Breaking Change🚨** in that we have added a new field, `source_relation`, that points to the source connector from which the record originated. This field is added to _all_ end models but will require a `dbt run --full-refresh` for `+hubspot__daily_ticket_history` specifically.
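
For reference, the full refresh called out above can be scoped to just that model and its upstream parents with a selector (a sketch only; adjust the selection to your project):

```
# refresh and re-test the daily ticket history model and its parents
dbt run --full-refresh --select +hubspot__daily_ticket_history
dbt test --select +hubspot__daily_ticket_history
```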

## 📝 Documentation 📝
- Included documentation around dbt Core™ versions and disabling source freshness tests. Previously, we included [this](https://github.com/fivetran/dbt_hubspot_source?tab=readme-ov-file#dbt-core-version-requirement-for-disabling-freshness-tests) in the source package README but had missed the transform README ([PR #133](https://github.com/fivetran/dbt_hubspot/pull/133)).
- Tweaked the description of `hubspot__deal_changes` to better reflect the grain of the model ([PR #132](https://github.com/fivetran/dbt_hubspot/pull/132)).

## 🛠️ Under the Hood 🛠️
- Updated the maintainer PR template to resemble the most up-to-date format.
- Removed GitHub workflow that checks if docs are re-built prior to merges to `main`. We will opt for a more seamless automation later this year.

# dbt_hubspot v0.15.1
[PR #129](https://github.com/fivetran/dbt_hubspot/pull/129) includes the following updates:

51 changes: 48 additions & 3 deletions README.md
@@ -33,7 +33,7 @@ The following table provides a detailed list of all models materialized within t
| [hubspot__contact_history](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__contact_history) | Each record represents a change to a contact in Hubspot, with `valid_to` and `valid_from` information. |
| [hubspot__contact_lists](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__contact_lists) | Each record represents a contact list in Hubspot, enriched with metrics about email activities. |
| [hubspot__deals](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__deals) | Each record represents a deal in Hubspot, enriched with metrics about engagement activities. |
| [hubspot__deal_stages](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__deal_stages) | Each record represents a deal stage in Hubspot, enriched with metrics deal activities. |
| [hubspot__deal_stages](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__deal_stages) | Each record represents when a deal stage changes in Hubspot, enriched with metrics about deal activities. |
| [hubspot__deal_history](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__deal_history) | Each record represents a change to a deal in Hubspot, with `valid_to` and `valid_from` information. |
| [hubspot__tickets](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__tickets) | Each record represents a ticket in Hubspot, enriched with metrics about engagement activities and information on associated deals, contacts, companies, and owners. |
| [hubspot__daily_ticket_history](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__daily_ticket_history) | Each record represents a ticket's day in Hubspot with tracked properties pivoted out into columns. |
@@ -73,7 +73,7 @@ Include the following hubspot package version in your `packages.yml` file:
```yaml
packages:
- package: fivetran/hubspot
version: [">=0.15.0", "<0.16.0"] # we recommend using ranges to capture non-breaking changes automatically
version: [">=0.16.0", "<0.17.0"] # we recommend using ranges to capture non-breaking changes automatically

```
Do **NOT** include the `hubspot_source` package in this file. The transformation package itself has a dependency on it and will install the source package as well.
@@ -87,15 +87,57 @@ dispatch:
```

## Step 3: Define database and schema variables
### Option 1: Single connector 💃
By default, this package runs using your destination and the `hubspot` schema. If this is not where your hubspot data is (for example, if your hubspot schema is named `hubspot_fivetran`), add the following configuration to your root `dbt_project.yml` file:

```yml
vars:
hubspot_database: your_destination_name
hubspot_schema: your_schema_name
```
> **Note**: If you are running the package on one source connector, each model will have a `source_relation` column that is just an empty string.

### Option 2: Union multiple connectors 👯
If you have multiple Hubspot connectors in Fivetran and would like to use this package on all of them simultaneously, we have provided functionality to do so. The package will union all of the data together and pass the unioned table into the transformations. You will be able to see which source a record came from in the `source_relation` column of each model. To use this functionality, you will need to set either the `hubspot_union_schemas` OR the `hubspot_union_databases` variable (you cannot set both, though a more flexible approach is in the works...) in your root `dbt_project.yml` file:

```yml
# dbt_project.yml

vars:
hubspot_union_schemas: ['hubspot_usa','hubspot_canada'] # use this if the data is in different schemas/datasets of the same database/project
hubspot_union_databases: ['hubspot_usa','hubspot_canada'] # use this if the data is in different databases/projects but uses the same schema name
```
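
Once unioned, the `source_relation` column lets downstream queries attribute records back to their connector. A hypothetical sketch (the `hubspot__deals` model is real; the aggregation itself is illustrative only):

```sql
-- count deals per source connector after unioning
select
    source_relation,
    count(*) as deal_count
from {{ ref('hubspot__deals') }}
group by 1
```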

#### Recommended: Incorporate unioned sources into DAG
By default, this package defines one single-connector source, called `hubspot`, which will be disabled if you are unioning multiple connectors. This means that your DAG will not include your Hubspot sources, though the package will run successfully.

To properly incorporate all of your Hubspot connectors into your project's DAG:
1. Define each of your sources in a `.yml` file in your project. Utilize the following template for the `source`-level configurations, and, **most importantly**, copy and paste the table and column-level definitions from the package's `src_hubspot.yml` [file](https://github.com/fivetran/dbt_hubspot_source/blob/main/models/src_hubspot.yml#L9-L1313).

```yml
# a .yml file in your root project
sources:
- name: <name> # ex: hubspot_usa
schema: <schema_name> # one of var('hubspot_union_schemas') if unioning schemas, otherwise just 'hubspot'
database: <database_name> # one of var('hubspot_union_databases') if unioning databases, otherwise whatever DB your hubspot schemas all live in
loader: Fivetran
loaded_at_field: _fivetran_synced
tables: # copy and paste from hubspot_source/models/src_hubspot.yml
```

> **Note**: If there are source tables you do not have (see [Step 4](https://github.com/fivetran/dbt_hubspot?tab=readme-ov-file#step-4-disable-models-for-non-existent-sources)), you may still include them here, as long as you have set the right variables to `False`. Otherwise, you may remove them from your source definition.

2. Set the `has_defined_sources` variable (scoped to the `hubspot_source` package) to `True`, like such:
```yml
# dbt_project.yml
vars:
hubspot_source:
has_defined_sources: true
```

## Step 4: Disable models for non-existent sources
> _This step is unnecessary (but still available for use) if you are unioning multiple connectors together in the previous step. That is, the `union_data` macro we use will create completely empty staging models for sources that are not found in any of your Hubspot schemas/databases. However, you can still leverage the below variables if you would like to avoid this behavior._

When setting up your Hubspot connection in Fivetran, it is possible that not every table this package expects will be synced. This can occur because you either don't use that functionality in Hubspot or have actively decided to not sync some tables. In order to disable the relevant functionality in the package, you will need to add the relevant variables. By default, all variables are assumed to be `true` (with the exception of `hubspot_service_enabled`, `hubspot_ticket_deal_enabled`, and `hubspot_contact_merge_audit_enabled`). You only need to add variables within your root `dbt_project.yml` for the tables you would like to disable or enable, respectively:

```yml
vars:
hubspot_service_enabled: true # Enables all service/ticket models. Default = false
hubspot_ticket_deal_enabled: true # Default = false
```

## (Optional) Step 5: Additional configurations
<details open><summary>Expand/collapse configurations</summary>

### Configure email metrics
This package allows you to specify which email metrics (total count and total unique count) you would like to be calculated for specified fields within the `hubspot__email_campaigns` model. By default, the `email_metrics` variable below includes all the shown fields. If you would like to remove any field metrics from the final model, you may copy and paste the below snippet within your root `dbt_project.yml` and remove any fields you want to be ignored in the final model.
@@ -299,6 +343,7 @@ If an individual source table has a different name than the package expects, add
vars:
hubspot_<default_source_table_name>_identifier: your_table_name
```
</details>

## (Optional) Step 6: Orchestrate your models with Fivetran Transformations for dbt Core™
<details><summary>Expand for details</summary>
@@ -314,7 +359,7 @@ This dbt package is dependent on the following dbt packages. Please be aware tha
```yml
packages:
- package: fivetran/hubspot_source
version: [">=0.14.0", "<0.15.0"]
version: [">=0.15.0", "<0.16.0"]

- package: fivetran/fivetran_utils
version: [">=0.4.0", "<0.5.0"]
2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,5 +1,5 @@
name: 'hubspot'
version: '0.15.1'
version: '0.16.0'

config-version: 2
require-dbt-version: [">=1.3.0", "<2.0.0"]
2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

4 changes: 3 additions & 1 deletion macros/email_events_joined.sql
@@ -26,7 +26,8 @@ with base as (
events.sent_by_event_id as email_send_id
from base
left join events
using (event_id)
on base.event_id = events.event_id
and base.source_relation = events.source_relation

), contacts_joined as (

@@ -37,6 +38,7 @@
from events_joined
left join contacts
on events_joined.recipient_email_address = contacts.email
and events_joined.source_relation = contacts.source_relation

)

3 changes: 2 additions & 1 deletion macros/engagements_aggregated.sql
@@ -2,6 +2,7 @@

select
{{ primary_key }},
source_relation,
count(case when engagement_type = 'NOTE' then {{ primary_key }} end) as count_engagement_notes,
count(case when engagement_type = 'TASK' then {{ primary_key }} end) as count_engagement_tasks,
count(case when engagement_type = 'CALL' then {{ primary_key }} end) as count_engagement_calls,
@@ -10,7 +11,7 @@
count(case when engagement_type = 'INCOMING_EMAIL' then {{ primary_key }} end) as count_engagement_incoming_emails,
count(case when engagement_type = 'FORWARDED_EMAIL' then {{ primary_key }} end) as count_engagement_forwarded_emails
from {{ from_ref }}
group by 1
group by 1,2

{% endmacro %}

1 change: 1 addition & 0 deletions macros/engagements_joined.sql
@@ -24,6 +24,7 @@ with base as (
from base
left join engagements
on base.engagement_id = engagements.engagement_id
and base.source_relation = engagements.source_relation

)

20 changes: 14 additions & 6 deletions macros/merge_contacts.sql
@@ -8,7 +8,8 @@
{# bigquery #}
select
contacts.contact_id,
split(merges, ':')[offset(0)] as vid_to_merge
split(merges, ':')[offset(0)] as vid_to_merge,
contacts.source_relation

from contacts
cross join
@@ -19,7 +20,8 @@
{% macro snowflake__merge_contacts() %}
select
contacts.contact_id,
split_part(merges.value, ':', 0) as vid_to_merge
split_part(merges.value, ':', 0) as vid_to_merge,
contacts.source_relation

from contacts
cross join
@@ -30,11 +32,14 @@
{% macro redshift__merge_contacts() %}
select
unnest_vid_array.contact_id,
split_part(cast(vid_to_merge as {{ dbt.type_string() }}) ,':',1) as vid_to_merge
split_part(cast(vid_to_merge as {{ dbt.type_string() }}) ,':',1) as vid_to_merge,
source_relation

from (
select
contacts.contact_id,
split_to_array(calculated_merged_vids, ';') as super_calculated_merged_vids
split_to_array(calculated_merged_vids, ';') as super_calculated_merged_vids,
contacts.source_relation
from contacts
) as unnest_vid_array, unnest_vid_array.super_calculated_merged_vids as vid_to_merge

@@ -43,7 +48,8 @@
{% macro postgres__merge_contacts() %}
select
contacts.contact_id,
split_part(merges, ':', 1) as vid_to_merge
split_part(merges, ':', 1) as vid_to_merge,
contacts.source_relation

from contacts
cross join
Expand All @@ -55,7 +61,9 @@
{# databricks and spark #}
select
contacts.contact_id,
split_part(merges, ':', 1) as vid_to_merge
split_part(merges, ':', 1) as vid_to_merge,
contacts.source_relation

from contacts
cross join (
select
Expand Down