Bug/total queries #140

Merged
merged 7 commits into from
Dec 18, 2024

Conversation

fivetran-catfritz
Contributor

@fivetran-catfritz fivetran-catfritz commented Dec 12, 2024

PR Overview

This PR will address the following Issue/Feature:

  • Internal ticket

This PR will result in the following new package version:

  • 1.10.0

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

Breaking Changes

A --full-refresh is recommended after upgrading to ensure historical records in incremental models are refreshed.

  • Updated the fivetran_log_json_parse macro for Redshift to return NULL instead of an empty string when a JSON path is not found. This resolves errors caused by casting empty strings to integers in Redshift.
  • Standardized the message_data field from the LOG source, in which JSON key names can appear in both camelCase (e.g., {"totalQueries":5}) and snake_case (e.g., {"total_queries":5}) formats, depending on the Fivetran connector version. The fivetran_platform__audit_table and fivetran_platform__connector_daily_events models now convert all key names to snake_case for consistency.
  • These changes are considered breaking because the standardization of key names (e.g., totalQueries to total_queries) may impact downstream reporting by including previously ignored values.
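To illustrate why returning NULL instead of an empty string matters, here is a minimal Python sketch. It is not the package's macro: `json_parse` and `to_int_or_none` are hypothetical stand-ins, with `None` playing the role of SQL NULL. A missing path that yields `None` can pass through an integer cast, whereas an empty string cannot be cast.

```python
import json
from typing import Any, List, Optional


def json_parse(message_data: str, path: List[str]) -> Optional[Any]:
    """Hypothetical stand-in for fivetran_log_json_parse.

    Walks `path` through a JSON string and returns None (the analogue of
    SQL NULL) when any key on the path is missing, rather than "".
    """
    node: Any = json.loads(message_data)
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def to_int_or_none(value: Optional[Any]) -> Optional[int]:
    """Cast to int, letting a missing value stay missing (like casting NULL)."""
    return None if value is None else int(value)


# Path present: the value is cast normally.
found = to_int_or_none(json_parse('{"total_queries": 5}', ["total_queries"]))

# Path absent: the result stays None instead of failing on an empty string.
missing = to_int_or_none(json_parse('{"table": "orders"}', ["total_queries"]))
```

Had the missing path produced `""`, the `int("")` call would raise a `ValueError`, which is the Python counterpart of the cast error described above.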

Under the Hood (Maintainers Only)

  • Enhanced seed data for integration testing to include the different spellings and ensure compatibility with Redshift.

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run --full-refresh && dbt test
  • dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned
  • All necessary documentation and version upgrades have been applied
  • docs were regenerated (unless this PR does not include any code or yml updates)
  • BuildKite integration tests are passing
    • SQL Server credential issue
  • Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

  • Updated the seed to have both spellings of total_queries and totalQueries.

    • Screenshot 2024-12-12 at 7 56 05 PM
  • I was able to reproduce the error using the updated seed data with the production version of the package.

    • Screenshot 2024-12-12 at 12 08 22 PM
  • The result shows both spellings produce totals.

    • Screenshot 2024-12-12 at 8 11 56 PM
  • Consistency and integrity tests pass

    • Screenshot 2024-12-13 at 2 34 42 PM
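
The effect validated above can be sketched in Python under stated assumptions: `seed_rows` is hypothetical sample data rather than the package's actual seed, and `normalize` mirrors the string replacement in the models instead of reproducing them. Without normalization only the snake_case row contributes to the total; with it, both spellings do.

```python
import json

# Hypothetical seed rows mirroring the two key spellings.
seed_rows = [
    '{"totalQueries": 5}',
    '{"total_queries": 3}',
]


def normalize(message_data: str) -> str:
    """Rewrite the camelCase key name to snake_case, like the SQL replace()."""
    return message_data.replace("totalQueries", "total_queries")


def total_queries(rows, normalized: bool) -> int:
    """Sum the total_queries key across rows, optionally normalizing first."""
    total = 0
    for raw in rows:
        payload = json.loads(normalize(raw) if normalized else raw)
        total += payload.get("total_queries", 0)
    return total


before = total_queries(seed_rows, normalized=False)  # camelCase row is ignored
after = total_queries(seed_rows, normalized=True)    # both rows are counted
```

This is also why the change is flagged as breaking: values that were previously ignored start contributing to the totals.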

If you had to summarize this PR in an emoji, which would it be?

💃

@fivetran-catfritz fivetran-catfritz self-assigned this Dec 12, 2024
Contributor

@fivetran-avinash fivetran-avinash left a comment

@fivetran-catfritz Great work! Conducted all the validation and it looks good on my end. Approved!

Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

@fivetran-catfritz this looks good to me. I just have one quick approach-related question before approving.

@@ -14,7 +14,7 @@ log_events as (
 connector_id,
 cast( {{ dbt.date_trunc('day', 'created_at') }} as date) as date_day,
 event_subtype,
-message_data
+replace(message_data, 'totalQueries', 'total_queries') as message_data
Contributor

Approach question: was there a reason we do these replace statements directly in the end models as opposed to the upstream staging model, where we could do both in one place? Is the reason so we don't have nested replace statements (replace(replace(message_data, 'totalQueries', 'total_queries'), 'operationType', 'operation_type')), which could be hard to maintain?

Contributor Author

Good question. My reasoning was that if a staging model were a view, downstream models might needlessly perform this operation. However, I double-checked and the staging models are tables, not views. In light of that, should I move this to the staging model or leave it as-is, do you think? @fivetran-joemarkiewicz

Contributor

Good thought in not wanting downstream queries to needlessly perform these operations if the staging models are views. Truthfully, the staging models should be views instead of tables. Instead of making the above change, let's keep your approach. In addition, can you create an FR for us to update the materialization of the staging models in a future breaking-change update? Since we want to get this PR out as soon as possible, I would prefer we don't include that update now, but it will be good for us to consider in an upcoming release.
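
The maintainability concern about nested replace statements raised in this thread can be sketched in Python. This is an illustration only, not code from the package: `KEY_RENAMES` and `standardize_keys` are hypothetical names, and the sample payload value is made up. Keeping the renames in one mapping and applying them in sequence is equivalent to nesting one replace per key, but adding a key becomes a single new entry.

```python
from functools import reduce

# Hypothetical rename map; the two entries are the keys named in this thread.
KEY_RENAMES = {
    "totalQueries": "total_queries",
    "operationType": "operation_type",
}


def standardize_keys(message_data: str, renames: dict = KEY_RENAMES) -> str:
    """Apply each rename in turn, equivalent to nesting one replace() per key."""
    return reduce(
        lambda text, pair: text.replace(pair[0], pair[1]),
        renames.items(),
        message_data,
    )


# Illustrative payload; the operationType value is invented for the example.
result = standardize_keys('{"totalQueries":5,"operationType":"EXAMPLE"}')
```

One caveat of plain string replacement, in SQL as in this sketch, is that it would also rewrite a match that appears inside a value rather than a key.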

@@ -37,7 +41,7 @@ sync_log as (
 *,
 {{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['table']) }} as table_name,
 {{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['schema']) }} as schema_name,
-{{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['operationType']) }} as operation_type,
+{{ fivetran_log.fivetran_log_json_parse(string='message_data', string_path=['operation_type']) }} as operation_type,
Contributor

Great catch here!

Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

LGTM with request to create a FR for future materialization update.

@fivetran-catfritz fivetran-catfritz merged commit a4851bb into main Dec 18, 2024
10 checks passed