[Bug] Test for uniqueness passes when duplicates are present in BigQuery column #11067
Comments
Thanks for reaching out @steffan-roughsedge! I haven't been able to reproduce this myself. Could you provide project files similar to below to help us reproduce the issue you are seeing?
The original issue (#2061) and draft PR (#2075) got closed as stale, but we'd review a new pull request if anyone is interested in solving this for dbt-bigquery for both
The relevant default adapter code for each of the current implementations is located here:
Depending on the details of the solution, we'd either adopt it as the new default for all adapters or provide a targeted solution just for dbt-bigquery. Either way, any implementation should include tests that cover the scenario described above, where the model and the column share the same name.
Project files
Create these files:
select struct(1 as id) as orders
models:
  - name: orders
    columns:
      - name: orders
        tests:
          - unique
          - not_null
Run these commands:
Get this output:
The same underlying behavior in BigQuery might be the root cause of all of the following reports:
Is this a new bug in dbt-core?
Current Behavior
When running the out-of-the-box 'unique' test, the test passes even when there are duplicates in the column if:
Expected Behavior
The test should fail if there are duplicates in the column being tested.
Steps To Reproduce
Create a table called my-project.my-dataset.orders with a single column named 'orders'. Insert duplicate values into the 'orders' column. Run the dbt 'unique' test against this column. The test passes.
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
bigquery
Additional Context
Looking at the generated code (shown below) for the test, the issue occurs because the table in the first CTE is not aliased; the subsequent GROUP BY in the main SELECT therefore groups by the whole table rather than by the column.
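To make the proposed fix concrete, here is a minimal sketch of a uniqueness query in which the table in the first CTE is given an alias and the column is referenced through it. The query shape, the alias `t`, and the helper `failing_rows` are assumptions reconstructed from the description above, not dbt's actual compiled SQL. The sketch runs against SQLite from Python's standard library only so that it is runnable locally; SQLite does not reproduce the BigQuery name resolution reported in this issue, so it only checks that the alias-qualified form flags duplicates when the table and column share a name.

```python
import sqlite3

# Hypothetical shape of an alias-qualified uniqueness test.
# Reconstructed from the issue description; NOT dbt's actual compiled SQL.
ALIASED_UNIQUE_TEST = """
with dbt_test__target as (
    select t.orders as unique_field   -- t.orders can only refer to the column
    from orders as t
    where t.orders is not null
)
select unique_field, count(*) as n_records
from dbt_test__target
group by unique_field
having count(*) > 1
"""


def failing_rows(values, query=ALIASED_UNIQUE_TEST):
    """Load `values` into an in-memory table orders(orders) and run the test query.

    Returns the rows the test would report as failures: one
    (value, n_records) tuple per duplicated non-null value.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Table and column deliberately share the name `orders`, as in the report.
        conn.execute("create table orders (orders integer)")
        conn.executemany(
            "insert into orders (orders) values (?)",
            [(v,) for v in values],
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(failing_rows([1, 1, 2]))  # [(1, 2)]: the value 1 appears twice
print(failing_rows([1, 2, 3]))  # []: no duplicates, so the test would pass
```

Whether BigQuery resolves `t.orders` to the column in every case would still need to be confirmed against a real BigQuery project, which is what the tests requested for any new pull request would cover.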
A similar issue was previously logged here, however it looks like it was closed and the issue I am experiencing still remains.