Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Contracts v3:
* Added spark session support
* Made all sql properties consistently end with ..._sql
* Introduced warehouse as terminology
* Made identity hash instead of long structured name
* Added support quoting
* Added check level filter_sql support
1 parent 76159ca, commit 062b1e2
Showing 58 changed files with 3,093 additions and 3,764 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,15 +1,56 @@ | ||
# Contract check identities | ||
|
||
TODO explain a bit better: | ||
### From the user perspective | ||
|
||
Every check in a contract has an identity. | ||
Check identity is used to correlate checks in files with Soda Cloud. | ||
|
||
* why uniqueness required | ||
* relation to schedule | ||
* used in correlation for sodacl | ||
* used in correlation on soda cloud | ||
In contracts, we want to change the user interface regarding identities. | ||
|
||
format: | ||
`//{schedule}/{dataset}/{column}/{check_type}/{identity_suffix}` | ||
The contracts parser requires every check in a contract to have a unique identity. | |
It reports an error if multiple checks have the same identity. An identity | |
is generated automatically from a few check properties, including the name. If two | |
checks would otherwise get the same identity, users must set the name property to make them unique. | |
|
||
One-way: A check identity is a composite key. But we don't expect we will ever need to decompose a check identity into its parts. | |
> IMPORTANT! All this means that users have to be aware of the Soda Cloud correlation impact when they | |
> change the name! Changing the name also changes the identity, and hence results in a new check and | |
> check history on Soda Cloud. In the future we envision a mechanism for renaming a check without losing | |
> the history by introducing a `name_was` property on a check. When users want to change the name, they | |
> will have to rename the existing `name` property to `name_was` and create a new `name` property with | |
> the new name. | |
Checks automatically generate a unique identity if you have at most one check in each scope. | |
A scope is defined by: | |
* warehouse | ||
* schema | ||
* dataset | ||
* column | ||
* check type | ||
|
||
So as long as each check type appears at most once in the same list of checks in the YAML, the generated identities are unique. | |
|
||
For dataset checks like `metric_query` or `metric_expression`, it is likely that | |
there are multiple checks with the same check type. To keep those unique, a `name` is mandatory. | |
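
The scope rule above can be illustrated with a minimal sketch. It is not the contracts parser: the `Check` tuple and `find_ambiguous_checks` are hypothetical names introduced here, and the sketch assumes that a check's uniqueness key is its scope plus its optional `name`.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional


class Check(NamedTuple):
    # Hypothetical flattened view of a contract check (not the Soda data model).
    warehouse: str
    schema: str
    dataset: str
    column: Optional[str]  # None for dataset-level checks
    check_type: str
    name: Optional[str] = None


def find_ambiguous_checks(checks: List[Check]) -> List[Check]:
    """Return every check that shares scope and name with at least one other check."""
    groups = defaultdict(list)
    for check in checks:
        # Assumed uniqueness key: the five scope parts plus the optional name.
        key = (check.warehouse, check.schema, check.dataset,
               check.column, check.check_type, check.name)
        groups[key].append(check)
    return [check for group in groups.values() if len(group) > 1 for check in group]
```

Two unnamed `metric_query` checks on the same dataset would both be returned, while giving each a distinct `name` yields an empty list.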
|
||
### Implementation docs | ||
|
||
The contract check identity will be a consistent hash (soda/contracts/soda/contracts/impl/consistent_hash_builder.py) based on: | ||
|
||
For schema checks: | ||
* warehouse | ||
* schema | ||
* dataset | ||
* check type (=schema) | ||
|
||
For all other checks: | ||
* warehouse | ||
* schema | ||
* dataset | ||
* column | ||
* check type | ||
* check name | ||
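
As a rough illustration of hashing the properties listed above, here is a minimal sketch. It does not reproduce `consistent_hash_builder.py`: the function name `build_identity`, the choice of SHA-256, and the 8-character truncation are all assumptions made for this example.

```python
import hashlib
from typing import Optional


def build_identity(warehouse: str, schema: str, dataset: str, check_type: str,
                   column: Optional[str] = None, name: Optional[str] = None,
                   length: int = 8) -> str:
    """Hash the identity parts in a fixed order and return a short hex string."""
    hasher = hashlib.sha256()  # assumption: any stable hash function would do
    for part in (warehouse, schema, dataset, column, check_type, name):
        if part is None:
            # Mark absent parts explicitly so None and "" hash differently.
            hasher.update(b"N")
        else:
            data = part.encode("utf-8")
            # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
            hasher.update(b"S" + len(data).to_bytes(4, "big") + data)
    return hasher.hexdigest()[:length]
```

For a schema check, `column` and `name` would simply be left as `None` and `check_type` set to `"schema"`. Because the hash is one-way, the parts cannot be recovered from the identity.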
|
||
The check identity will be used as the explicit `identity` in the generated SodaCL. | |
|
||
Soda Core is updated so that it will pass the identity back as the property `source_identity` in the scan results. | ||
The `source_identity` property in the scan results will also be used to correlate the Soda scan check results with | ||
the contract checks for reporting and logging the results. |
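
The correlation step described above might look like the following sketch. The dictionary shapes are hypothetical; only the `identity` and `source_identity` property names come from the text, and `correlate_results` is a name invented for this example.

```python
from typing import Dict, List, Optional, Tuple


def correlate_results(contract_checks: List[dict],
                      scan_check_results: List[dict]) -> List[Tuple[dict, Optional[dict]]]:
    """Pair each contract check with the scan result carrying its identity."""
    # Index scan results by the identity that Soda Core passed back.
    results_by_identity: Dict[str, dict] = {
        result["source_identity"]: result
        for result in scan_check_results
        if result.get("source_identity")
    }
    # A check without a matching result is paired with None.
    return [(check, results_by_identity.get(check["identity"]))
            for check in contract_checks]
```

Indexing the results once keeps the lookup per check constant-time, and pairing unmatched checks with `None` makes missing results visible to the reporting code.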