align definition of duplicate with its behavior #950
align definition of duplicate_count and duplicate_percent with their behavior
"The behavior of duplicate checks is a bit unintuitive and doesn't match our docs, and I am wondering if this is a bug. The value of a duplicate count check is equal to the count of distinct values that have duplicates, but the docs say duplicate_count is 'The number of rows that contain duplicate values'. There is a similar issue for duplicate percent checks: the value is the count of distinct values that have duplicates divided by the total number of rows, but the docs say duplicate_percent is 'The percentage of rows in a dataset that contain duplicate values'."
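To make the gap between the two definitions concrete, here is a minimal sketch in plain Python. It is not the library's implementation; the function name `duplicate_metrics` and the dictionary keys are hypothetical and chosen only for illustration. It computes both readings of "duplicate count" on the same column so the difference is visible.

```python
from collections import Counter


def duplicate_metrics(values):
    """Compute both readings of 'duplicate count' for a column of values.

    Hypothetical helper for illustration only; not part of any library.
    """
    counts = Counter(values)
    total_rows = len(values)

    # Reading 1 (observed behavior per the issue): number of distinct
    # values that occur more than once.
    distinct_with_duplicates = sum(1 for c in counts.values() if c > 1)

    # Reading 2 (what the docs describe): number of rows whose value
    # occurs more than once.
    rows_with_duplicates = sum(c for c in counts.values() if c > 1)

    def percent(n):
        # Guard against an empty column to avoid division by zero.
        return 100.0 * n / total_rows if total_rows else 0.0

    return {
        "total_rows": total_rows,
        "distinct_values_with_duplicates": distinct_with_duplicates,
        "rows_with_duplicates": rows_with_duplicates,
        "percent_by_distinct_values": percent(distinct_with_duplicates),
        "percent_by_rows": percent(rows_with_duplicates),
    }


# "a" appears 3 times, "b" twice, "c" once.
result = duplicate_metrics(["a", "a", "a", "b", "b", "c"])
print(result["distinct_values_with_duplicates"])  # 2 distinct values are duplicated
print(result["rows_with_duplicates"])             # 5 rows hold a duplicated value
```

On this six-row column the two readings give 2 and 5, so a threshold written against the documented definition would be evaluated against a noticeably smaller number under the observed behavior.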