New Docs Deployment for Mon Jun 10 22:01:39 UTC 2024
actions-user committed Jun 10, 2024
1 parent a9401df commit 6bde552
Showing 135 changed files with 2,977 additions and 677 deletions.
61 changes: 41 additions & 20 deletions docs/guide/data-quality/checks.mdx
@@ -22,6 +22,15 @@ Let's write and modify a check, starting with one of the sdf samples.
sdf new --sample hello_with_pii && cd hello_with_pii
```

+```shell shell
+Created hello_with_pii/.gitignore
+Created hello_with_pii/checks/code_check.sql
+Created hello_with_pii/src/main.sql
+Created hello_with_pii/workspace.sdf.yml
+Finished new in 0.292 secs
+
+```

This sample workspace contains a model, a classifier, and a check.
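
The model's source isn't reproduced in this diff, but judging from the check output further down (table `main`, column `column_2`, classifier `PII.name`), a minimal stand-in for `src/main.sql` might look like this — hypothetical, not the sample's verbatim contents:

```sql
-- Hypothetical stand-in for src/main.sql: a one-row model whose
-- column_2 carries a person's name (the value the check will flag).
SELECT
    1       AS column_1,
    'Alice' AS column_2
```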

</Step>
@@ -50,17 +59,13 @@ Let's write and modify a check, starting with one of the sdf samples.

Hurray! The check passes.

-``` shell
-Created hello_with_pii/.gitignore
-Created hello_with_pii/checks/code_check.sql
-Created hello_with_pii/src/main.sql
-Created hello_with_pii/workspace.sdf.yml
-Finished new in 0.025 secs
-
```shell shell
Working set 1 model file, 1 .sdf file
Compiling hello.pub.main (./src/main.sql)
Working set 1 test file, 1 .sdf file
Testing hello.pub.code_check (./checks/code_check.sql)
-Finished 1 model [1 succeeded], 1 check [1 passed] in 0.439 secs
+Finished 1 model [1 succeeded], 1 check [1 passed] in 1.156 secs
[Pass] Check hello.pub.code_check
```
@@ -70,13 +75,13 @@ Working set 1 test file, 1 .sdf file

```sql
SELECT
-DISTINCT c.table_name as "table_name",
+DISTINCT c.table_name as "table name",
c.column_name as "column name",
c.classifiers
FROM
sdf.information_schema.columns c
WHERE
-c.classifiers like '%PII.name%'
+CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII.name')
```

It queries for any column carrying the `PII.name` classifier and asserts that the result should be empty. But we never added the classifier in the first place!
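
Attaching the classifier happens in YAML rather than SQL. A hedged sketch of what that attachment might look like for `column_2` of the `main` model follows; the exact key names are an assumption, so adapt them to your workspace's `.sdf.yml` layout:

```yaml
# Hypothetical column-level metadata in a .sdf.yml file; the key names
# below are an assumption, not the sample's verbatim contents.
table:
  name: main
  columns:
    - name: column_2
      classifiers:
        - PII.name
```
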
@@ -94,23 +99,24 @@ Working set 1 test file, 1 .sdf file
```
Running `sdf check` again will **result in a failed test**. The output will look like this:

-``` shell

+```shell shell
Working set 1 model file, 1 .sdf file
Compiling hello.pub.main (./src/main.sql)
Working set 1 test file, 1 .sdf file
Testing hello.pub.code_check (./checks/code_check.sql)
-Finished 1 model [1 succeeded], 1 check [1 failed] in 0.425 secs, for details see below.
+Finished 1 model [1 succeeded], 1 check [1 failed] in 1.164 secs, for details see below.
[Fail] Check hello.pub.code_check
-+----------------+-------------+-------------+
-| table_name     | column name | classifiers |
-+----------------+-------------+-------------+
-| hello.pub.main | column_2    | PII.name    |
-+----------------+-------------+-------------+
++------------+-------------+-------------+
+| table_name | column name | classifiers |
++------------+-------------+-------------+
+| main       | column_2    | [PII.name]  |
++------------+-------------+-------------+
1 rows.
-------
-Summary 1 model [1 succeeded], 1 check [1 failed] in 0.425 secs.
+Summary 1 model [1 succeeded], 1 check [1 failed] in 1.164 secs.
-------
```
@@ -151,13 +157,28 @@ Let's fix this failing check! We can apply an `md5()` hash to column_2 (the names
c.column_name as "column name",
c.classifiers
FROM
-columns c
+sdf.information_schema.columns c
WHERE
-c.classifiers like '%PII.name%'
-and c.table_name like '%.sink'
+CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII.name')
+AND c.table_name like '%.sink'
```
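
The new `sink` model is compiled in the output below, but its SQL isn't reproduced here. A minimal sketch consistent with the narrative — assuming `src/sink.sql` simply re-selects from `main` — might be:

```sql
-- Hypothetical src/sink.sql: hash the classified column so the sink
-- exposes a digest instead of the raw names in column_2.
SELECT
    column_1,
    md5(column_2) AS column_2
FROM main
```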

Run `sdf check` again and voilà! All checks pass: the source table still carries `PII.name`, but the sink table does not.

+```shell shell
+Working set 2 model files, 1 .sdf file
+Compiling hello.pub.main (./src/main.sql)
+Compiling hello.pub.sink (./src/sink.sql)
+Working set 1 test file, 1 .sdf file
+Testing hello.pub.code_check (./checks/code_check.sql)
+Finished 2 models [2 succeeded], 1 check [1 passed] in 1.147 secs
+[Pass] Check hello.pub.code_check
+```

</Step>
</Steps>

58 changes: 29 additions & 29 deletions docs/guide/data-quality/reports.mdx
@@ -22,7 +22,7 @@ Let's write and modify a report, starting with one of the sdf samples.
sdf new --sample pii_saas_platform
```

-``` shell
+```shell shell
Created pii_saas_platform/.gitignore
Created pii_saas_platform/checks/no_pii_in_external.sql
Created pii_saas_platform/classification/taxonomy.sdf.yml
@@ -41,7 +41,7 @@ Let's write and modify a report, starting with one of the sdf samples.
Created pii_saas_platform/src/internal/users_per_domain.sql
Created pii_saas_platform/src/internal/users_per_org.sql
Created pii_saas_platform/workspace.sdf.yml
-Finished new in 0.027 secs
+Finished new in 0.291 secs

```
This sample contains a report and a workspace modeled after a simple SaaS platform.
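
For instance, the bundled `checks/no_pii_in_external.sql` presumably asserts that no externally-shared model exposes PII. A hedged sketch of such a check, reusing this guide's information-schema pattern (the exact filter for external tables is an assumption):

```sql
-- Hypothetical shape of checks/no_pii_in_external.sql: any row returned
-- is an external column still carrying a PII classifier.
SELECT
    c.table_name,
    c.column_name
FROM
    sdf.information_schema.columns c
WHERE
    CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII')
    AND c.table_name LIKE '%external%'
```
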
@@ -60,14 +60,14 @@ Let's write and modify a report, starting with one of the sdf samples.
The report itself lists all tables with PII by fetching all columns with classifiers that have `PII`. This is done with the following SQL:
```sql
SELECT
-t.table_ref,
+t.table_id,
t.description,
t.dialect
FROM
sdf.information_schema.tables t
JOIN
-sdf.information_schema.columns c ON t.table_ref = c.table_ref
-WHERE c.classifiers LIKE '%PII%'
+sdf.information_schema.columns c ON t.table_id = c.table_id
+WHERE CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII')
GROUP BY 1,2,3;
```

@@ -96,31 +96,31 @@ Let's write and modify a report, starting with one of the sdf samples.
sdf report --show all
```

-``` shell
+```shell shell
Working set 12 model files, 3 .sdf files
Compiling payment.public.invoices (./ddls/payment/public/invoices.sql)
Compiling payment.public.users (./ddls/payment/public/users.sql)
Compiling payment.public.organizations (./ddls/payment/public/organizations.sql)
Compiling transformations.internal.avg_invoice_amt (./src/internal/avg_invoice_amt.sql)
-Compiling transformations.internal.total_revenue_per_org (./src/internal/total_revenue_per_org.sql)
Compiling transformations.internal.invoice_payment_delay (./src/internal/invoice_payment_delay.sql)
-Compiling transformations.internal.most_frequent_payer (./src/internal/most_frequent_payer.sql)
+Compiling transformations.internal.total_revenue_per_org (./src/internal/total_revenue_per_org.sql)
Compiling transformations.external.invoice_stats (./src/external/invoice_stats.sql)
Compiling transformations.internal.mau_per_org (./src/internal/mau_per_org.sql)
-Compiling transformations.external.org_invoice_stats (./src/external/org_invoice_stats.sql)
+Compiling transformations.internal.most_frequent_payer (./src/internal/most_frequent_payer.sql)
Compiling transformations.internal.users_per_domain (./src/internal/users_per_domain.sql)
+Compiling transformations.external.org_invoice_stats (./src/external/org_invoice_stats.sql)
Compiling transformations.internal.users_per_org (./src/internal/users_per_org.sql)
Working set 1 report file, 1 .sdf file
Reporting sdf.reports.tables_with_pii (./reports/tables_with_pii.sql)
-Finished 12 models [12 succeeded], 1 report [1 succeeded] in 1.360 secs
+Finished 12 models [12 succeeded], 1 report [1 succeeded] in 1.676 secs

Report sdf.reports.tables_with_pii
+----------------------------------------------+-------------------------+-----------+
-| table_ref                                    | description             | dialect   |
+| table_id                                     | description             | dialect   |
+----------------------------------------------+-------------------------+-----------+
-| transformations.internal.most_frequent_payer |                         | snowflake |
| transformations.external.invoice_stats       |                         | snowflake |
| payment.public.users                         | DDL for the users table | snowflake |
+| transformations.internal.most_frequent_payer |                         | snowflake |
| transformations.internal.users_per_domain    |                         | snowflake |
+----------------------------------------------+-------------------------+-----------+
4 rows.
@@ -130,39 +130,39 @@ Report sdf.reports.tables_with_pii
As you can see from the output, it looks like we have four tables containing columns with PII. Let's say we also wanted to show the names of the columns that contain PII. We can do this by modifying the SQL query to include the column name like so:
```sql
SELECT
-t.table_ref,
+t.table_id,
c.column_name,
t.description,
t.dialect
FROM
sdf.information_schema.tables t
JOIN
-sdf.information_schema.columns c ON t.table_ref = c.table_ref
-WHERE c.classifiers LIKE '%PII%'
+sdf.information_schema.columns c ON t.table_id = c.table_id
+WHERE CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII')
GROUP BY 2,1,3,4;
```
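
Incidentally, the `GROUP BY` over all four selected columns is just de-duplication; an equivalent formulation (an editorial alternative, not the sample's code) uses `SELECT DISTINCT`:

```sql
-- Equivalent de-duplication without GROUP BY: each table/column pair
-- appears once even if several of its classifiers match 'PII'.
SELECT DISTINCT
    t.table_id,
    c.column_name,
    t.description,
    t.dialect
FROM
    sdf.information_schema.tables t
JOIN
    sdf.information_schema.columns c ON t.table_id = c.table_id
WHERE
    CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII')
```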

Now, when we run the report, we can see the column names as well:
```shell
sdf report --show all
```
-``` shell
+```shell shell
Working set 12 model files, 3 .sdf files
Working set 1 report file, 1 .sdf file
Reporting sdf.reports.tables_with_pii (./reports/tables_with_pii.sql)
-Finished 12 models [12 reused], 1 report [1 succeeded] in 1.078 secs
+Finished 12 models [12 reused], 1 report [1 succeeded] in 1.459 secs

Report sdf.reports.tables_with_pii
+----------------------------------------------+--------------+-------------------------+-----------+
-| table_ref                                    | column_name  | description             | dialect   |
+| table_id                                     | column_name  | description             | dialect   |
+----------------------------------------------+--------------+-------------------------+-----------+
| transformations.external.invoice_stats       | email        |                         | snowflake |
-| payment.public.users                         | email        | DDL for the users table | snowflake |
| payment.public.users                         | phone        | DDL for the users table | snowflake |
-| payment.public.users                         | name         | DDL for the users table | snowflake |
| transformations.internal.users_per_domain    | email_domain |                         | snowflake |
+| payment.public.users                         | name         | DDL for the users table | snowflake |
| transformations.internal.most_frequent_payer | email        |                         | snowflake |
| transformations.external.invoice_stats       | name         |                         | snowflake |
+| payment.public.users                         | email        | DDL for the users table | snowflake |
+----------------------------------------------+--------------+-------------------------+-----------+
7 rows.

@@ -173,7 +173,7 @@ Report sdf.reports.tables_with_pii
```sql pii_datatypes.sql
SELECT c.datatype, COUNT(*) as frequency
FROM sdf.information_schema.columns c
-WHERE c.classifiers LIKE '%PII%'
+WHERE CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII')
GROUP BY c.datatype
ORDER BY frequency DESC;
```
@@ -183,23 +183,23 @@ Report sdf.reports.tables_with_pii
```


-``` shell
+```shell shell
Working set 12 model files, 3 .sdf files
Working set 2 report files, 1 .sdf file
Reporting sdf.reports.pii_datatypes (./reports/pii_datatypes.sql)
-Finished 12 models [12 reused], 2 reports [1 succeeded, 1 reused] in 1.190 secs
+Finished 12 models [12 reused], 2 reports [1 succeeded, 1 reused] in 1.442 secs

Report sdf.reports.tables_with_pii
+----------------------------------------------+--------------+-------------------------+-----------+
-| table_ref                                    | column_name  | description             | dialect   |
+| table_id                                     | column_name  | description             | dialect   |
+----------------------------------------------+--------------+-------------------------+-----------+
-| payment.public.users                         | name         | DDL for the users table | snowflake |
-| payment.public.users                         | phone        | DDL for the users table | snowflake |
| transformations.external.invoice_stats       | email        |                         | snowflake |
-| transformations.internal.most_frequent_payer | email        |                         | snowflake |
-| transformations.external.invoice_stats       | name         |                         | snowflake |
-| transformations.internal.users_per_domain    | email_domain |                         | snowflake |
| payment.public.users                         | email        | DDL for the users table | snowflake |
+| payment.public.users                         | phone        | DDL for the users table | snowflake |
+| transformations.internal.users_per_domain    | email_domain |                         | snowflake |
+| transformations.external.invoice_stats       | name         |                         | snowflake |
+| payment.public.users                         | name         | DDL for the users table | snowflake |
+| transformations.internal.most_frequent_payer | email        |                         | snowflake |
+----------------------------------------------+--------------+-------------------------+-----------+
7 rows.

2 changes: 1 addition & 1 deletion docs/guide/install.mdx
@@ -26,7 +26,7 @@ To verify that SDF is installed correctly, run the following command:
sdf --help
```

-``` shell
+```shell shell
SDF's modular SQL
Usage: sdf <COMMAND>
8 changes: 4 additions & 4 deletions docs/guide/setup/io.mdx
@@ -67,14 +67,14 @@ Let's try these options. Start by creating a new SDF workspace with a sample pro
```
After running the command, you will see the following output:

-``` shell
+```shell shell
Created lineage/checks/check_sink_phone_is_pii.sql
Created lineage/src/knis.sql
Created lineage/src/middle.sql
Created lineage/src/sink.sql
Created lineage/src/source.sql
Created lineage/workspace.sdf.yml
-Finished new in 0.028 secs
+Finished new in 0.293 secs

```
</Step>
@@ -95,13 +95,13 @@ sdf compile
```
Reviewing this output, multiple files have been compiled and SDF has statically analyzed the queries.

-``` shell
+```shell shell
Working set 4 model files, 1 .sdf file
Compiling lineage.pub.source (./src/source.sql)
Compiling lineage.pub.middle (./src/middle.sql)
Compiling lineage.pub.knis (./src/knis.sql)
Compiling lineage.pub.sink (./src/sink.sql)
-Finished 4 models [4 succeeded] in 0.395 secs
+Finished 4 models [4 succeeded] in 0.774 secs

```
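
Static analysis means name and type errors surface at compile time, before anything runs against a warehouse. As a purely hypothetical illustration (this file is not part of the sample), a model referencing a column that doesn't exist upstream would fail `sdf compile` outright:

```sql
-- Hypothetical src/broken.sql: sdf compile would reject this statically
-- if no_such_column is not produced by the upstream source model.
SELECT no_such_column FROM source
```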

4 changes: 2 additions & 2 deletions docs/integrations/dbt/integrating.mdx
@@ -195,7 +195,7 @@ This command will initialize the SDF workspace for your DBT project. It will cre
```shell
sdf dbt init
```
-``` shell
+```shell shell
Initialize a sdf workspace from a dbt project -- best effort

Usage: sdf dbt init [OPTIONS]
@@ -220,7 +220,7 @@ This command will refresh the SDF workspace for your DBT project. This is useful
```shell
sdf dbt refresh
```
-``` shell
+```shell shell
Re-initialize a sdf workspace from a dbt project -- best effort

Usage: sdf dbt refresh [OPTIONS]