New Docs Deployment for Mon Jun 10 22:01:39 UTC 2024
actions-user committed Jun 10, 2024
1 parent a9401df commit 6bde552
Showing 135 changed files with 2,977 additions and 677 deletions.
61 changes: 41 additions & 20 deletions docs/guide/data-quality/checks.mdx
@@ -22,6 +22,15 @@ Let's write and modify a check, starting with one of the sdf samples.
sdf new --sample hello_with_pii && cd hello_with_pii
```

+```shell shell
+Created hello_with_pii/.gitignore
+Created hello_with_pii/checks/code_check.sql
+Created hello_with_pii/src/main.sql
+Created hello_with_pii/workspace.sdf.yml
+Finished new in 0.292 secs
+
+```

This sample workspace contains a model, a classifier, and a check.
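
The model's source isn't reproduced in this diff, but judging from the check output further down (table `main`, column `column_2`, classifier `PII.name`), a minimal stand-in for `src/main.sql` might look like this — hypothetical, not the sample's verbatim contents:

```sql
-- Hypothetical stand-in for src/main.sql: a one-row model whose
-- column_2 carries a person's name (the value the check will flag).
SELECT
    1       AS column_1,
    'Alice' AS column_2
```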

</Step>
@@ -50,17 +59,13 @@ Let's write and modify a check, starting with one of the sdf samples.

Hurray! The check passes.

-``` shell
-Created hello_with_pii/.gitignore
-Created hello_with_pii/checks/code_check.sql
-Created hello_with_pii/src/main.sql
-Created hello_with_pii/workspace.sdf.yml
-Finished new in 0.025 secs
-
```shell shell
Working set 1 model file, 1 .sdf file
Compiling hello.pub.main (./src/main.sql)
Working set 1 test file, 1 .sdf file
Testing hello.pub.code_check (./checks/code_check.sql)
-Finished 1 model [1 succeeded], 1 check [1 passed] in 0.439 secs
+Finished 1 model [1 succeeded], 1 check [1 passed] in 1.156 secs
[Pass] Check hello.pub.code_check
```
@@ -70,13 +75,13 @@ Working set 1 test file, 1 .sdf file

```sql
SELECT
-DISTINCT c.table_name as "table_name",
+DISTINCT c.table_name as "table name",
c.column_name as "column name",
c.classifiers
FROM
sdf.information_schema.columns c
WHERE
-c.classifiers like '%PII.name%'
+CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII.name')
```

It queries for any column carrying the `PII.name` classifier and asserts that the result should be empty. But we never added the classifier in the first place!
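
Attaching the classifier happens in YAML rather than SQL. A hedged sketch of what that attachment might look like for `column_2` of the `main` model follows; the exact key names are an assumption, so adapt them to your workspace's `.sdf.yml` layout:

```yaml
# Hypothetical column-level metadata in a .sdf.yml file; the key names
# below are an assumption, not the sample's verbatim contents.
table:
  name: main
  columns:
    - name: column_2
      classifiers:
        - PII.name
```
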
@@ -94,23 +99,24 @@ Working set 1 test file, 1 .sdf file
```
Running `sdf check` again will **result in a failed test**. The output will look like this:

-``` shell

+```shell shell
Working set 1 model file, 1 .sdf file
Compiling hello.pub.main (./src/main.sql)
Working set 1 test file, 1 .sdf file
Testing hello.pub.code_check (./checks/code_check.sql)
-Finished 1 model [1 succeeded], 1 check [1 failed] in 0.425 secs, for details see below.
+Finished 1 model [1 succeeded], 1 check [1 failed] in 1.164 secs, for details see below.
[Fail] Check hello.pub.code_check
-+----------------+-------------+-------------+
-| table_name     | column name | classifiers |
-+----------------+-------------+-------------+
-| hello.pub.main | column_2    | PII.name    |
-+----------------+-------------+-------------+
++------------+-------------+-------------+
+| table_name | column name | classifiers |
++------------+-------------+-------------+
+| main       | column_2    | [PII.name]  |
++------------+-------------+-------------+
1 rows.
-------
-Summary 1 model [1 succeeded], 1 check [1 failed] in 0.425 secs.
+Summary 1 model [1 succeeded], 1 check [1 failed] in 1.164 secs.
-------
```
@@ -151,13 +157,28 @@ Let's fix this failing check! We can apply an `md5()` hash to column_2 (the names
c.column_name as "column name",
c.classifiers
FROM
-columns c
+sdf.information_schema.columns c
WHERE
-c.classifiers like '%PII.name%'
-and c.table_name like '%.sink'
+CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII.name')
+AND c.table_name like '%.sink'
```
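
The new `sink` model is compiled in the output below, but its SQL isn't reproduced here. A minimal sketch consistent with the narrative — assuming `src/sink.sql` simply re-selects from `main` — might be:

```sql
-- Hypothetical src/sink.sql: hash the classified column so the sink
-- exposes a digest instead of the raw names in column_2.
SELECT
    column_1,
    md5(column_2) AS column_2
FROM main
```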

Run `sdf check` again and voilà! All checks pass: the source table still carries `PII.name`, but the sink table does not.

+```shell shell
+Working set 2 model files, 1 .sdf file
+Compiling hello.pub.main (./src/main.sql)
+Compiling hello.pub.sink (./src/sink.sql)
+Working set 1 test file, 1 .sdf file
+Testing hello.pub.code_check (./checks/code_check.sql)
+Finished 2 models [2 succeeded], 1 check [1 passed] in 1.147 secs
+[Pass] Check hello.pub.code_check
+```

</Step>
</Steps>

58 changes: 29 additions & 29 deletions docs/guide/data-quality/reports.mdx
@@ -22,7 +22,7 @@ Let's write and modify a report, starting with one of the sdf samples.
sdf new --sample pii_saas_platform
```

-``` shell
+```shell shell
Created pii_saas_platform/.gitignore
Created pii_saas_platform/checks/no_pii_in_external.sql
Created pii_saas_platform/classification/taxonomy.sdf.yml
@@ -41,7 +41,7 @@ Let's write and modify a report, starting with one of the sdf samples.
Created pii_saas_platform/src/internal/users_per_domain.sql
Created pii_saas_platform/src/internal/users_per_org.sql
Created pii_saas_platform/workspace.sdf.yml
-Finished new in 0.027 secs
+Finished new in 0.291 secs

```
This sample contains a report and a workspace modeled after a simple SaaS platform.
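
For instance, the bundled `checks/no_pii_in_external.sql` presumably asserts that no externally-shared model exposes PII. A hedged sketch of such a check, reusing this guide's information-schema pattern (the exact filter for external tables is an assumption):

```sql
-- Hypothetical shape of checks/no_pii_in_external.sql: any row returned
-- is an external column still carrying a PII classifier.
SELECT
    c.table_name,
    c.column_name
FROM
    sdf.information_schema.columns c
WHERE
    CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII')
    AND c.table_name LIKE '%external%'
```
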
@@ -60,14 +60,14 @@ Let's write and modify a report, starting with one of the sdf samples.
The report itself lists all tables with PII by fetching all columns with classifiers that have `PII`. This is done with the following SQL:
```sql
SELECT
-t.table_ref,
+t.table_id,
t.description,
t.dialect
FROM
sdf.information_schema.tables t
JOIN
-sdf.information_schema.columns c ON t.table_ref = c.table_ref
-WHERE c.classifiers LIKE '%PII%'
+sdf.information_schema.columns c ON t.table_id = c.table_id
+WHERE CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII')
GROUP BY 1,2,3;
```

@@ -96,31 +96,31 @@ Let's write and modify a report, starting with one of the sdf samples.
sdf report --show all
```

-``` shell
+```shell shell
Working set 12 model files, 3 .sdf files
Compiling payment.public.invoices (./ddls/payment/public/invoices.sql)
Compiling payment.public.users (./ddls/payment/public/users.sql)
Compiling payment.public.organizations (./ddls/payment/public/organizations.sql)
Compiling transformations.internal.avg_invoice_amt (./src/internal/avg_invoice_amt.sql)
-Compiling transformations.internal.total_revenue_per_org (./src/internal/total_revenue_per_org.sql)
Compiling transformations.internal.invoice_payment_delay (./src/internal/invoice_payment_delay.sql)
-Compiling transformations.internal.most_frequent_payer (./src/internal/most_frequent_payer.sql)
+Compiling transformations.internal.total_revenue_per_org (./src/internal/total_revenue_per_org.sql)
Compiling transformations.external.invoice_stats (./src/external/invoice_stats.sql)
Compiling transformations.internal.mau_per_org (./src/internal/mau_per_org.sql)
-Compiling transformations.external.org_invoice_stats (./src/external/org_invoice_stats.sql)
+Compiling transformations.internal.most_frequent_payer (./src/internal/most_frequent_payer.sql)
Compiling transformations.internal.users_per_domain (./src/internal/users_per_domain.sql)
+Compiling transformations.external.org_invoice_stats (./src/external/org_invoice_stats.sql)
Compiling transformations.internal.users_per_org (./src/internal/users_per_org.sql)
Working set 1 report file, 1 .sdf file
Reporting sdf.reports.tables_with_pii (./reports/tables_with_pii.sql)
-Finished 12 models [12 succeeded], 1 report [1 succeeded] in 1.360 secs
+Finished 12 models [12 succeeded], 1 report [1 succeeded] in 1.676 secs

Report sdf.reports.tables_with_pii
+----------------------------------------------+-------------------------+-----------+
-| table_ref                                    | description             | dialect   |
+| table_id                                     | description             | dialect   |
+----------------------------------------------+-------------------------+-----------+
-| transformations.internal.most_frequent_payer |                         | snowflake |
| transformations.external.invoice_stats       |                         | snowflake |
| payment.public.users                         | DDL for the users table | snowflake |
+| transformations.internal.most_frequent_payer |                         | snowflake |
| transformations.internal.users_per_domain    |                         | snowflake |
+----------------------------------------------+-------------------------+-----------+
4 rows.
@@ -130,39 +130,39 @@ Report sdf.reports.tables_with_pii
As you can see from the output, it looks like we have four tables containing columns with PII. Let's say we also wanted to show the names of the columns that contain PII. We can do this by modifying the SQL query to include the column name like so:
```sql
SELECT
-t.table_ref,
+t.table_id,
c.column_name,
t.description,
t.dialect
FROM
sdf.information_schema.tables t
JOIN
-sdf.information_schema.columns c ON t.table_ref = c.table_ref
-WHERE c.classifiers LIKE '%PII%'
+sdf.information_schema.columns c ON t.table_id = c.table_id
+WHERE CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII')
GROUP BY 2,1,3,4;
```
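
Incidentally, the `GROUP BY` over all four selected columns is just de-duplication; an equivalent formulation (an editorial alternative, not the sample's code) uses `SELECT DISTINCT`:

```sql
-- Equivalent de-duplication without GROUP BY: each table/column pair
-- appears once even if several of its classifiers match 'PII'.
SELECT DISTINCT
    t.table_id,
    c.column_name,
    t.description,
    t.dialect
FROM
    sdf.information_schema.tables t
JOIN
    sdf.information_schema.columns c ON t.table_id = c.table_id
WHERE
    CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII')
```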

Now, when we run the report, we can see the column names as well:
```shell
sdf report --show all
```
-``` shell
+```shell shell
Working set 12 model files, 3 .sdf files
Working set 1 report file, 1 .sdf file
Reporting sdf.reports.tables_with_pii (./reports/tables_with_pii.sql)
-Finished 12 models [12 reused], 1 report [1 succeeded] in 1.078 secs
+Finished 12 models [12 reused], 1 report [1 succeeded] in 1.459 secs

Report sdf.reports.tables_with_pii
+----------------------------------------------+--------------+-------------------------+-----------+
-| table_ref                                    | column_name  | description             | dialect   |
+| table_id                                     | column_name  | description             | dialect   |
+----------------------------------------------+--------------+-------------------------+-----------+
| transformations.external.invoice_stats       | email        |                         | snowflake |
-| payment.public.users                         | email        | DDL for the users table | snowflake |
| payment.public.users                         | phone        | DDL for the users table | snowflake |
-| payment.public.users                         | name         | DDL for the users table | snowflake |
| transformations.internal.users_per_domain    | email_domain |                         | snowflake |
+| payment.public.users                         | name         | DDL for the users table | snowflake |
| transformations.internal.most_frequent_payer | email        |                         | snowflake |
| transformations.external.invoice_stats       | name         |                         | snowflake |
+| payment.public.users                         | email        | DDL for the users table | snowflake |
+----------------------------------------------+--------------+-------------------------+-----------+
7 rows.

@@ -173,7 +173,7 @@ Report sdf.reports.tables_with_pii
```sql pii_datatypes.sql
SELECT c.datatype, COUNT(*) as frequency
FROM sdf.information_schema.columns c
-WHERE c.classifiers LIKE '%PII%'
+WHERE CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII')
GROUP BY c.datatype
ORDER BY frequency DESC;
```
@@ -183,23 +183,23 @@ Report sdf.reports.tables_with_pii
```


-``` shell
+```shell shell
Working set 12 model files, 3 .sdf files
Working set 2 report files, 1 .sdf file
Reporting sdf.reports.pii_datatypes (./reports/pii_datatypes.sql)
-Finished 12 models [12 reused], 2 reports [1 succeeded, 1 reused] in 1.190 secs
+Finished 12 models [12 reused], 2 reports [1 succeeded, 1 reused] in 1.442 secs

Report sdf.reports.tables_with_pii
+----------------------------------------------+--------------+-------------------------+-----------+
-| table_ref                                    | column_name  | description             | dialect   |
+| table_id                                     | column_name  | description             | dialect   |
+----------------------------------------------+--------------+-------------------------+-----------+
-| payment.public.users                         | name         | DDL for the users table | snowflake |
-| payment.public.users                         | phone        | DDL for the users table | snowflake |
| transformations.external.invoice_stats       | email        |                         | snowflake |
-| transformations.internal.most_frequent_payer | email        |                         | snowflake |
-| transformations.external.invoice_stats       | name         |                         | snowflake |
-| transformations.internal.users_per_domain    | email_domain |                         | snowflake |
| payment.public.users                         | email        | DDL for the users table | snowflake |
+| payment.public.users                         | phone        | DDL for the users table | snowflake |
+| transformations.internal.users_per_domain    | email_domain |                         | snowflake |
+| transformations.external.invoice_stats       | name         |                         | snowflake |
+| payment.public.users                         | name         | DDL for the users table | snowflake |
+| transformations.internal.most_frequent_payer | email        |                         | snowflake |
+----------------------------------------------+--------------+-------------------------+-----------+
7 rows.

2 changes: 1 addition & 1 deletion docs/guide/install.mdx
@@ -26,7 +26,7 @@ To verify that SDF is installed correctly, run the following command:
sdf --help
```

-``` shell
+```shell shell
SDF's modular SQL
Usage: sdf <COMMAND>
8 changes: 4 additions & 4 deletions docs/guide/setup/io.mdx
@@ -67,14 +67,14 @@ Let's try these options. Start by creating a new SDF workspace with a sample pro
```
After running the command, you will see the following output:

-``` shell
+```shell shell
Created lineage/checks/check_sink_phone_is_pii.sql
Created lineage/src/knis.sql
Created lineage/src/middle.sql
Created lineage/src/sink.sql
Created lineage/src/source.sql
Created lineage/workspace.sdf.yml
-Finished new in 0.028 secs
+Finished new in 0.293 secs

```
</Step>
@@ -95,13 +95,13 @@ sdf compile
```
Reviewing this output, multiple files have been compiled and SDF has statically analyzed the queries.

-``` shell
+```shell shell
Working set 4 model files, 1 .sdf file
Compiling lineage.pub.source (./src/source.sql)
Compiling lineage.pub.middle (./src/middle.sql)
Compiling lineage.pub.knis (./src/knis.sql)
Compiling lineage.pub.sink (./src/sink.sql)
-Finished 4 models [4 succeeded] in 0.395 secs
+Finished 4 models [4 succeeded] in 0.774 secs

```
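
Static analysis means name and type errors surface at compile time, before anything runs against a warehouse. As a purely hypothetical illustration (this file is not part of the sample), a model referencing a column that doesn't exist upstream would fail `sdf compile` outright:

```sql
-- Hypothetical src/broken.sql: sdf compile would reject this statically
-- if no_such_column is not produced by the upstream source model.
SELECT no_such_column FROM source
```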

4 changes: 2 additions & 2 deletions docs/integrations/dbt/integrating.mdx
@@ -195,7 +195,7 @@ This command will initialize the SDF workspace for your DBT project. It will cre
```shell
sdf dbt init
```
-``` shell
+```shell shell
Initialize a sdf workspace from a dbt project -- best effort

Usage: sdf dbt init [OPTIONS]
@@ -220,7 +220,7 @@ This command will refresh the SDF workspace for your DBT project. This is useful
```shell
sdf dbt refresh
```
-``` shell
+```shell shell
Re-initialize a sdf workspace from a dbt project -- best effort

Usage: sdf dbt refresh [OPTIONS]