chore: doc updates stage 1
z3z1ma committed Jan 4, 2025
1 parent 401d7ae commit 701eb22
Showing 5 changed files with 220 additions and 139 deletions.
16 changes: 12 additions & 4 deletions docs/docs/intro.md

# dbt-osmosis Intro

Let's discover **dbt-osmosis** in less than 5 minutes.

## Getting Started

Get started by **running dbt-osmosis**.

- [Python](https://www.python.org/downloads/) (3.8+)
- [dbt](https://docs.getdbt.com/docs/core/installation) (1.0.0+)
- [uv](https://docs.astral.sh/uv/getting-started/installation/#standalone-installer)
- An existing dbt project (or you can play with it using [jaffle shop](https://github.com/dbt-labs/jaffle_shop_duckdb))

## Configure dbt-osmosis
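
dbt-osmosis needs to know where your model YAML files should live. The minimum configuration is a `+dbt-osmosis` directive on your models, covered in detail in the YAML configuration docs. A minimal sketch, assuming your project is named `your_project_name`:

```yaml title="dbt_project.yml"
models:
  your_project_name:
    # Create a _<model>.yml file next to each model's .sql file
    +dbt-osmosis: "_{model}.yml"
```
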
## Run dbt-osmosis
Run dbt-osmosis with the following command to automatically refactor your dbt project YAML files.

If using uv(x):
```bash
uvx --with='dbt-<adapter>==1.9.0' dbt-osmosis yaml refactor
```

Or, if installed in your Python environment:

```bash
dbt-osmosis yaml refactor
```

Run this command from the root of your dbt project. Ensure your git repository is clean before running. Replace `<adapter>` with the name of your dbt adapter (e.g. `snowflake`, `bigquery`, `redshift`, `postgres`, `athena`, `spark`, `trino`, `sqlite`, `duckdb`, `oracle`, `sqlserver`).

Watch the magic unfold. ✨
122 changes: 64 additions & 58 deletions docs/docs/tutorial-basics/commands.md

# CLI Overview

Below is a high-level overview of the commands currently provided by dbt-osmosis. Each command also supports additional options such as:

- `--dry-run` to prevent writing changes to disk
- `--check` to exit with a non-zero code if changes would have been made
- `--fqn` to filter nodes by [dbt's FQN](https://docs.getdbt.com/reference/node-selection/syntax#the-fqn-method) segments
- `--disable-introspection` to run without querying the warehouse (helpful if you are offline), often paired with `--catalog-path`
- `--catalog-path` to read columns from a prebuilt `catalog.json`

Other helpful flags are described in each command below.

## YAML Management

**All of the following commands live under** `dbt-osmosis yaml <command>`.

### Organize

Restructures your schema YAML files based on the **declarative** configuration in `dbt_project.yml`. Specifically, it:

- Bootstraps missing YAML files for any undocumented models or sources
- Moves or merges existing YAML files according to your configured rules (the `+dbt-osmosis:` keys)

```bash
dbt-osmosis yaml organize [--project-dir] [--profiles-dir] [--target] [--fqn ...] [--dry-run] [--check]
```

Options often used:

- `--auto-apply` to apply all file location changes without asking for confirmation
- `--disable-introspection` + `--catalog-path=/path/to/catalog.json` if not connected to a warehouse

### Document

Passes down column-level documentation from upstream nodes to downstream nodes (a deep inheritance). Specifically, it can:

- Add columns that are present in the database (or `catalog.json`) but missing from your YAML
- Remove columns missing from your database (optional, if used with other steps)
- Reorder columns (optional, if combined with your sorting preference—see below)
- Inherit tags, descriptions, and meta fields from upstream models

```bash
dbt-osmosis yaml document [--project-dir] [--profiles-dir] [--target] [--fqn ...] [--dry-run] [--check]
```

Options often used:

- `--force-inherit-descriptions` to override *existing* descriptions if they are placeholders
- `--use-unrendered-descriptions` so that you can propagate Jinja-based docs (like `{{ doc(...) }}`)
- `--skip-add-columns`, `--skip-add-data-types`, `--skip-merge-meta`, `--skip-add-tags`, etc., if you want to limit changes
- `--synthesize` to autogenerate missing documentation with ChatGPT/OpenAI (see *Synthesis* below)
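
For instance, a sketch of a scoped run (the `staging` folder name is illustrative) that documents only that folder and overwrites placeholder descriptions:

```bash
# Document only the staging folder, replacing placeholder descriptions with inherited ones
dbt-osmosis yaml document --fqn=staging --force-inherit-descriptions
```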

### Refactor

The **combination** of both `organize` and `document` in the correct order. Typically the recommended command to run:

- Creates or moves YAML files to match your `dbt_project.yml` rules
- Ensures columns are up to date with warehouse or catalog
- Inherits descriptions and metadata
- Reorders columns if desired

```bash
dbt-osmosis yaml refactor [--project-dir] [--profiles-dir] [--target] [--fqn ...] [--dry-run] [--check]
```

Options often used:

- `--auto-apply`
- `--force-inherit-descriptions`, `--use-unrendered-descriptions`
- `--skip-add-data-types`, `--skip-add-columns`, etc.
- `--synthesize` to autogenerate missing documentation with ChatGPT/OpenAI
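
For example, a sketch of a scoped, non-interactive refactor (the `marts` folder is illustrative):

```bash
# Restructure and document only the marts folder, applying file moves without prompting
dbt-osmosis yaml refactor --fqn=marts --auto-apply
```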

### Commonly Used Flags in YAML Commands

- `--fqn=staging.some_subfolder` to limit to a particular subfolder or results of dbt ls
- `--check` to fail your CI if dbt-osmosis *would* make changes
- `--dry-run` to preview changes without writing them to disk
- `--catalog-path=target/catalog.json` to avoid live queries
- `--disable-introspection` to skip warehouse queries entirely
- `--auto-apply` to skip manual confirmation for file moves
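
These flags compose well in CI. A minimal sketch (the adapter pin and catalog path are placeholders) that fails the build if dbt-osmosis would rewrite any YAML, without querying the warehouse:

```bash
# Exit non-zero if any YAML file would change; read columns from a prebuilt catalog
uvx --with='dbt-<adapter>==1.9.0' dbt-osmosis yaml refactor \
  --check --dry-run \
  --disable-introspection --catalog-path=target/catalog.json
```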

## SQL

These commands let you compile or run SQL snippets (including Jinja) directly:

### Run

Runs a SQL statement or a dbt Jinja-based query.

```bash
dbt-osmosis sql run "select * from {{ ref('my_model') }} limit 50"
```

Returns results in tabular format to stdout. Use `--threads` to run multiple queries in parallel (though typically you’d run one statement at a time).

### Compile

Compiles a SQL statement (including Jinja) but doesn’t run it. Useful for quickly validating macros, refs, or Jinja logic:

```bash
dbt-osmosis sql compile "select * from {{ ref('my_model') }}"
```

Prints the compiled SQL to stdout.

## Workbench

Launches a [Streamlit](https://streamlit.io/) application that:

- Lets you explore and run queries against your dbt models in a REPL-like environment
- Provides side-by-side compiled SQL
- Offers real-time iteration on queries

```bash
dbt-osmosis workbench [--project-dir] [--profiles-dir] [--host] [--port]
```

## Diff

Diffs a dbt model across git revisions, which is useful for understanding how a model has changed over time. This feature is currently under development. 🚧
13 changes: 5 additions & 8 deletions docs/docs/tutorial-basics/installation.md

# Installation

## Install with uv

```bash
uv tool install --with="dbt-<adapter>~=1.9.0" dbt-osmosis
```

This will install `dbt-osmosis` and its dependencies in a virtual environment, and make it available as a command-line tool via `dbt-osmosis`. You can also use `uvx`, as in the intro, to run it directly in a more ephemeral way.

## Install with pip

```bash
pip install dbt-osmosis dbt-<adapter>
```

This installs `dbt-osmosis` into your current Python environment. Naturally, you can also install it with your favorite package manager; `dbt-osmosis` is available on [PyPI](https://pypi.org/project/dbt-osmosis/).
77 changes: 25 additions & 52 deletions docs/docs/tutorial-yaml/configuration.md
---
sidebar_position: 1
---

# Configuration

## Configuring dbt-osmosis

### Models

At minimum, each **folder** (or subfolder) of models in your dbt project must specify the `+dbt-osmosis` directive so that dbt-osmosis knows **where** to create or move the YAML files.

```yaml title="dbt_project.yml"
models:
  <your_project_name>:
    +dbt-osmosis: <path>
```
- `<your_project_name>` is the name of your dbt project.
- `<path>` is the path to the YAML file that will be generated for the model. This path is **relative to the model's (sql file) directory.**

#### Examples

```yaml title="dbt_project.yml"
models:
  your_project_name:
    +dbt-osmosis: "_{model}.yml" # Default for entire project

    staging:
      +dbt-osmosis: "{parent}.yml" # Each subfolder lumps docs by folder name

    intermediate:
      # Example of using node.config or node.tags
      +dbt-osmosis: "{node.config[materialized]}/{model}.yml"

    marts:
      # A single schema file for all models in 'marts'
      +dbt-osmosis: "prod.yml"
```
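
To make the resolution concrete, here is a rough sketch of where the files land for two hypothetical models (`stg_orders` and `dim_customers`) under the rules above; each path is resolved relative to the model's own directory:

```text
models/
  staging/
    stg_orders.sql
    staging.yml       # "{parent}.yml" resolves to the parent folder name
  marts/
    dim_customers.sql
    prod.yml          # the static "prod.yml" rule
```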

Seeds accept the same `+dbt-osmosis` directive, so their schema files can be managed as well:

```yaml title="dbt_project.yml"
seeds:
  <your_project_name>:
    +dbt-osmosis: "_schema.yml"
```

### Sources

You can optionally configure dbt-osmosis to manage sources automatically. In your `dbt_project.yml`:

```yaml title="dbt_project.yml"
vars:
  dbt-osmosis:
    sources:
      salesforce:
        path: "staging/salesforce/source.yml"
        schema: "salesforce_v2"
      marketo: "staging/customer/marketo.yml"
      jira: "staging/project_mgmt/{parent}.yml"
      github: "all_sources/github.yml"

    _blacklist:
      # Columns matching these patterns will be ignored (like ephemeral system columns)
      column_ignore_patterns:
        - "_FIVETRAN_SYNCED"
        - ".*__key__.namespace"
```

**Key points:**

- `vars: dbt-osmosis: sources: <source_name>` sets where the source YAML file should live.
- If the source does not actually exist yet, dbt-osmosis can bootstrap it.
- If you omit `schema`, dbt-osmosis infers it is the same as your source name.
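
For reference, a bootstrapped file for the `salesforce` source above might look roughly like this sketch (the table and column names are hypothetical; dbt-osmosis fills in whatever it finds in your warehouse or catalog):

```yaml title="staging/salesforce/source.yml"
version: 2

sources:
  - name: salesforce
    schema: salesforce_v2
    tables:
      - name: account        # hypothetical table
        columns:
          - name: id
            description: ""
      - name: opportunity    # hypothetical table
        columns:
          - name: id
            description: ""
```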