chore: doc updates stage 1
z3z1ma committed Jan 4, 2025
1 parent 401d7ae commit 701eb22
Showing 5 changed files with 220 additions and 139 deletions.
16 changes: 12 additions & 4 deletions docs/docs/intro.md

# dbt-osmosis Intro

Let's discover **dbt-osmosis** in less than 5 minutes.

## Getting Started

Get started by **running dbt-osmosis**.

- [Python](https://www.python.org/downloads/) (3.8+)
- [dbt](https://docs.getdbt.com/docs/core/installation) (1.0.0+)
- [uv](https://docs.astral.sh/uv/getting-started/installation/#standalone-installer)
- An existing dbt project (or you can play with it using [jaffle shop](https://github.com/dbt-labs/jaffle_shop_duckdb))

## Configure dbt-osmosis
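
dbt-osmosis needs to know where your model YAML files should live. The minimum configuration is a `+dbt-osmosis` directive on your models, covered in detail in the YAML configuration docs. A minimal sketch, assuming your project is named `your_project_name`:

```yaml title="dbt_project.yml"
models:
  your_project_name:
    # Create a _<model>.yml file next to each model's .sql file
    +dbt-osmosis: "_{model}.yml"
```
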
## Run dbt-osmosis
Run dbt-osmosis with the following command to automatically refactor your dbt project YAML files.

If using uv(x):
```bash
uvx --with='dbt-<adapter>==1.9.0' dbt-osmosis yaml refactor
```

Or, if installed in your Python environment:

```bash
dbt-osmosis yaml refactor
```

Run this command from the root of your dbt project. Ensure your git repository is clean before running. Replace `<adapter>` with the name of your dbt adapter (e.g. `snowflake`, `bigquery`, `redshift`, `postgres`, `athena`, `spark`, `trino`, `sqlite`, `duckdb`, `oracle`, `sqlserver`).

Watch the magic unfold. ✨
122 changes: 64 additions & 58 deletions docs/docs/tutorial-basics/commands.md

# CLI Overview

Below is a high-level overview of the commands currently provided by dbt-osmosis. Each command also supports additional options such as:

- `--dry-run` to prevent writing changes to disk
- `--check` to exit with a non-zero code if changes would have been made
- `--fqn` to filter nodes by [dbt's FQN](https://docs.getdbt.com/reference/node-selection/syntax#the-fqn-method) segments
- `--disable-introspection` to run without querying the warehouse (helpful if you are offline), often paired with `--catalog-path`
- `--catalog-path` to read columns from a prebuilt `catalog.json`

Other helpful flags are described in each command below.

## YAML Management

**All of the following commands live under** `dbt-osmosis yaml <command>`.

### Organize

Restructures your schema YAML files based on the **declarative** configuration in `dbt_project.yml`. Specifically, it:

- Bootstraps missing YAML files for any undocumented models or sources
- Moves or merges existing YAML files according to your configured rules (the `+dbt-osmosis:` keys)

```bash
dbt-osmosis yaml organize [--project-dir] [--profiles-dir] [--target] [--fqn ...] [--dry-run] [--check]
```

Options often used:

- `--auto-apply` to apply all file location changes without asking for confirmation
- `--disable-introspection` + `--catalog-path=/path/to/catalog.json` if not connected to a warehouse

### Document

Passes down column-level documentation from upstream nodes to downstream nodes (a deep inheritance). Specifically, it can:

- Add columns that are present in the database (or `catalog.json`) but missing from your YAML
- Remove columns missing from your database (optional, if used with other steps)
- Reorder columns (optional, if combined with your sorting preference—see below)
- Inherit tags, descriptions, and meta fields from upstream models

```bash
dbt-osmosis yaml document [--project-dir] [--profiles-dir] [--target] [--fqn ...] [--dry-run] [--check]
```

Options often used:

- `--force-inherit-descriptions` to override *existing* descriptions if they are placeholders
- `--use-unrendered-descriptions` so that you can propagate Jinja-based docs (like `{{ doc(...) }}`)
- `--skip-add-columns`, `--skip-add-data-types`, `--skip-merge-meta`, `--skip-add-tags`, etc., if you want to limit changes
- `--synthesize` to autogenerate missing documentation with ChatGPT/OpenAI (see *Synthesis* below)
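
For instance, a sketch of a scoped run (the `staging` folder name is illustrative) that documents only that folder and overwrites placeholder descriptions:

```bash
# Document only the staging folder, replacing placeholder descriptions with inherited ones
dbt-osmosis yaml document --fqn=staging --force-inherit-descriptions
```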

### Refactor

The **combination** of both `organize` and `document` in the correct order. Typically the recommended command to run:

- Creates or moves YAML files to match your `dbt_project.yml` rules
- Ensures columns are up to date with warehouse or catalog
- Inherits descriptions and metadata
- Reorders columns if desired

```bash
dbt-osmosis yaml refactor [--project-dir] [--profiles-dir] [--target] [--fqn ...] [--dry-run] [--check]
```

Options often used:

- `--auto-apply`
- `--force-inherit-descriptions`, `--use-unrendered-descriptions`
- `--skip-add-data-types`, `--skip-add-columns`, etc.
- `--synthesize` to autogenerate missing documentation with ChatGPT/OpenAI
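
For example, a sketch of a scoped, non-interactive refactor (the `marts` folder is illustrative):

```bash
# Restructure and document only the marts folder, applying file moves without prompting
dbt-osmosis yaml refactor --fqn=marts --auto-apply
```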

### Commonly Used Flags in YAML Commands

- `--fqn=staging.some_subfolder` to limit to a particular subfolder or results of dbt ls
- `--check` to fail your CI if dbt-osmosis *would* make changes
- `--dry-run` to preview changes without writing them to disk
- `--catalog-path=target/catalog.json` to avoid live queries
- `--disable-introspection` to skip warehouse queries entirely
- `--auto-apply` to skip manual confirmation for file moves
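
These flags compose well in CI. A minimal sketch (the adapter pin and catalog path are placeholders) that fails the build if dbt-osmosis would rewrite any YAML, without querying the warehouse:

```bash
# Exit non-zero if any YAML file would change; read columns from a prebuilt catalog
uvx --with='dbt-<adapter>==1.9.0' dbt-osmosis yaml refactor \
  --check --dry-run \
  --disable-introspection --catalog-path=target/catalog.json
```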

## SQL

These commands let you compile or run SQL snippets (including Jinja) directly:

### Run

Runs a SQL statement or a dbt Jinja-based query.

```bash
dbt-osmosis sql run "select * from {{ ref('my_model') }} limit 50"
```

Returns results in tabular format to stdout. Use `--threads` to run multiple queries in parallel (though typically you’d run one statement at a time).

### Compile

Compiles a SQL statement (including Jinja) but doesn’t run it. Useful for quickly validating macros, refs, or Jinja logic:

```bash
dbt-osmosis sql compile "select * from {{ ref('my_model') }}"
```

Prints the compiled SQL to stdout.

## Workbench

Launches a [Streamlit](https://streamlit.io/) application that:

- Lets you explore and run queries against your dbt models in a REPL-like environment
- Provides side-by-side compiled SQL
- Offers real-time iteration on queries

```bash
dbt-osmosis workbench [--project-dir] [--profiles-dir] [--host] [--port]
```

## Diff

Diffs a dbt model across git revisions, which is useful for understanding how a model has changed over time. This feature is currently under development. 🚧
13 changes: 5 additions & 8 deletions docs/docs/tutorial-basics/installation.md

# Installation

## Install with uv

```bash
uv tool install --with="dbt-<adapter>~=1.9.0" dbt-osmosis
```

This will install `dbt-osmosis` and its dependencies in a virtual environment, and make it available as a command-line tool via `dbt-osmosis`. You can also use `uvx`, as in the intro, to run it directly in a more ephemeral way.

## Install with pip

```bash
pip install dbt-osmosis dbt-<adapter>
```

This installs `dbt-osmosis` into your current Python environment. Naturally, you can also install it with your favorite package manager; `dbt-osmosis` is available on [PyPI](https://pypi.org/project/dbt-osmosis/).
77 changes: 25 additions & 52 deletions docs/docs/tutorial-yaml/configuration.md
---
sidebar_position: 1
---

# Configuration

## Configuring dbt-osmosis

### Models

At minimum, each **folder** (or subfolder) of models in your dbt project must specify the `+dbt-osmosis` directive so that dbt-osmosis knows **where** to create or move the YAML files.

```yaml title="dbt_project.yml"
models:
  <your_project_name>:
    +dbt-osmosis: <path>
```
- `<your_project_name>` is the name of your dbt project.
- `<path>` is the path to the YAML file that will be generated for the model. This path is **relative to the model's (sql file) directory.**

#### Examples

```yaml title="dbt_project.yml"
models:
  your_project_name:
    +dbt-osmosis: "_{model}.yml" # Default for entire project

    staging:
      +dbt-osmosis: "{parent}.yml" # Each subfolder lumps docs by folder name

    intermediate:
      # Example of using node.config or node.tags
      +dbt-osmosis: "{node.config[materialized]}/{model}.yml"

    marts:
      # A single schema file for all models in 'marts'
      +dbt-osmosis: "prod.yml"
```
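
To make the resolution concrete, here is a rough sketch of where the files land for two hypothetical models (`stg_orders` and `dim_customers`) under the rules above; each path is resolved relative to the model's own directory:

```text
models/
  staging/
    stg_orders.sql
    staging.yml       # "{parent}.yml" resolves to the parent folder name
  marts/
    dim_customers.sql
    prod.yml          # the static "prod.yml" rule
```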

Seeds accept the same `+dbt-osmosis` directive, so their schema files can be managed as well:

```yaml title="dbt_project.yml"
seeds:
  <your_project_name>:
    +dbt-osmosis: "_schema.yml"
```

### Sources

You can optionally configure dbt-osmosis to manage sources automatically. In your `dbt_project.yml`:

```yaml title="dbt_project.yml"
vars:
  dbt-osmosis:
    sources:
      salesforce:
        path: "staging/salesforce/source.yml"
        schema: "salesforce_v2"
      marketo: "staging/customer/marketo.yml"
      jira: "staging/project_mgmt/{parent}.yml"
      github: "all_sources/github.yml"

    _blacklist:
      # Columns matching these patterns will be ignored (like ephemeral system columns)
      column_ignore_patterns:
        - "_FIVETRAN_SYNCED"
        - ".*__key__.namespace"
```

**Key points:**

- `vars: dbt-osmosis: sources: <source_name>` sets where the source YAML file should live.
- If the source does not actually exist yet, dbt-osmosis can bootstrap it.
- If you omit `schema`, dbt-osmosis infers it is the same as your source name.
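
For reference, a bootstrapped file for the `salesforce` source above might look roughly like this sketch (the table and column names are hypothetical; dbt-osmosis fills in whatever it finds in your warehouse or catalog):

```yaml title="staging/salesforce/source.yml"
version: 2

sources:
  - name: salesforce
    schema: salesforce_v2
    tables:
      - name: account        # hypothetical table
        columns:
          - name: id
            description: ""
      - name: opportunity    # hypothetical table
        columns:
          - name: id
            description: ""
```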