Guide for 'Connecting to databases' (#24062)
## Summary & Motivation

New docs guide for "Connecting to Databases"

## Changelog [New | Bug | Docs]

NOCHANGELOG

---------

Co-authored-by: colton <[email protected]>
shalabhc and cmpadden authored Aug 30, 2024
1 parent 833b208 commit f7a5d66
Showing 4 changed files with 212 additions and 1 deletion.
59 changes: 58 additions & 1 deletion docs/docs-beta/docs/guides/external-systems/databases.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,61 @@
---
title: Connecting to databases
description: How to configure resources to connect to databases
sidebar_position: 10
---

In Dagster, *resources* are used to connect to databases by acting as a wrapper around database clients. The resource is registered along with connection details in the `Definitions` object, and can then be referenced from your asset definitions.

## What you'll learn

- How to connect to and query a local DuckDB database using the `DuckDBResource`
- How to connect to different databases in different environments, such as development and production
- How to connect to a Snowflake database using the `SnowflakeResource`

<details>
<summary>Prerequisites</summary>

To follow the steps in this guide, you'll need:

- Familiarity with [Asset definitions](/concepts/assets)

If you want to run the examples in this guide, you'll need:
- Connection information for a Snowflake database
- To `pip install` the `dagster-duckdb` and `dagster-snowflake` packages

</details>

## Define a DuckDB resource and use it in an asset definition

Here is an example of a DuckDB resource definition that's used to create two tables in the DuckDB database.

<CodeExample filePath="guides/external-systems/resource-duckdb-example.py" language="python" title="DuckDB Resource Example" />

## Define a resource that depends on an environment variable

Resources can be configured using environment variables to connect to environment-specific databases. For example, a resource can connect to a test database in a development environment and a live database in the production environment. You can change the resource definition in the previous example to use an `EnvVar` as shown here:

<CodeExample filePath="guides/external-systems/resource-duckdb-envvar-example.py" language="python" title="DuckDB Resource using EnvVar Example" />

When launching a run, the database path will be read from the `IRIS_DUCKDB_PATH` environment variable.
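
As a rough illustration of why this helps, the sketch below uses only the Python standard library. The `resolve_duckdb_path` helper and the two example paths are hypothetical and not part of Dagster. They only show the idea of looking up the path at call time, so the same code points at a different database in each environment.

```python
import os


# Hypothetical helper (not a Dagster API): the path is looked up only when
# the function is called, rather than when the module is imported.
def resolve_duckdb_path(environ=None):
    env = os.environ if environ is None else environ
    try:
        return env["IRIS_DUCKDB_PATH"]
    except KeyError:
        raise RuntimeError("IRIS_DUCKDB_PATH is not set") from None


# Assumed example values: each environment sets the same variable
# to a different database file.
dev_env = {"IRIS_DUCKDB_PATH": "/tmp/iris_dataset.duckdb"}
prod_env = {"IRIS_DUCKDB_PATH": "/data/prod/iris_dataset.duckdb"}

print(resolve_duckdb_path(dev_env))
print(resolve_duckdb_path(prod_env))
```

Because the lookup is deferred, a missing variable surfaces as a clear error at the point of use instead of silently falling back to a default path.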

## Define a Snowflake resource and use it in an asset definition

Using the Snowflake resource is similar to using the DuckDB resource. Here is a complete example showing how to connect to a Snowflake database and create two tables:

<CodeExample filePath="guides/external-systems/resource-snowflake-example.py" language="python" title="Snowflake Resource Example" />

**Note:** Before running this example, you will need to set the `SNOWFLAKE_PASSWORD` environment variable.
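
One way to catch a missing variable early is a small pre-flight check. The following is a minimal sketch using only the standard library. `missing_env_vars` is a hypothetical helper, not a Dagster API.

```python
import os


# Hypothetical helper (not a Dagster API): returns the names of required
# environment variables that are unset or empty.
def missing_env_vars(required, environ=None):
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(["SNOWFLAKE_PASSWORD"])
if missing:
    print("Set these environment variables first: " + ", ".join(missing))
```

Running a check like this before launching Dagster reports every missing name at once, without printing any secret values.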

## Other database resource types

See [Dagster Integrations](https://dagster.io/integrations) for resource types that connect to other databases. Some other popular resource types are:

- [`BigQueryResource`](https://dagster.io/integrations/dagster-gcp-bigquery)
- [`RedshiftClientResource`](https://dagster.io/integrations/dagster-aws-redshift)

## Next steps

- Explore how to use resources for [Connecting to APIs](/guides/external-systems/apis)
- Go deeper into [Understanding Resources](/concepts/resources)

@@ -0,0 +1,47 @@
import pandas as pd
from dagster_duckdb import DuckDBResource

import dagster as dg


# An asset that uses a DuckDB resource called iris_db
@dg.asset
def iris_dataset(iris_db: DuckDBResource) -> None:
    iris_df = pd.read_csv(
        "https://docs.dagster.io/assets/iris.csv",
        names=[
            "sepal_length_cm",
            "sepal_width_cm",
            "petal_length_cm",
            "petal_width_cm",
            "species",
        ],
    )

    with iris_db.get_connection() as conn:
        conn.execute("CREATE SCHEMA IF NOT EXISTS iris")
        conn.execute("CREATE TABLE iris.iris_dataset AS SELECT * FROM iris_df")


# Another asset that uses the iris_db resource
@dg.asset(deps=[iris_dataset])
def iris_setosa(iris_db: DuckDBResource) -> None:
    with iris_db.get_connection() as conn:
        conn.execute(
            "CREATE TABLE iris.iris_setosa AS SELECT * FROM iris.iris_dataset WHERE"
            " species = 'Iris-setosa'"
        )


defs = dg.Definitions(
    assets=[iris_dataset, iris_setosa],
    resources={
        # highlight-start
        # This defines a DuckDB resource that reads the
        # database path from the environment
        "iris_db": DuckDBResource(
            database=dg.EnvVar("IRIS_DUCKDB_PATH"),
        )
        # highlight-end
    },
)
@@ -0,0 +1,51 @@
import pandas as pd
from dagster_duckdb import DuckDBResource

import dagster as dg


# highlight-start
# An asset that uses a DuckDB resource called iris_db
# Note: the parameter name `iris_db` must match the resource key defined later
@dg.asset
def iris_dataset(iris_db: DuckDBResource) -> None:
    # highlight-end
    iris_df = pd.read_csv(
        "https://docs.dagster.io/assets/iris.csv",
        names=[
            "sepal_length_cm",
            "sepal_width_cm",
            "petal_length_cm",
            "petal_width_cm",
            "species",
        ],
    )

    # highlight-start
    with iris_db.get_connection() as conn:
        conn.execute("CREATE SCHEMA IF NOT EXISTS iris")
        conn.execute("CREATE TABLE iris.iris_dataset AS SELECT * FROM iris_df")
    # highlight-end


# Another asset that uses the iris_db resource
@dg.asset(deps=[iris_dataset])
def iris_setosa(iris_db: DuckDBResource) -> None:
    with iris_db.get_connection() as conn:
        conn.execute(
            "CREATE TABLE iris.iris_setosa AS SELECT * FROM iris.iris_dataset WHERE"
            " species = 'Iris-setosa'"
        )


defs = dg.Definitions(
    assets=[iris_dataset, iris_setosa],
    resources={
        # highlight-start
        # This defines a DuckDB resource called iris_db
        "iris_db": DuckDBResource(
            database="/tmp/iris_dataset.duckdb",
        )
        # highlight-end
    },
)
@@ -0,0 +1,56 @@
import pandas as pd
from dagster_snowflake import SnowflakeResource
from snowflake.connector.pandas_tools import write_pandas

import dagster as dg


# An asset that uses a Snowflake resource called iris_db
# and creates a new table from a Pandas dataframe
@dg.asset
def iris_dataset(iris_db: SnowflakeResource) -> None:
    iris_df = pd.read_csv(
        "https://docs.dagster.io/assets/iris.csv",
        names=[
            "sepal_length_cm",
            "sepal_width_cm",
            "petal_length_cm",
            "petal_width_cm",
            "species",
        ],
    )

    with iris_db.get_connection() as conn:
        write_pandas(conn, iris_df, table_name="iris_dataset")


# An asset that uses a Snowflake resource called iris_db
# and creates a new table from an existing table
@dg.asset(deps=[iris_dataset])
def iris_setosa(iris_db: SnowflakeResource) -> None:
    with iris_db.get_connection() as conn:
        conn.cursor().execute("""
            CREATE OR REPLACE TABLE iris_setosa as (
                SELECT *
                FROM iris.iris_dataset
                WHERE species = 'Iris-setosa'
            );""")


defs = dg.Definitions(
    assets=[iris_dataset, iris_setosa],
    resources={
        # highlight-start
        "iris_db": SnowflakeResource(
            # Set the SNOWFLAKE_PASSWORD environment variable before running this code
            password=dg.EnvVar("SNOWFLAKE_PASSWORD"),
            # Update the following strings to match your Snowflake database
            warehouse="snowflake_warehouse",
            account="snowflake_account",
            user="snowflake_user",
            database="iris_database",
            schema="iris_schema",
        )
        # highlight-end
    },
)

1 comment on commit f7a5d66

@github-actions github-actions bot commented on f7a5d66 Aug 30, 2024

Deploy preview for dagster-docs-beta ready!

✅ Preview
https://dagster-docs-beta-95wismed7-elementl.vercel.app
https://dagster-docs-beta.dagster-docs.io

Built with commit f7a5d66.
This pull request is being automatically deployed with vercel-action
