Merge branch 'current' into dc-jc/update-mesh-guide
dave-connors-3 authored Jun 17, 2024
2 parents 8ab039d + 8abb8de commit 12b6ef1
Showing 16 changed files with 197 additions and 20 deletions.
119 changes: 119 additions & 0 deletions website/blog/2024-06-12-putting-your-dag-on-the-internet.md
@@ -0,0 +1,119 @@
---
title: Putting Your DAG on the internet
description: "Use dbt and Snowflake's external access integrations to allow Snowflake Python models to access the internet."
slug: dag-on-the-internet

authors: [ernesto_ongaro, sebastian_stan, filip_byrén]

tags: [analytics craft, APIs, data ecosystem]
hide_table_of_contents: false

date: 2024-06-14
is_featured: true
---

**New in dbt: allow Snowflake Python models to access the internet**

With dbt 1.8, dbt released support for Snowflake’s [external access integrations](https://docs.snowflake.com/en/developer-guide/external-network-access/external-network-access-overview), further enabling the use of dbt + AI to enrich your data. This allows dbt Python models to query external APIs, functionality that dbt Cloud customer [EQT AB](https://eqtgroup.com/) required. Learn why they needed it and how they helped build the feature and get it shipped!

<!--truncate-->
## Why did EQT require this functionality?
by Filip Byrén, VP and Software Architect (EQT) and Sebastian Stan, Data Engineer (EQT)

_EQT AB is a global investment organization and, as a long-term customer of dbt Cloud, has presented at dbt’s Coalesce [2020](https://www.getdbt.com/coalesce-2020/seven-use-cases-for-dbt) and [2023](https://www.youtube.com/watch?v=-9hIUziITtU)._

_Motherbrain Labs is EQT’s bespoke AI team, primarily focused on accelerating our portfolio companies' roadmaps through hands-on data and AI work. Due to the high demand for our time, we are constantly exploring mechanisms for simplifying our processes and increasing our own throughput. Integration of workflow components directly in dbt has been a major efficiency gain and helped us rapidly deliver across a global portfolio._

Motherbrain Labs is focused on creating measurable AI impact in our portfolio. We work hand-in-hand with leadership from our deal teams and portfolio company leadership but our starting approach is always the same: identify which data matters.

While we have access to reams of proprietary information, we believe the greatest effect happens when we combine that information with external datasets like geolocation, demographics, or competitor traction.

These valuable datasets often come from third-party vendors who operate on a pay-per-use model; a single charge for every piece of information we want. To avoid overspending, we focus on enriching only the specific subset of data that is relevant to an individual company's strategic question.

In response to this recurring need, we have partnered with Snowflake and dbt to introduce new functionality that facilitates communication with external endpoints and manages secrets within dbt. This new integration enables us to incorporate enrichment processes directly into our DAGs, similar to how current Python models are utilized within dbt environments. We’ve found that this augmented approach allows us to reduce complexity and enable external communications before materialization.

## An example with Carbon Intensity: How does it work?

In this section, we will demonstrate how to integrate an external API to retrieve the current Carbon Intensity of the UK power grid. The goal is to illustrate how the feature works, and perhaps explore how the scheduling of data transformations at different times can potentially reduce their carbon footprint, making them a greener choice. We will be leveraging the API from the [UK National Grid ESO](https://www.nationalgrideso.com/) to achieve this.

To start, we need to set up a network rule (Snowflake instructions [here](https://docs.snowflake.com/en/user-guide/network-rules)) to allow access to the external API. Specifically, we'll create an egress rule to permit Snowflake to communicate with api.carbonintensity.org.uk.

Next, to access network locations outside of Snowflake, you need to define an external access integration first and reference it within a dbt Python model. You can find an overview of Snowflake's external network access [here](https://docs.snowflake.com/en/developer-guide/external-network-access/external-network-access-overview).

This API is open, so no API key is needed. If an API you call does require a key, handle the key as a secret. More information on API authentication in Snowflake is available [here](https://docs.snowflake.com/en/user-guide/api-authentication).

For simplicity’s sake, we will show how to create the network rule and the external access integration using [pre-hooks](/reference/resource-configs/pre-hook-post-hook) in a model configuration yml file:


```
models:
  - name: external_access_sample
    config:
      pre_hook:
        - "create or replace network rule test_network_rule type = host_port mode = egress value_list= ('api.carbonintensity.org.uk:443');"
        - "create or replace external access integration test_external_access_integration allowed_network_rules = (test_network_rule) enabled = true;"
```

Then we can simply use the new `external_access_integrations` configuration parameter to reference our external access integration (and, through it, the network rule) within a Python model (called external_access_sample.py):


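```
import snowflake.snowpark as snowpark


def model(dbt, session: snowpark.Session):
    dbt.config(
        materialized="table",
        # Must match the integration created in the pre-hook
        external_access_integrations=["test_external_access_integration"],
        packages=["httpx==0.26.0"],
    )
    # Imported inside the function, after the package has been declared above
    import httpx

    # Store the raw response body in a single-row, single-column table
    return session.create_dataframe(
        [{"carbon_intensity": httpx.get(url="https://api.carbonintensity.org.uk/intensity").text}]
    )
```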


The result is a model containing raw JSON, which we can parse in a downstream SQL model, for example, to extract some information:


```
{{
    config(
        materialized='incremental',
        unique_key='dbt_invocation_id'
    )
}}
with raw as (
    select parse_json(carbon_intensity) as carbon_intensity_json
    from {{ ref('external_access_sample') }}
)
select
    '{{ invocation_id }}' as dbt_invocation_id,
    value:from::TIMESTAMP_NTZ as start_time,
    value:to::TIMESTAMP_NTZ as end_time,
    value:intensity.actual::NUMBER as actual_intensity,
    value:intensity.forecast::NUMBER as forecast_intensity,
    value:intensity.index::STRING as intensity_index
from raw,
    lateral flatten(input => raw.carbon_intensity_json:data)
```


The result is a model that will keep track of dbt invocations, and the current UK carbon intensity levels.

<Lightbox src="/img/blog/2024-06-12-putting-your-dag-on-the-internet/image1.png" title="Preview in dbt Cloud IDE of output" />
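To make the JSON shape that the SQL model relies on more concrete, here is a minimal sketch of the same flattening in plain Python. The sample payload is an assumption: it only mirrors the fields the SQL reads (`data[].from`, `data[].to`, `data[].intensity.*`), and its values are made up. The helper name `flatten_intensity` is hypothetical and not part of dbt or Snowflake.

```python
import json

# Illustrative payload with the fields the SQL model reads; values are invented.
SAMPLE_RESPONSE = """
{"data": [{"from": "2024-06-14T12:00Z", "to": "2024-06-14T12:30Z",
           "intensity": {"forecast": 120, "actual": 115, "index": "moderate"}}]}
"""


def flatten_intensity(raw_text: str) -> list[dict]:
    """Turn the raw response text into one flat dict per entry in `data`."""
    payload = json.loads(raw_text)
    rows = []
    for entry in payload.get("data", []):
        intensity = entry.get("intensity", {})
        rows.append(
            {
                "start_time": entry.get("from"),
                "end_time": entry.get("to"),
                "actual_intensity": intensity.get("actual"),
                "forecast_intensity": intensity.get("forecast"),
                "intensity_index": intensity.get("index"),
            }
        )
    return rows


if __name__ == "__main__":
    for row in flatten_intensity(SAMPLE_RESPONSE):
        print(row)
```

Each dict corresponds to one row produced by `lateral flatten` in the SQL model, with the same column names.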

## dbt best practices

This is a very new area for both Snowflake and dbt. Something special about SQL and dbt is that they are very resistant to external entropy. The moment we rely on API calls, Python packages, and other external dependencies, we open ourselves up to a lot more of it: APIs will change or break, and your models could fail as a result.
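One way to soften the impact of a flaky API is to retry the call a few times before letting the model fail. The following is only a sketch under stated assumptions: `fetch_with_retry` is a hypothetical helper, not part of dbt, Snowflake, or httpx, and it takes the actual request as a callable so the retry logic stays independent of the HTTP client.

```python
import time
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[], str],
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> str:
    """Call `fetch` until it succeeds or `attempts` calls have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as error:  # real code should catch the client's specific errors
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)  # fixed pause between attempts
    raise RuntimeError(f"external call failed after {attempts} attempts") from last_error
```

Inside a Python model, `fetch` could wrap the request to the external API. Retries only help with transient failures; a changed response format still needs to be handled in the parsing step.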

Traditionally, dbt is the T in ELT (dbt overview [here](https://docs.getdbt.com/terms/elt)), and this functionality unlocks brand-new EL capabilities for which best practices do not yet exist. What’s clear is that EL workloads should be separated from T workloads, perhaps in a different modeling layer. Note also that unless you use incremental models, each run rebuilds the table from the latest API response, so your historical data can easily be lost. dbt has seen a lot of use cases for this, including the AI example outlined in this external [engineering blog post](https://klimmy.hashnode.dev/enhancing-your-dbt-project-with-large-language-models).

**A few words about the power of Commercial Open Source Software**

To get this functionality shipped quickly, EQT opened a pull request, Snowflake helped with some problems we had with CI, and a member of dbt Labs helped write the tests and merge the code in!

dbt now features this functionality in dbt 1.8+ and on the “Keep on latest version” option of dbt Cloud (overview [here](/docs/dbt-versions/upgrade-dbt-version-in-cloud#keep-on-latest-version)).

dbt Labs staff and community members would love to chat more about it in the [#db-snowflake](https://getdbt.slack.com/archives/CJN7XRF1B) Slack channel.
27 changes: 27 additions & 0 deletions website/blog/authors.yml
@@ -614,3 +614,30 @@ anders_swanson:
  links:
    - icon: fa-linkedin
      url: https://www.linkedin.com/in/andersswanson

ernesto_ongaro:
  image_url: /img/blog/authors/ernesto-ongaro.png
  job_title: Senior Solutions Architect
  name: Ernesto Ongaro
  organization: dbt Labs
  links:
    - icon: fa-linkedin
      url: https://www.linkedin.com/in/eongaro

sebastian_stan:
  image_url: /img/blog/authors/sebastian-eqt.png
  job_title: Data Engineer
  name: Sebastian Stan
  organization: EQT Group
  links:
    - icon: fa-linkedin
      url: https://www.linkedin.com/in/sebastian-lindblom/

filip_byrén:
  image_url: /img/blog/authors/filip-eqt.png
  job_title: VP and Software Architect
  name: Filip Byrén
  organization: EQT Group
  links:
    - icon: fa-linkedin
      url: https://www.linkedin.com/in/filip-byr%C3%A9n/
2 changes: 1 addition & 1 deletion website/docs/docs/build/documentation.md
@@ -11,7 +11,7 @@ dbt provides a way to generate documentation for your dbt project and render it

* [Declaring properties](/reference/configs-and-properties)
* [`dbt docs` command](/reference/commands/cmd-docs)
* [`doc` Jinja function](/reference/dbt-jinja-functions)
* [`doc` Jinja function](/reference/dbt-jinja-functions/doc)
* If you're new to dbt, we recommend that you check out our [quickstart guide](/guides) to build your first dbt project, complete with documentation.

## Assumed knowledge
@@ -37,7 +37,8 @@ The following are the prerequisites for dbt Cloud and Snowflake.

- You have **ACCOUNTADMIN** access in Snowflake.
- Your Snowflake account must have access to the Native App/SPCS integration (private preview until Snowflake Summit) and the Native App/SPCS configurations (public preview at the end of June). If you're unsure, please check with your Snowflake account manager.
- The Snowflake account must be in an AWS Region or Azure region.
- The Snowflake account must be in an AWS Region or Azure region.
- You have access to Snowflake Cortex through your Snowflake permissions and [Snowflake Cortex is available in your region](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions#availability). Without this, Ask dbt will not work.

## Set up the configuration for Ask dbt

@@ -50,10 +51,6 @@ Configure dbt Cloud and Snowflake Cortex to power the **Ask dbt** chatbot.

<Lightbox src="/img/docs/cloud-integrations/semantic_layer_configuration.png" width="100%" title="Semantic Layer credentials"/>

1. Identify the default database the environment is connecting to.
1. Select **Deploy > Environments** from the top navigation bar. From the environments list, select the one that was identified in the **Semantic Layer Configuration Details** panel.
1. On the environment's page, click **Settings**. Scroll to the section **Deployment connection**. The listed database is the default for your environment and is also where you will create the schema. Save this information in a temporary location to use later on.

1. In Snowflake, verify that your SL and deployment user has been granted permission to use Snowflake Cortex. For more information, refer to [Required Privileges](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions#required-privileges) in the Snowflake docs.

By default, all users should have access to Snowflake Cortex. If this is disabled for you, open a Snowflake SQL worksheet and run these statements:
@@ -67,15 +64,6 @@ Configure dbt Cloud and Snowflake Cortex to power the **Ask dbt** chatbot.

Make sure to replace `SNOWFLAKE.CORTEX_USER`, `DEPLOYMENT_USER`, and `SL_USER` with the appropriate strings for your environment.

1. Create a schema `dbt_sl_llm` in the deployment database. The deployment user needs write access to create the necessary tables in this schema and the SL user needs only read access to it. Open a Snowflake SQL worksheet and run these statements:

```sql
create schema YOUR_DEPLOYMENT_DATABASE.dbt_sl_llm;
grant select on schema dbt_sl_llm to role SL_USER;
```

Make sure to replace `YOUR_DEPLOYMENT_DATABASE` and `SL_USER` with the appropriate strings for your environment.

## Configure dbt Cloud
Collect three pieces of information from dbt Cloud to set up the application.

8 changes: 8 additions & 0 deletions website/docs/docs/cloud/manage-access/auth0-migration.md
@@ -5,6 +5,14 @@ sidebar: "SSO Auth0 Migration"
description: "Required actions for migrating to Auth0 for SSO services on dbt Cloud."
---

:::note

This migration is a feature of the dbt Cloud Enterprise plan. To learn more about an Enterprise plan, contact us at [[email protected]](mailto::[email protected]).

For single-tenant Virtual Private Cloud, you should [email dbt Cloud Support](mailto::[email protected]) to set up or update your SSO configuration.

:::

dbt Labs is partnering with Auth0 to bring enhanced features to dbt Cloud's single sign-on (SSO) capabilities. Auth0 is an identity and access management (IAM) platform with advanced security features, and it will be leveraged by dbt Cloud. These changes will require some action from customers with SSO configured in dbt Cloud today, and this guide will outline the necessary changes for each environment.

If you have not yet configured SSO in dbt Cloud, refer instead to our setup guides for [SAML](/docs/cloud/manage-access/set-up-sso-saml-2.0), [Okta](/docs/cloud/manage-access/set-up-sso-okta), [Google Workspace](/docs/cloud/manage-access/set-up-sso-google-workspace), or [Microsoft Entra ID (formerly Azure AD)](/docs/cloud/manage-access/set-up-sso-microsoft-entra-id) single sign-on services.
@@ -24,7 +24,7 @@ dbt Labs is committed to providing backward compatibility for all versions 1.x,

:::info Why changes to previous behavior?

This release includes significant new features, and rework to `dbt-core`'s CLI and initialization flow. As part of refactoring its internals, we made a handful of changes to runtime configuration. The net result of these changes is more consistent & practical configuration options, and a more legible codebase.
This release includes significant new features, and rework to `dbt-core`'s CLI and initialization flow. As part of refactoring its internals from [`argparse`](https://docs.python.org/3/library/argparse.html) to [`click`](https://click.palletsprojects.com), we made a handful of changes to runtime configuration. The net result of these changes is more consistent and practical configuration options, and a more legible codebase.

**_Wherever possible, we will provide backward compatibility and deprecation warnings for at least one minor version before actually removing the old functionality._** In those cases, we still reserve the right to fully remove backwards compatibility for deprecated functionality in a future v1.x minor version of `dbt-core`.

34 changes: 34 additions & 0 deletions website/docs/faqs/Git/github-permissions.md
@@ -0,0 +1,34 @@
---
title: "I'm seeing a 'GitHub and dbt Cloud latest permissions' error"
description: "GitHub and dbt Cloud permissions error"
sidebar_label: "GitHub and dbt Cloud permissions error"
---

If you see the error `This account needs to accept the latest permissions for the dbt Cloud GitHub App` in dbt Cloud, it usually means that the permissions for the dbt Cloud GitHub App are out of date.

To solve this issue, you'll need to update the permissions for the dbt Cloud GitHub App in your GitHub account. Here are a couple of ways you can do it:

#### Update permissions

A GitHub organization admin will need to update the permissions in GitHub for the dbt Cloud GitHub App. If you're not the admin, reach out to your organization admin to request this. Alternatively, try [disconnecting your GitHub account](#disconnect-github) in dbt Cloud.

1. Go directly to GitHub to determine if any updated permissions are required.
2. In GitHub, go to your organization **Settings** (or personal if using a non-organization account).
3. Then navigate to **Applications** to identify any necessary permission changes.
For more info on GitHub permissions, refer to [access permissions](https://docs.github.com/en/get-started/learning-about-github/access-permissions-on-github).

#### Disconnect GitHub

Disconnect the GitHub and dbt Cloud integration in dbt Cloud.

1. In dbt Cloud, go to **Account Settings**.
2. In **Projects**, select the project that's experiencing the issue.
3. Click the repository link under **Repository**.
4. In the **Repository details** page, click **Edit**.
5. Click **Disconnect** to remove the GitHub integration.
6. Go back to your **Project details** page and reconnect your repository by clicking the **Configure Repository** link.
7. Configure your repository and click **Save**.

<Lightbox src="/img/repository-details-faq.jpg" title="Disconnect your GitHub connection in the 'Repository details' page."/>

If you've tried these workarounds and are still experiencing this behavior &mdash; reach out to the [dbt Support](mailto:[email protected]) team and we'll be happy to help!
4 changes: 2 additions & 2 deletions website/docs/faqs/Git/gitlab-authentication.md
@@ -9,7 +9,7 @@ If you're seeing a 'GitLab Authentication is out of date' 500 server error page

No worries - this is a current issue the dbt Labs team is working on and we have a few workarounds for you to try:

### 1st Workaround
#### First workaround

1. Disconnect repo from project in dbt Cloud.
2. Go to GitLab and click on Settings > Repository.
@@ -18,7 +18,7 @@ No worries - this is a current issue the dbt Labs team is working on and we have
5. You would then need to check GitLab to make sure that the new deploy key is added.
6. Once confirmed that it's added, refresh dbt Cloud and try developing once again.

### 2nd Workaround
#### Second workaround

1. Keep repo in project as is -- don't disconnect.
2. Copy the deploy key generated in dbt Cloud.
1 change: 0 additions & 1 deletion website/docs/faqs/Git/run-on-pull.md
@@ -12,4 +12,3 @@ If it was added via a deploy key method, you'll want to use the [GitHub auth me
To go ahead and enable 'Run on Pull requests', you'll want to remove dbt Cloud from the Apps & Integration on GitHub and re-integrate it again via the GitHub app method.

If you've tried the workaround above and are still experiencing this behavior - reach out to the Support team at [email protected] and we'll be happy to help!

3 changes: 2 additions & 1 deletion website/docs/reference/resource-configs/fabric-configs.md
@@ -101,4 +101,5 @@ Not supported at this time.

## dbt-utils

Not supported at this time
Not supported at this time. However, dbt-fabric offers some utils macros. Please check out [utils macros](https://github.com/microsoft/dbt-fabric/tree/main/dbt/include/fabric/macros/utils).

@@ -0,0 +1 @@

Binary file added website/static/img/blog/authors/filip-eqt.png
Binary file added website/static/img/repository-details-faq.jpg