diff --git a/website/docs/terms/elt.md b/website/docs/terms/elt.md
deleted file mode 100644
index 0e7d11bf7dd..00000000000
--- a/website/docs/terms/elt.md
+++ /dev/null
@@ -1,139 +0,0 @@
----
-id: elt
-title: What is ELT (Extract, Load, Transform)?
-description: ELT is the process of first extracting data from different sources, then loading it into a data warehouse, and finally transforming it.
-displayText: ELT
-hoverSnippet: Extract, Load, Transform (ELT) is the process of first extracting data from different data sources, loading it into a target data warehouse, and finally transforming it.
----
-
-<head>
-    <title>What is ELT (Extract, Load, Transform)? How does it differ from ETL?</title>
-</head>
-
-Extract, Load, Transform (ELT) is the process of first extracting data from different data sources, then loading it into a target data warehouse, and finally transforming it.
-
-ELT has emerged as a paradigm for how to manage information flows in a modern data warehouse. It represents a fundamental shift from how data was previously handled, when Extract, Transform, Load (ETL) was the workflow most companies implemented.
-
-Transitioning from ETL to ELT means that you no longer have to capture your transformations during the initial loading of the data into your data warehouse. Rather, you load all of your data first and then build transformations on top of it. Data teams report that the ELT workflow has several advantages over the traditional ETL workflow, which we’ll go over [in-depth later in this glossary](#benefits-of-elt).
-
-## How ELT works
-
-In an ELT process, data is extracted from data sources, loaded into a target data platform, and finally transformed for analytics use. We’ll go over the three components (extract, load, transform) in detail here.
-
-![Diagram depicting the ELT workflow. Data is depicted being extracted from example data sources like an Email CRM, Facebook Ads platform, Backend databases, and Netsuite. The data is then loaded as raw data into a data warehouse. From there, the data is transformed within the warehouse by renaming, casting, joining, or enriching the raw data. The result is then modeled data inside your data warehouse.](/img/docs/terms/elt/elt-diagram.png)
-
-### Extract
-
-In the extraction process, data is extracted from multiple data sources. The data extracted is, for the most part, data that teams eventually want to use for analytics work. Some examples of data sources can include:
-
-- Backend application databases
-- Marketing platforms
-- Email and sales CRMs
-- And more!
-
-Accessing these data sources using Application Programming Interface (API) calls can be a challenge for individuals and teams who don’t have the technical expertise or resources to create their own scripts and automated processes. However, the recent development of certain open source and Software as a Service (SaaS) products has removed the need for this custom development work: by creating and managing pipelines in an automated way, you can extract data from your sources and load it into your data warehouse via a user interface.
-
-Since not every data source integrates with SaaS tools for extraction and loading, it’s sometimes inevitable that teams will write custom ingestion scripts in addition to using those tools.
-
-### Load
-
-During the loading stage, the data that was extracted is loaded into the target data warehouse. Some examples of modern data warehouses include Snowflake, Amazon Redshift, and Google BigQuery. Examples of other data storage platforms include data lakes, such as those offered by Databricks. Most of the SaaS applications that extract data from your data sources will also load it into your target data warehouse. Custom or in-house extraction and load processes usually require strong data engineering and technical skills.
-
-At this point in the ELT process, the data is mostly unchanged from its point of extraction. If you use an extraction and loading tool like Fivetran, there may have been some light normalization applied to your data. But for all intents and purposes, the data loaded into your data warehouse at this stage is in its raw format.
-
-### Transform
-
-In the final transformation step, the raw data that has been loaded into your data warehouse is finally ready for modeling! When you first look at this data, you may notice a few things about it…
-
-- Column names may or may not be clear
-- Some columns are potentially the incorrect data type
-- Tables are not joined to other tables
-- Timestamps may be in the incorrect timezone for your reporting
-- JSON fields may need to be unnested
-- Tables may be missing primary keys
-- And more!
-
-...hence the need for transformation! During the transformation process, data from your data sources is usually:
-
-- **Lightly transformed**: Fields are cast correctly, timestamp fields’ timezones are made uniform, tables and fields are renamed appropriately, and more.
-- **Heavily transformed**: Business logic is added, appropriate materializations are established, data is joined together, etc.
-- **QA’d**: Data is tested according to business standards. In this step, data teams may ensure primary keys are unique, model relations match up, column values are appropriate, and more.
-
-Common ways to transform your data include leveraging modern technologies such as dbt, writing custom SQL scripts that are automated by a scheduler, utilizing stored procedures, and more.
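-
-As a minimal sketch of what a “light” transformation might look like in SQL, the query below renames ambiguous columns, casts a field, and normalizes a timezone. The `raw.app.orders` source and its column names are hypothetical stand-ins, and `convert_timezone` is shown in its Snowflake form:
-
-```sql
--- clean up raw data: clearer names, correct types, uniform timezones
-select
-    id as order_id,                                  -- rename an ambiguous column
-    cast(amount as numeric(10, 2)) as order_amount,  -- cast to the correct data type
-    convert_timezone('UTC', 'America/New_York', created_at) as created_at_et  -- align the reporting timezone
-from raw.app.orders
-```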
-
-## ELT vs ETL
-
-The primary difference between the traditional ETL and the modern ELT workflow is when [data transformation](https://www.getdbt.com/analytics-engineering/transformation/) and loading take place. In ETL workflows, data extracted from data sources is transformed prior to being loaded into target data platforms. Newer ELT workflows transform data after it has been loaded into the data platform of choice. Why is this such a big deal?
-
-| | ELT | ETL |
-|---|---|---|
-| Programming skills required | Often little to no code is needed to extract and load data into your data warehouse. | Often requires custom scripts or considerable data engineering lift to extract and transform data prior to load. |
-| Separation of concerns | Extraction, load, and transformation layers can be explicitly separated out by different products. | ETL processes are often encapsulated in one product. |
-| Distribution of transformations | Since transformations take place last, there is greater flexibility in the modeling process. Worry first about getting your data in one place; then you have time to explore the data to understand the best way to transform it. | Because transformation occurs before data is loaded into the target location, teams must conduct thorough work up front to make sure data is transformed properly. Heavy transformations often take place downstream in the BI layer. |
-| [Data team distribution](https://www.getdbt.com/data-teams/analytics-job-descriptions/) | ELT workflows empower data team members who know SQL to create their own extraction and loading pipelines and transformations. | ETL workflows often require teams with greater technical skill to create and maintain pipelines. |
-
-Why has ELT adoption grown so quickly in recent years? A few reasons:
-
-- **The abundance of cheap cloud storage with modern data warehouses.** The creation of modern data warehouses such as Redshift and Snowflake has made it possible for teams of all sizes to store and scale their data at a more efficient cost. This was a huge enabler for the ELT workflow.
-- **The development of low-code or no-code data extractors and loaders.** Products that require little technical expertise, such as Fivetran and Stitch, which can extract data from many data sources and load it into many different data warehouses, have helped lower the barrier to entry to the ELT workflow. Data teams can now offload some of the data engineering lift needed to extract data and create complex transformations.
-- **A true code-based, version-controlled transformation layer with the development of dbt.** Prior to the development of dbt, there was no singular transformation layer product. dbt helps data analysts apply software engineering best practices (version control, CI/CD, and testing) to data transformation, ultimately allowing anyone who knows SQL to be a part of the ELT process.
-- **Increased compatibility between ELT layers and technologies in recent years.** With the expansion of extraction, loading, and transformation layers that integrate closely with one another and with cloud storage, the ELT workflow has never been more accessible. For example, Fivetran creates and maintains [dbt packages](https://hub.getdbt.com/) to help write dbt transformations for the data sources they connect to.
-
-## Benefits of ELT
-
-You often hear about the benefits the ELT workflow brings to data, but it’s easy to forget the benefits it brings to people. This workflow benefits the data itself in a variety of ways (which we’ll outline in detail below), such as the ability to recreate historical transformations, test data and data models, and more. We’ll also use this section to emphasize the empowerment the ELT workflow brings to both data team members and business stakeholders.
-
-### ELT benefit #1: Data as code
-
-Ok, we said it earlier: the ELT workflow allows data teams to function like software engineers. But what does this really mean? How does it actually impact your data?
-
-#### Analytics code can now follow the same best practices as software code
-
-At its core, having data transformations occur last in a data pipeline allows for code-based and version-controlled transformations. These two factors alone permit data team members to:
-
-- Easily recreate historical transformations by rolling back commits
-- Establish code-based tests, like the sketch shown after this list
-- Implement CI/CD workflows
-- Document data models like typical software code.
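-
-As a minimal, hypothetical sketch of such a code-based test (the `analytics.orders` table name is a stand-in), the query below returns rows only when a primary key is duplicated, so a scheduler or a framework like dbt can fail the pipeline whenever it returns anything:
-
-```sql
--- assert that order_id is unique; any returned row is a test failure
-select
-    order_id,
-    count(*) as occurrences
-from analytics.orders
-group by order_id
-having count(*) > 1
-```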
-
-#### Scaling, made sustainable
-
-As your business grows, the number of data sources grows along with it, and so does the number of transformations and models your business needs. Managing a high number of transformations without version control or automation is not scalable.
-
-The ELT workflow capitalizes on transformations occurring last to bring flexibility and software engineering best practices to data transformation. Instead of having to worry about how your extraction scripts scale as your data increases, data can be extracted and loaded automatically with a few clicks.
-
-### ELT benefit #2: Bring the power to the people
-
-The ELT workflow opens up a world of opportunity for the people who work with that data, not just for the data itself.
-
-#### Empowers data team members
-
-Data analysts, analytics engineers, and even data scientists no longer have to depend on data engineers to create custom pipelines and models. Instead, they can use point-and-click products such as Fivetran and Airbyte to extract and load the data for them.
-
-Having the transformation as the final step in the ELT workflow also allows data folks to leverage their understanding of the data and SQL to focus more on actually modeling the data.
-
-#### Promotes greater transparency for end business users
-
-By no longer having transformations hidden in the ETL process, data teams can expose the version-controlled code used to transform data for analytics to end business users. Instead of having to manually respond to the common question, “How is this data generated?” data folks can direct business users to documentation and repositories. Having end business users view, or even get involved in, the data transformations promotes greater collaboration and awareness between business and data folks.
-
-## ELT tools
-
-As mentioned earlier, the recent development of certain technologies and products has helped lower the barrier to entry for implementing the ELT workflow. Most of these new products cover one or two parts of the ELT process, but some have crossover across all three. We’ll outline some of the current tools in the ELT ecosystem below.
-
-| Product | E/L/T? | Description | Open source option? |
-|---|---|---|---|
-| Fivetran/HVR | E, some T, L | Fivetran is a SaaS company that helps data teams extract, load, and perform some transformation on their data. Fivetran easily integrates with modern data warehouses and dbt. They also offer transformations that leverage dbt Core. | :x: |
-| Stitch by Talend | E, L | Stitch (part of Talend) is another SaaS product that has many data connectors to extract data and load it into data warehouses. | :x: |
-| Airbyte | E, L | Airbyte is an open source product and cloud service that allows teams to create data extraction and load pipelines. | :white_check_mark: |
-| Funnel | E, some T, L | Funnel is another product that can extract and load data. Funnel’s data connectors are primarily focused on marketing data sources. | :x: |
-| dbt | T | dbt is the transformation tool that enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. dbt offers both an open source and a cloud-based product. | :white_check_mark: |
-
-## Conclusion
-
-The past few years have been a whirlwind for the data world. The increased accessibility and affordability of cloud warehouses, no-code data extractors and loaders, and a true transformation layer with dbt have allowed the ELT workflow to become the preferred analytics workflow. ETL predates ELT and differs in when data is transformed. In both processes, data is first extracted from different sources. In ELT processes, however, data is loaded into the target data platform and then transformed. The ELT workflow ultimately allows data team members to extract, load, and model their own data in a flexible, accessible, and scalable way.
-
-## Further reading
-
-Here’s some of our favorite content about the ELT workflow:
-
-- [The case for the ELT workflow](https://www.getdbt.com/analytics-engineering/case-for-elt-workflow/)
-- [A love letter to ETL tools](https://www.getdbt.com/analytics-engineering/etl-tools-a-love-letter/)
-- [What is dbt?](https://getdbt.com/product/what-is-dbt/)
diff --git a/website/docs/terms/etl.md b/website/docs/terms/etl.md
deleted file mode 100644
index 321f59a65d0..00000000000
--- a/website/docs/terms/etl.md
+++ /dev/null
@@ -1,130 +0,0 @@
----
-id: etl
-title: What is ETL (Extract, Transform, Load)?
-description: ETL is the process of first extracting data from a data source, transforming it, and then loading it into a target data warehouse.
-displayText: ETL
-hoverSnippet: Extract, Transform, Load (ETL) is the process of first extracting data from a data source, transforming it, and then loading it into a target data warehouse.
----
-
-<head>
-    <title>What is ETL (Extract, Transform, Load)? How has it evolved?</title>
-</head>
-
-ETL, or “Extract, Transform, Load”, is the process of first extracting data from a data source, transforming it, and then loading it into a target data warehouse. In ETL workflows, much of the meaningful [data transformation](https://www.getdbt.com/analytics-engineering/transformation/) occurs outside this primary pipeline in a downstream business intelligence (BI) platform.
-
-ETL is contrasted with the newer ELT (Extract, Load, Transform) workflow, where transformation occurs after data has been loaded into the target data warehouse. In many ways, the ETL workflow could have been renamed the ETLT workflow, because a considerable portion of meaningful data transformations happen outside the data pipeline. The same transformations can occur in both ETL and ELT workflows; the primary differences are *when* the data is transformed (inside or outside the primary pipeline) and *where* it is transformed (ETL platform, BI tool, or data warehouse).
-
-It’s important to talk about ETL and understand how it works, where it provides value, and how it can hold people back. If you don’t talk about the benefits and drawbacks of systems, how can you expect to improve them?
-
-## How ETL works
-
-In an ETL process, data is first extracted from a source, transformed, and then loaded into a target data platform. We’ll go into greater depth on all three steps below.
-
-![A diagram depicting the ETL workflow. The diagram starts by depicting raw data being extracted from various example data sources like an email CRM, Facebook Ads platform, a backend database, and Netsuite. Once the data is extracted, the raw data is transformed within the data pipeline via renaming, casting, joining, and enriching. After the data is transformed within the data pipeline, the modeled data is loaded into a data warehouse.](/img/docs/terms/etl/etl-diagram.png)
-
-### Extract
-
-In this first step, data is extracted from different data sources. Data that is extracted at this stage is likely going to be eventually used by end business users to make decisions. Some examples of these data sources include:
-
-- Ad platforms (Facebook Ads, Google Ads, etc.)
-- Backend application databases
-- Sales CRMs
-- And more!
-
-To actually get this data, data engineers may write custom scripts that make Application Programming Interface (API) calls to extract all the relevant data. Because making and automating these API calls gets harder as data sources and data volumes grow, this method of extraction often requires strong technical skills. In addition, these extraction scripts involve considerable maintenance, since APIs change relatively often; as a result, this work usually falls to data engineers who are fluent in programming languages such as Python and Java. Data teams can also extract from these data sources with open source and Software as a Service (SaaS) products.
-
-### Transform
-
-At this stage, the raw data that has been extracted is normalized and modeled. In ETL workflows, much of the actual meaningful business logic, metric calculation, and entity joining tends to happen further down in a downstream BI platform. As a result, the transformation stage here is focused on data cleanup and normalization: renaming of columns, correct casting of fields, and timestamp conversions.
-
-To actually transform the data, there are two primary methods teams will use:
-
-- **Custom solutions**: In this solution, data teams (typically the data engineers on the team) write custom scripts and create automated pipelines to transform the data. Unlike ELT transformations, which typically use SQL for modeling, ETL transformations are often written in other programming languages such as Python or Scala. Data engineers may leverage technologies such as Apache Spark or Hadoop at this point to help process large volumes of data.
-- **ETL products**: There are ETL products that will extract, transform, and load your data in one platform. [These tools](#etl-tools) often involve little to no code and instead use Graphical User Interfaces (GUIs) to create pipelines and transformations.
-
-### Load
-
-In the final stage, the transformed data is loaded into your target data warehouse. Once this transformed data is in its final destination, it’s most commonly exposed to end business users either in a BI tool or in the data warehouse directly.
-
-The ETL workflow implies that your raw data does not live in your data warehouse. *Because transformations occur before load, only transformed data lives in your data warehouse in the ETL process.* This can make it harder to ensure that transformations are performing the correct functionality.
-
-## How ETL is being used
-
-While ELT adoption is growing, we still see ETL use cases for processing large volumes of data and adhering to strong data governance principles.
-
-### ETL to efficiently normalize large volumes of data
-
-ETL can be an efficient way to perform simple normalizations across large data sets. Doing these lighter transformations across a large volume of data during loading can help get the data formatted properly and quickly for downstream use. In addition, end business users sometimes need quick access to raw or lightly normalized data. Through an ETL workflow, data teams can conduct lightweight transformations on data sources and quickly expose the results in their target data warehouse and downstream BI tool.
-
-### ETL for hashing PII prior to load
-
-Some companies will want to mask, hash, or remove PII values before they enter their data warehouse. In an ETL workflow, teams can transform PII to hashed values or remove them completely during the loading process. This limits where PII is available or accessible in an organization’s data warehouse. A sketch of the idea follows below.
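-
-As a minimal, hypothetical sketch (the `source_users` table and its columns are stand-ins): whether this step runs in an ETL tool, a Spark job, or SQL as shown here, the idea is to replace direct identifiers with one-way hashes before the data ever lands in the warehouse. The `sha2` function is shown in its Snowflake form:
-
-```sql
--- pseudonymize PII during the transform step so only hashed values are loaded
-select
-    user_id,
-    sha2(email, 256) as email_hash,  -- one-way hash instead of the raw address
-    signup_date
-from source_users
-```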
-
-## ETL challenges
-
-There are reasons ETL has persisted as a workflow for over twenty years. However, there are also reasons why there’s been such immense innovation in this part of the data world in the past decade. From our perspective, the technical and human limitations we describe below are some of the reasons ELT has surpassed ETL as the preferred workflow.
-
-### ETL challenge #1: Technical limitations
-
-**Limited or no version control**
-
-When transformations exist as standalone scripts or are deeply woven into ETL products, it can be hard to version control them. Not having version control over transformations as code means that data teams can’t easily recreate or roll back historical transformations, or perform code reviews.
-
-**Immense amounts of business logic living in BI tools**
-
-Some teams with ETL workflows implement much of their business logic in their BI platform rather than earlier, in the transformation phase. While most organizations have some business logic in their BI tools, an excess of this logic downstream can make rendering data in the BI tool incredibly slow, and potentially hard to track if the code in the BI tool is not version controlled or exposed in documentation.
-
-**Challenging QA processes**
-
-While data quality testing can be done in ETL processes, not having the raw data living somewhere in the data warehouse inevitably makes it harder to ensure data models are performing the correct functionality. In addition, quality control continually gets harder as the number of data sources and pipelines within your system grows.
-
-### ETL challenge #2: Human limitations
-
-**Data analysts can be excluded from ETL work**
-
-Because ETL workflows often involve incredibly technical processes, they can exclude data analysts from the data workflow. One of the greatest strengths of data analysts is their knowledge of the data and SQL, and when extractions and transformations involve unfamiliar code or applications, they and their expertise can be left out of the process. Data analysts and scientists also become dependent on other people to create the schemas, tables, and datasets they need for their work.
-
-**Business users are kept in the dark**
-
-Transformations and business logic can often be buried deep in custom scripts, ETL tools, and BI platforms. At the end of the day, this can hurt business users: they’re kept out of the data modeling process and have limited views into how data transformation takes place. As a result, end business users often have little clarity on data definitions, quality, and freshness, which ultimately can decrease trust in the data and the data team.
-
-## ETL vs ELT
-
-You may read other articles or technical documents that use ETL and ELT interchangeably. On paper, the only difference is the order in which the T and the L appear. However, this mere switching of letters dramatically changes the way data exists in and flows through a business’s systems.
-
-In both processes, data from different data sources is extracted in similar ways. In ELT, however, the data is then loaded directly into the target data platform, whereas in ETL it is transformed first. With ELT workflows, both raw and transformed data can live in the data warehouse, and data folks have the flexibility to model the data after they’ve had the opportunity to explore and analyze the raw data. ETL workflows can be more constraining, since transformations happen immediately after extraction. We break down some of the other major differences between the two below:
-
-| | ELT | ETL |
-|---|---|---|
-| Programming skills required | Often requires little to no code to extract and load data into your data warehouse. | Often requires custom scripts or considerable data engineering lift to extract and transform data prior to load. |
-| Separation of concerns | Extraction, load, and transformation layers can be explicitly separated out by different products. | ETL processes are often encapsulated in one product. |
-| Distribution of transformations | Since transformations take place last, there is greater flexibility in the modeling process. Worry first about getting your data in one place; then you have time to explore the data to understand the best way to transform it. | Because transformation occurs before data is loaded into the target location, teams must conduct thorough work up front to make sure data is transformed properly. Heavy transformations often take place downstream in the BI layer. |
-| [Data team roles](https://www.getdbt.com/data-teams/analytics-job-descriptions/) | ELT workflows empower data team members who know SQL to create their own extraction and loading pipelines and transformations. | ETL workflows often require teams with greater technical skill to create and maintain pipelines. |
-
-While ELT is growing in adoption, it’s still important to understand when ETL might be appropriate and where you’ll see challenges with the ETL workflow.
-
-## ETL tools
-
-There exists a variety of ETL technologies to help teams get data into their data warehouse. A good portion of the ETL tools on the market today are geared toward enterprise businesses and teams, but some are also applicable to smaller organizations.
-
-| Platform | E/T/L? | Description | Open source option? |
-|---|---|---|---|
-| Informatica | E, T, L | An all-purpose ETL platform that supports low-code or no-code extraction, transformation, and loading. Informatica also offers a broad suite of data management solutions beyond ETL and is often leveraged by enterprise organizations. | :x: |
-| Integrate.io | E, T, L | A newer ETL product focused on both low-code ETL and reverse ETL pipelines. | :x: |
-| Matillion | E, T, L | Matillion is an end-to-end ETL solution with a variety of native data connectors and GUI-based transformations. | :x: |
-| Microsoft SSIS | E, T, L | Microsoft’s SQL Server Integration Services (SSIS) offers a robust, GUI-based platform for ETL services. SSIS is often used by larger enterprise teams. | :x: |
-| Talend Open Studio | E, T, L | An open source suite of GUI-based ETL tools. | :white_check_mark: |
-
-## Conclusion
-
-ETL, or “Extract, Transform, Load,” is the process of extracting data from different data sources, transforming it, and loading that transformed data into a data warehouse. ETL typically supports lighter transformations in the phase prior to loading, with more meaningful transformations taking place in downstream BI tools. We’re seeing now that ETL is fading out and the newer ELT workflow is replacing it as a practice for many data teams. However, it’s important to note that ETL got us to where we are today: capable of building workflows that extract data within simple UIs, store data in scalable cloud data warehouses, and write data transformations like software engineers.
-
-## Further reading
-
-Please check out some of our favorite reads regarding ETL and ELT below:
-
-- [Glossary: ELT](https://docs.getdbt.com/terms/elt)
-- [The case for the ELT workflow](https://www.getdbt.com/analytics-engineering/case-for-elt-workflow/)
-- [A love letter to ETL tools](https://www.getdbt.com/analytics-engineering/etl-tools-a-love-letter/)
-- [Reverse ETL](https://www.getdbt.com/analytics-engineering/use-cases/operational-analytics/)
diff --git a/website/docs/terms/reverse-etl.md b/website/docs/terms/reverse-etl.md
deleted file mode 100644
index a3ccd0b0f70..00000000000
--- a/website/docs/terms/reverse-etl.md
+++ /dev/null
@@ -1,94 +0,0 @@
----
-id: reverse-etl
-title: Reverse ETL
-description: Reverse ETL is the process of getting your transformed data stored in your data warehouse to end business platforms, such as sales CRMs and ad platforms.
-displayText: reverse ETL
-hoverSnippet: Reverse ETL is the process of getting your transformed data stored in your data warehouse to end business platforms, such as sales CRMs and ad platforms.
----
-
-<head>
-    <title>Reverse ETL, demystified: What it is in plain English</title>
-</head>
-
-Reverse ETL is the process of getting your transformed data stored in your data warehouse to end business platforms, such as sales CRMs and ad platforms. Once in an end platform, that data is often used to drive meaningful business actions, such as creating custom audiences in ad platforms, personalizing email campaigns, or supplementing data in a sales CRM. You may also hear reverse ETL referred to as operational analytics or data activation.
-
-Reverse ETL efforts typically happen after data teams have set up their [modern data stack](https://www.getdbt.com/blog/future-of-the-modern-data-stack/) and have a consistent and automated way to extract, load, and transform data. Data teams are usually responsible for setting up the pipelines that send data down to business platforms, while business users are typically responsible for *using the data* once it gets to their end platform.
-
-Ultimately, reverse ETL is a way to put data where the work is already happening, support self-service efforts, and help business users derive real action out of their data.
-
-## How reverse ETL works
-
-In the reverse ETL process, transformed data is synced from a data warehouse to external tools in order to be leveraged by different business teams.
-
-![A diagram depicting how the reverse ETL process works. It starts with data being extracted from data sources like email CRMs, Facebook Ad platforms, backend databases, and NetSuite. The raw data is then loaded into a data warehouse. After loading, the data is transformed and modeled. The modeled data is then loaded directly back into the tools that created the data, like Email CRMs, Facebook Ad platforms, and others so the insights are more accessible to business users.](/img/docs/terms/reverse-etl/reverse-etl-diagram.png)
-
-The power of reverse ETL comes from sending down *already transformed data* to business platforms. Raw data, while beautiful in its own way, typically lacks the structure, aggregations, and aliasing to be useful for end business users off the bat. After data teams transform data for business use in pipelines, typically to expose it in an end business intelligence (BI) tool, they can also send this cleaned and meaningful data to other platforms where business users can derive value, using [reverse ETL tools](#reverse-etl-tools).
-
-Data teams can choose to write the additional transformations that end business tools may need either in the reverse ETL tools themselves or by creating [additional models in dbt](https://getdbt.com/open-source-data-culture/reverse-etl-playbook/).
-
-## Why use reverse ETL?
-
-There are a few reasons why your team may want to consider using reverse ETL:
-
-### Putting data where the work is happening
-
-While most data teams would love it if business users spent a significant portion of their time in their BI tool, that’s neither practical nor necessarily the most efficient use of their time. In the real world, many business users will spend some time in a BI tool, identify the data that could be useful in a platform they spend a significant amount of time in, and work with the data team to get that data where they need it. Users feel comfortable and confident in the systems they use every day—why not put the data in the places that allow them to thrive?
-
-### Manipulating data to fit end platform requirements
-
-Reverse ETL helps you put the data your business users need *in the format their end tool expects*. Oftentimes, end platforms expect data fields to be named or cast in a certain way. Instead of business users having to manually input those values in the correct format, you can transform your data using a product like dbt, or directly in a reverse ETL tool itself, and sync that data down in an automated way.
-
-### Supporting self-service efforts
-
-By sending down data-team-approved data in reverse ETL pipelines, your business users have the flexibility to use that data however they see fit. Soon, your business users will be making audiences, testing personalization efforts, and running their end platform like a well-oiled, data-powered machine.
-
-## Reverse ETL use cases
-
-Just as there are almost endless opportunities with data, there are many different potential use cases for reverse ETL. We won’t go into every possible option, but we’ll cover some of the common use cases that exist for reverse ETL efforts.
-
-### Personalization
-
-Reverse ETL allows business users to access data that they normally would only have access to in a BI tool *in the platforms they use every day*. As a result, business users can now use this data to personalize how they create ads, send emails, and communicate with customers.
-
-Personalization was all the hype a few years ago, and now you rarely see an email come into your inbox without some sort of personalization in place. Data teams using reverse ETL are able to pass down important customer information, such as location, customer lifetime value (CLV), tenure, and other fields, that can be used to create personalized emails, establish appropriate messaging, and segment email flows. All we can say: the possibilities for personalization powered by reverse ETL are endless.
-
-### Sophisticated paid marketing initiatives
-
-At the end of the day, businesses want to serve the right ads to the right people (and at the right cost). A common use case for reverse ETL is for teams to use their customer data to create audiences in ad platforms, either to target specific audiences or to create lookalikes. While ad platforms have gotten increasingly sophisticated with their algorithms for identifying high-value audiences, it rarely hurts to supplement those audiences with your own data to create sophisticated audiences or lookalikes. The sketch below shows what such an audience model might look like.
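-
-As a hypothetical sketch of the kind of model a data team might sync to an ad platform via a reverse ETL tool (the `analytics.customers` table, its columns, and the threshold are all invented for illustration):
-
-```sql
--- define a high-value audience once in the warehouse;
--- a reverse ETL tool can then sync it to the ad platform on a schedule
-select
-    email,
-    customer_lifetime_value
-from analytics.customers
-where customer_lifetime_value > 500   -- hypothetical "high-value" cutoff
-  and has_opted_in_to_marketing
-```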
-
-### Self-service analytics culture
-
-We hinted at it earlier, but reverse ETL efforts can be an effective way to promote a self-service analytics culture. When data teams put the data where business users need it, business users can confidently access it on their own, driving even faster insights and action. Instead of requesting a data pull from a data team member, they can find the data they need directly within the platform they use. Reverse ETL allows business users to act on metrics that have already been built out and validated by data teams, without creating ad hoc requests.
-
-### “Real-time” data
-
-It would be remiss of us not to mention reverse ETL and the notion of “real-time” data. While you can have the debate over the meaningfulness and true value-add of real-time data another time, reverse ETL can be a mechanism for bringing data to end business platforms in a more “real-time” way.
-
-Data teams can set up syncs in reverse ETL tools at higher cadences, allowing business users to have the data they need, faster. Obviously, there’s some cost-benefit analysis around how often you want to be loading data via [ETL tools](https://www.getdbt.com/analytics-engineering/etl-tools-a-love-letter/) and hitting your data warehouse, but reverse ETL can help move data into external tools at a quicker cadence if deemed necessary.
-
-All this to say: move with caution in the realm of “real-time”, understand your stakeholders’ wants and decision-making process for real-time data, and work toward a solution that’s both practical and impactful.
-
-## Reverse ETL tools
-
-Reverse ETL tools typically establish the connection between your data warehouse and end business tools, offer an interface to create additional transformations or audiences, and support automation of downstream syncs. Below are some examples of tools that support reverse ETL pipelines.
-
-| Tool | Description | Open source option? |
-|:---:|:---:|:---:|
-| Hightouch | A platform to sync data models and create custom audiences for downstream business platforms. | :x: |
-| Polytomic | A unified sync platform for syncing to and from data warehouses (ETL and reverse ETL), databases, business apps, APIs, and spreadsheets. | :x: |
-| Census | Another reverse ETL tool that can sync data from your data warehouse to your go-to-market tools. | :x: |
-| Rudderstack | Also a CDP (customer data platform), Rudderstack additionally supports pushing data and audiences down to external tools, such as ad platforms and email CRMs. | :white_check_mark: |
-| Grouparoo | Grouparoo, part of Airbyte, is an open source framework to move data from data warehouses to different cloud-based tools. | :white_check_mark: |
-
-## Conclusion
-
-Reverse ETL enables you to sync the transformed data stored in your data warehouse to external platforms often used by marketing, sales, and product teams, allowing you to leverage your data in a whole new way. Reverse ETL pipelines can support personalization efforts and sophisticated paid marketing initiatives, and they ultimately offer new ways to put your data to work. In doing so, they create a self-service analytics culture where stakeholders can receive the data they need, in the places they need it, in an automated way.
-
-## Further reading
-
-If you’re interested in learning more about reverse ETL and the impact it could have on your team, check out the following:
-
-- [How dbt Labs’s data team approaches reverse ETL](https://getdbt.com/open-source-data-culture/reverse-etl-playbook/)
-- [The operational data warehouse in action: Reverse ETL, CDPs, and the future of data activation](https://www.getdbt.com/coalesce-2021/operational-data-warehouse-reverse-etl-cdp-data-activation/)
-- [The analytics engineering guide: Operational analytics](https://www.getdbt.com/analytics-engineering/use-cases/operational-analytics/)
diff --git a/website/vercel.json b/website/vercel.json
index 993ff9065bd..d22d7a37b43 100644
--- a/website/vercel.json
+++ b/website/vercel.json
@@ -3631,13 +3631,28 @@
       "destination": "https://www.getdbt.com/blog/guide-to-surrogate-key",
       "permanent": true
     },
+    {
+      "source": "/terms/elt",
+      "destination": "https://www.getdbt.com/blog/extract-load-transform",
+      "permanent": true
+    },
+    {
+      "source": "/terms/etl",
+      "destination": "https://www.getdbt.com/blog/extract-transform-load",
+      "permanent": true
+    },
+    {
+      "source": "/terms/reverse-etl",
+      "destination": "https://www.getdbt.com/blog/reverse-etl-playbook",
+      "permanent": true
+    },
     {
       "source": "/glossary",
       "destination": "https://www.getdbt.com/blog",
       "permanent": true
     },
     {
-      "source": "/terms/:path((?!elt|etl|reverse-etl).*)",
+      "source": "/terms/:path*",
       "destination": "https://www.getdbt.com/blog",
       "permanent": true
     }