Add dbt_eda_tools #324

Merged: @joellabes merged 2 commits into dbt-labs:main on Jul 30, 2024

Conversation

shankararul
Contributor

@shankararul commented on Jul 26, 2024

Description

A medley of macros that could be handy for your data exploration in dbt. The first iteration implements Get Missing Dates.

✅ Get Missing Dates
Finds all the missing dates in a model for the specified dimensions and filters, according to the expected time granularity.
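To make the idea concrete, here is a minimal Python sketch of the logic, not the macro itself. It assumes daily granularity and a single dimension column; the function name `missing_dates` and the sample rows are hypothetical and do not come from the package.

```python
from collections import defaultdict
from datetime import date, timedelta


def missing_dates(rows, start, end):
    """Return {dimension: [missing days]} between start and end (inclusive).

    `rows` is an iterable of (dimension, day) pairs. Daily granularity is
    an assumption of this sketch.
    """
    seen = defaultdict(set)
    for dimension, day in rows:
        seen[dimension].add(day)

    # Every day that should be present for each dimension.
    expected = [start + timedelta(days=i) for i in range((end - start).days + 1)]

    return {
        dimension: [day for day in expected if day not in days]
        for dimension, days in seen.items()
    }


# Hypothetical sample data: "FR" has no row for 2024-01-02.
rows = [
    ("FR", date(2024, 1, 1)),
    ("FR", date(2024, 1, 3)),
    ("US", date(2024, 1, 1)),
    ("US", date(2024, 1, 2)),
    ("US", date(2024, 1, 3)),
]
result = missing_dates(rows, date(2024, 1, 1), date(2024, 1, 3))
```

In a warehouse the same result would typically come from joining a generated date spine against the model; the sketch only illustrates what "missing" means per dimension.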

🚧 Fill Missing Dates (Coming soon)
Fills the missing dates in a model for the specified dimensions and filters, according to the expected time granularity.

🚧 Show as Percentage (Coming soon)
Shows a value as a percentage of the total for the specified aggregations.
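As an illustration of the intended calculation, a small Python sketch follows. This feature is not implemented yet, so the function name `show_as_percentage`, the rounding to two decimals, and the sample totals are all assumptions.

```python
def show_as_percentage(totals):
    """Map each aggregated value to its share of the grand total, in percent."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Avoid dividing by zero when every aggregate is zero.
        return {key: 0.0 for key in totals}
    return {
        key: round(100 * value / grand_total, 2)
        for key, value in totals.items()
    }


# Hypothetical aggregated values per dimension.
shares = show_as_percentage({"FR": 30, "US": 50, "DE": 20})
```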

🚧 Exploratory data analysis (Coming soon)

Numeric column exploration
Get summary statistics such as min, max, median, null count, percentiles, and standard deviation for numeric columns.
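A rough Python sketch of what such a numeric summary might compute is shown below. The feature is still planned, so the function name `summarize_numeric`, the choice of statistics, and the use of the sample standard deviation are assumptions.

```python
import statistics


def summarize_numeric(values):
    """Summarise a numeric column, treating None as a null value."""
    present = [value for value in values if value is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "median": statistics.median(present),
        # Sample standard deviation; needs at least two non-null values.
        "stdev": statistics.stdev(present),
    }


# Hypothetical column with one null.
summary = summarize_numeric([1, 2, None, 3, 4])
```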

Categorical column exploration
Get summary statistics such as count, unique values, and null values for categorical columns.

Timeseries column exploration
Get summary statistics such as start date, end date, granularity of the timeseries (day, month, year), null values, and missing dates for timeseries columns.

Docs site: https://shankararul.github.io/dbt_eda_tools/#!/macro/macro.dbt_eda_tools.get_missing_date
Link to your package's repository: https://github.com/shankararul/dbt_eda_tools

Checklist

This checklist is a cut-down version of the best practices that we have identified as the package hub has grown. Although meeting these checklist items is not a prerequisite to being added to the Hub, we have found that packages which don't conform provide a worse user experience.

First run experience

  • (Required): The package includes a licence file detectable by GitHub, such as the Apache 2.0 or MIT licence.
  • The package includes a README which explains how to get started with the package and customise its behaviour
  • The README indicates which data warehouses/platforms are expected to work with this package

Customisability

  • The package uses ref or source, instead of hard-coding table references.

Packages for data transformation (delete if not relevant):

  • provide a mechanism (such as variables) to customise the location of source tables.
  • do not assume database/schema names in sources.

Dependencies

Dependencies on dbt Core

  • The package has set a supported require-dbt-version range in dbt_project.yml. Example: A package which depends on functionality added in dbt Core 1.2 should set its require-dbt-version property to [">=1.2.0", "<2.0.0"].
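To show what the example range means in practice, here is a minimal Python sketch that checks a version against a list of bounds. It is not dbt's own resolution logic: the helper names are hypothetical, only plain `major.minor.patch` versions are handled, and pre-release tags are ignored.

```python
import operator

# Comparison operators this sketch understands; longer symbols are
# checked first so ">=" is not mistaken for ">".
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}


def parse_version(text):
    """Turn "1.2.0" into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, constraints):
    """Return True when `version` meets every bound in `constraints`."""
    current = parse_version(version)
    for constraint in constraints:
        for symbol in (">=", "<=", ">", "<", "="):
            if constraint.startswith(symbol):
                bound = parse_version(constraint[len(symbol):])
                if not OPERATORS[symbol](current, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported constraint: {constraint}")
    return True


supported = [">=1.2.0", "<2.0.0"]
ok = satisfies("1.8.3", supported)
too_old = satisfies("1.1.9", supported)
too_new = satisfies("2.0.0", supported)
```

With this range, any 1.x release from 1.2.0 onwards is accepted, while 2.0.0 and later are rejected until the package author widens the upper bound.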

Dependencies on other packages defined in packages.yml:

  • Dependencies are imported from the dbt Package Hub when available, as opposed to a git installation.
  • Dependencies contain the widest possible range of supported versions, to minimise issues in dependency resolution.
  • In particular, dependencies are not pinned to a patch version unless there is a known incompatibility.

Interoperability

  • The package does not override dbt Core behaviour in such a way as to impact other dbt resources (models, tests, etc) not provided by the package.
  • The package uses the cross-database macros built into dbt Core where available, such as {{ dbt.except() }} and {{ dbt.type_string() }}.
  • The package disambiguates its resource names to avoid clashes with nodes that are likely to already exist in a project. For example, packages should not provide a model simply called users.

Versioning

  • (Required): The package's git tags validate against the regex defined in version.py
  • The package's version follows the guidance of Semantic Versioning 2.0.0. (Note in particular the recommendation for production-ready packages to be version 1.0.0 or above)

Added dbt_utils_medley
@joellabes
Contributor

@shankararul this is so cool! My only concern is the overlap of calling it dbt_utils_medley when it's not related to dbt Labs' dbt_utils package. If you tweaked the name to something like dbt_medley or dbt_eda_tools or similar, I'd love to get this merged 🎉

Renamed to dbt_eda_tools
@shankararul changed the title from "Add dbt_utils_medley" to "Add dbt_eda_tools" on Jul 29, 2024
@shankararul
Contributor Author

Hey @joellabes, thanks for your quick turnaround. Indeed, you're right. To avoid confusion with dbt_utils, I renamed the package to dbt_eda_tools (thanks for finding the perfect name for this package 😍).

@joellabes joellabes merged commit d09e6c3 into dbt-labs:main Jul 30, 2024
3 checks passed