add dbt_pca package #350

Merged: 1 commit, Jan 13, 2025
dwreeves (Contributor)

Description

dbt_pca is an easy way to perform principal component analysis (PCA) in SQL using dbt. Example:

{{
  config(
    materialized="table"
  )
}}
select * from {{
  dbt_pca.pca(
    table=ref('collinear_matrix'),
    index='idx',
    columns=['x1', 'x2', 'x3', 'x4', 'x5'],
    ncomp=2
  )
}} as pca

| comp | col | eigenvector | eigenvalue | coefficient |
|---|---|---|---|---|
| 0 | x1 | 0.23766561884860857 | 29261.18329671314 | 40.6548443559801 |
| 0 | x2 | -0.5710963390821377 | 29261.183296713134 | -97.69117169801471 |
| 0 | x3 | -0.5730424984614262 | 29261.18329671314 | -98.02407978560524 |
| 0 | x4 | -0.5375734671620216 | 29261.18329671314 | -91.9567825723166 |
| 0 | x5 | 0.0010428157925962138 | 29261.183296713138 | 0.17838303220022225 |
| 1 | x1 | -0.0662409860256577 | 10006.031828705965 | -6.626096072806324 |
| 1 | x2 | -0.001674528192609409 | 10006.031828705967 | -0.16750331398380647 |
| 1 | x3 | -0.0027280948128929504 | 10006.031828705962 | -0.27289174588904075 |
| 1 | x4 | -0.022663548107992145 | 10006.031828705964 | -2.2670382209597086 |
| 1 | x5 | 0.9975411013143917 | 10006.031828705964 | 99.78419058137138 |
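
A note on reading the sample output above: the numbers suggest that `coefficient` equals `eigenvector * sqrt(eigenvalue)` (often called a loading), and that each component's eigenvector has unit length. This is an observation from the rows shown here, not a documented guarantee of dbt_pca. A minimal Python sketch that checks both properties against the sample rows (the helper names `recomputed_coefficient` and `squared_norm` are hypothetical, chosen only for illustration):

```python
import math

# Rows copied verbatim from the sample output:
# (comp, col, eigenvector, eigenvalue, coefficient)
rows = [
    (0, "x1", 0.23766561884860857, 29261.18329671314, 40.6548443559801),
    (0, "x2", -0.5710963390821377, 29261.183296713134, -97.69117169801471),
    (0, "x3", -0.5730424984614262, 29261.18329671314, -98.02407978560524),
    (0, "x4", -0.5375734671620216, 29261.18329671314, -91.9567825723166),
    (0, "x5", 0.0010428157925962138, 29261.183296713138, 0.17838303220022225),
    (1, "x1", -0.0662409860256577, 10006.031828705965, -6.626096072806324),
    (1, "x2", -0.001674528192609409, 10006.031828705967, -0.16750331398380647),
    (1, "x3", -0.0027280948128929504, 10006.031828705962, -0.27289174588904075),
    (1, "x4", -0.022663548107992145, 10006.031828705964, -2.2670382209597086),
    (1, "x5", 0.9975411013143917, 10006.031828705964, 99.78419058137138),
]


def recomputed_coefficient(eigenvector, eigenvalue):
    """Assumed relationship: coefficient = eigenvector * sqrt(eigenvalue)."""
    return eigenvector * math.sqrt(eigenvalue)


def squared_norm(comp):
    """Sum of squared eigenvector entries for one component."""
    return sum(vec ** 2 for c, _, vec, _, _ in rows if c == comp)


# True when every reported coefficient matches the assumed relationship.
coefficients_match = all(
    math.isclose(recomputed_coefficient(vec, val), coef, rel_tol=1e-9)
    for _, _, vec, val, coef in rows
)

# True when both components' eigenvectors have (approximately) unit length.
unit_length = all(
    math.isclose(squared_norm(comp), 1.0, abs_tol=1e-6) for comp in (0, 1)
)

print(coefficients_match, unit_length)
```

If the relationship holds, dividing `coefficient` by `sqrt(eigenvalue)` recovers the unit-length direction of each component, which is handy when comparing against output from other PCA tools.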

Link to your package's repository:

Checklist

First run experience

  • (Required): The package includes a licence file detectable by GitHub, such as the Apache 2.0 or MIT licence.
  • The package includes a README which explains how to get started with the package and customise its behaviour
  • The README indicates which data warehouses/platforms are expected to work with this package

Customisability

  • The package uses ref or source, instead of hard-coding table references.

Packages for data transformation (delete if not relevant):

  • provide a mechanism (such as variables) to customise the location of source tables.
  • do not assume database/schema names in sources.

Dependencies

Dependencies on dbt Core

  • The package has set a supported require-dbt-version range in dbt_project.yml. Example: A package which depends on functionality added in dbt Core 1.2 should set its require-dbt-version property to [">=1.2.0", "<2.0.0"].

Dependencies on other packages defined in packages.yml:

  • Dependencies are imported from the dbt Package Hub when available, as opposed to a git installation.
  • Dependencies contain the widest possible range of supported versions, to minimise issues in dependency resolution.
  • In particular, dependencies are not pinned to a patch version unless there is a known incompatibility.

Interoperability

  • The package does not override dbt Core behaviour in such a way as to impact other dbt resources (models, tests, etc) not provided by the package.
  • The package uses the cross-database macros built into dbt Core where available, such as {{ dbt.except() }} and {{ dbt.type_string() }}.
  • The package disambiguates its resource names to avoid clashes with nodes that are likely to already exist in a project. For example, packages should not provide a model simply called users.

Versioning

  • (Required): The package's git tags validate against the regex defined in hubcap/version.py (examples).
  • The package's version follows the guidance of Semantic Versioning 2.0.0. (Note in particular the recommendation for production-ready packages to be version 1.0.0 or above)

joellabes (Contributor) left a comment

🚀

joellabes merged commit 9a6d219 into dbt-labs:main on Jan 13, 2025
3 checks passed