Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package best practices doc #139

Merged
merged 8 commits into from
Aug 3, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,4 +14,4 @@ Assorted constraints:
### Adding your package to hubcap
Currently, only packages hosted on a GitHub repo are supported.

To add your package, open a PR that adds your repository to [hub.json](hub.json).
To add your package, open a PR that adds your repository to [hub.json](hub.json). A dbt Labs team member will review your PR and provide a cursory check of your new package against [best practices](package-best-practices.md).
joellabes marked this conversation as resolved.
Show resolved Hide resolved
45 changes: 45 additions & 0 deletions package-best-practices.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
# Best practices when submitting a package for public use

Packages are a key part of the dbt experience. Practitioners and vendors alike can [contribute to the knowledge loop](https://github.com/dbt-labs/corp/blob/main/values.md#we-contribute-to-the-knowledge-loop) by enabling compelling, batteries-included use cases.
joellabes marked this conversation as resolved.
Show resolved Hide resolved

There are hundreds of packages on the dbt Package Hub already, and over time we have identified a handful of common mistakes that new package authors make as they take a self-contained package and prepare it for publication. We have listed these below to help you and your users get the best experience.

Although the dbt Labs team completes a cursory review of new packages before adding them to the Package Hub, ultimately you are responsible for your package's performance.

To avoid ambiguity, the bullets below are pretty formal, but we are a friendly bunch! If you need help preparing a package for deployment to the dbt Package Hub, reach out in [#package-ecosystem](https://getdbt.slack.com/archives/CU4MRJ7QB/) in the [dbt Community Slack](https://getdbt.com/community).
joellabes marked this conversation as resolved.
Show resolved Hide resolved

---

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).

### First run experience
- Packages SHOULD contain a README file which explains how to get started with the package and customise its behaviour.
joellabes marked this conversation as resolved.
Show resolved Hide resolved
- The documentation SHOULD indicate which data warehouses are expected to work with this package.

### Customisability
- Packages MUST NOT hard-code table references, and MUST use `ref` or `source` instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking commentary.

The current text definitely conveys the spirit of the guidance, but an overly lawyerly maintainer could hard-code a view reference and claim full compliance.

Two opportunities for further clarity:

  • table → relation
  • use applicable adapter methods rather than directly utilizing describe, show, or information_schema

I'm fuzzy on the detailed guidance to accomplish the latter, so that's why I don't have an explicit suggestion that covers both opportunities.

Our glossary doesn't include an entry "relation" currently, it only has the following disparate entries:

The closest documentation I could find quickly focuses on the definition and usage of the Relation Python object:

- Packages MAY provide a mechanism (such as variables) to enable/disable certain subsets of functionality.
- Packages for data transformation:
- SHOULD provide a mechanism (such as variables) to customise the location of source tables.
- SHOULD NOT assume database/schema names in sources.
- MAY assume table names, particularly if the package was built to support tables created by a known tool.

### Dependencies
#### Dependencies on dbt Core
- Packages requiring a minimum version of `dbt-core` MUST declare it in the `require-dbt-version` property of `dbt_project.yml`.
- Packages requiring a minimum version of `dbt-core` SHOULD allow all subsequent minor and patch releases of that major version.
- For example, a package which depends on functionality added in dbt Core 1.2 SHOULD set its `require-dbt-version` property to `[">=1.2.0", "<2.0.0"]`.
#### Dependencies on other packages defined in `packages.yml`:
- Packages SHOULD import their dependencies from the dbt Package Hub when available, as opposed to a `git` installation.
- Packages SHOULD specify the widest possible range of supported versions, to minimise issues in dependency resolution.
- In particular, packages SHOULD NOT pin to a patch version of their imported package unless they are aware of an incompatibility.
### Interoperability
- Packages MUST NOT override dbt Core behaviour in such a way as to impact other dbt resources (models, tests, etc) not provided by the package.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtcohen6 I wrote this thinking about the aftermath of that Fivetran package that overrode freshness collection and had unfortunate side effects.

But, if someone like this community member turned their gist into a package, that would be an acceptable overriding of behaviour yeah? Does this rule need a bit of wordsmithing or do we just drop it down to REALLY SHOULD NOT per RFC 6919?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for REALLY SHOULD NOT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REALLY SHOULD NOT doesn't actually exist 😢 (discussed above) - should've made clearer that this was a joke

- Packages SHOULD use the cross-database macros built into dbt Core where available, such as `{{ except() }}` and `{{ type_string() }}`.
- Packages SHOULD disambiguate their resource names to avoid clashes with nodes that are likely to already exist in a project.
- For example, packages SHOULD NOT provide a model simply called `users`.

### Releases and updates
- Packages' git tags MUST validate against the regex defined in [version.py](/hubcap/version.py).
- Packages SHOULD follow the guidance of [Semantic Versioning 2.0.0](https://semver.org/spec/v2.0.0.html).
- Note in particular the recommendation for production-ready packages to be version `1.0.0` or above