Add databricks asset bundles docs #4265

Merged on Nov 26, 2024 (27 commits)
Showing changes from 1 commit
Commits
2d98d6d
rename docs
noklam Oct 28, 2024
02aa9c0
add new docs
noklam Oct 28, 2024
7e31b88
add redirection
noklam Oct 29, 2024
f1e36f2
indeX
noklam Oct 29, 2024
055da3f
add scaffold
noklam Oct 29, 2024
db2c0b3
update
noklam Oct 29, 2024
2733196
style
noklam Oct 29, 2024
6e78ef6
Merge branch 'main' into noklam/databricks-asset-bundles-docs
noklam Oct 31, 2024
a62fa8f
add index page
noklam Nov 12, 2024
1d592b8
spelling
noklam Nov 12, 2024
7abbe21
move DAB to beginning
noklam Nov 12, 2024
73c900d
add back the instruction:
noklam Nov 12, 2024
3d572f0
Merge branch 'main' into noklam/databricks-asset-bundles-docs
noklam Nov 12, 2024
4026db3
dbx gone for good
noklam Nov 13, 2024
e8c13ef
add more image for running jobs and add existing cluster section
noklam Nov 15, 2024
3017317
langauge
noklam Nov 15, 2024
5c74e27
Merge branch 'main' into noklam/databricks-asset-bundles-docs
noklam Nov 15, 2024
2272014
Merge branch 'main' into noklam/databricks-asset-bundles-docs
noklam Nov 19, 2024
342e179
Update docs/source/deployment/databricks/databricks_ide_development_w…
noklam Nov 21, 2024
5931668
Update docs/source/deployment/databricks/databricks_ide_development_w…
noklam Nov 21, 2024
cbb8f56
address review comments
noklam Nov 26, 2024
b9c111c
Merge branch 'main' into noklam/databricks-asset-bundles-docs
noklam Nov 26, 2024
053d321
rename
noklam Nov 26, 2024
58c1ef3
fic index
noklam Nov 26, 2024
e24d8eb
fix reference
noklam Nov 26, 2024
d751c79
update release noteS
noklam Nov 26, 2024
3ac0299
Merge branch 'main' into noklam/databricks-asset-bundles-docs
noklam Nov 26, 2024
address review comments
Signed-off-by: Nok <nok.lam.chan@quantumblack.com>
noklam committed Nov 26, 2024
commit cbb8f56251e72516bb6b089045e9f081453f27de
@@ -1,10 +1,10 @@
-# Use Databricks Asset Bundles to deploy a Kedro project
+# Use an IDE and Databricks Asset Bundles to deploy a Kedro project

[vale] warning (GitHub Actions), docs/source/deployment/databricks/databricks_ide_databricks_asset_budnels_workflow.md#L1, column 3: [Kedro.headings] 'Use an IDE and Databricks Asset Bundles to deploy a Kedro project' should use sentence-style capitalization.

```{note}
-The `dbx` package is deprecated by Databricks, and dbx workflow documentation is moved to a [new page](./databricks_dbx_workflow.md).
+The `dbx` package was deprecated by Databricks, and dbx workflow documentation is moved to a [new page](./databricks_dbx_workflow.md).
```

-This guide demonstrates a workflow for developing a Kedro Project on Databricks using Databricks Asset Bundles. You will learn how to develop your project using a local environment, then use `kedro-databricks` and Databricks Asset Bundle to package your code for running pipelines on Databricks. To learn more about Databricks Asset Bundles and customization, read [What are Databricks Asset Bundles](https://docs.databricks.com/en/dev-tools/bundles/index.html).
+This guide demonstrates a workflow for developing a Kedro Project on Databricks using Databricks Asset Bundles. You will learn how to develop your project using a local environment, then use `kedro-databricks` and Databricks Asset Bundle to package your code for running pipelines on Databricks. To learn more about Databricks Asset Bundles and customisation, read [What are Databricks Asset Bundles](https://docs.databricks.com/en/dev-tools/bundles/index.html).

## Benefits of local development

@@ -12,7 +12,7 @@

- Auto-completion and suggestions for code, improving your development speed and accuracy.
- Linters like [Ruff](https://docs.astral.sh/ruff) can be integrated to catch potential issues in your code.
- Static type checkers like Mypy can check types in your code, helping to identify potential type-related issues early in the development process.

[vale] warning, line 15, column 29: [Kedro.Spellings] Did you really mean 'Mypy'?

To set up these features, look for instructions specific to your IDE (for instance, [VS Code](https://code.visualstudio.com/docs/python/linting)).

@@ -39,9 +39,9 @@
## Set up your project

### Note your Databricks username and host
Note your Databricks **username** and **host** as you will need it for the remainder of this guide.

[vale] warning, line 42, column 76: [Kedro.toowordy] 'remainder' is too wordy

Find your Databricks username in the top right of the workspace UI and the host in the browser's URL bar, up to the first slash (e.g., `https://adb-123456789123456.1.azuredatabricks.net/`):

[vale] warning, line 44, column 130: [Kedro.abbreviations] Use 'for example' instead of abbreviations like 'e.g.,'.

![Find Databricks host and username](../../meta/images/find_databricks_host_and_username.png)
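
If you copy the full address from the browser, the host is the part up to and including the first slash after the domain. The following is a minimal shell sketch of that trimming. The workspace URL is hypothetical (its query string and fragment are made up for illustration), and `DATABRICKS_HOST` is only a local variable name chosen for this example.

```bash
# Hypothetical address copied from the browser; replace it with your own.
workspace_url="https://adb-123456789123456.1.azuredatabricks.net/?o=123456789123456#job/list"

# Split off the scheme ("https"), then drop everything after the first slash
# that follows the domain name.
scheme="${workspace_url%%://*}"
rest="${workspace_url#*://}"
DATABRICKS_HOST="${scheme}://${rest%%/*}/"

echo "$DATABRICKS_HOST"
# prints https://adb-123456789123456.1.azuredatabricks.net/
```

The sketch uses only POSIX parameter expansion, so it does not depend on any Databricks tooling being installed.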

@@ -62,7 +62,7 @@
```


### Authenticate the Databricks CLI

[vale] warning, line 65, column 5: [Kedro.headings] 'Authenticate the Databricks CLI' should use sentence-style capitalization.
**Now, you must authenticate the Databricks CLI with your Databricks instance.**

[Refer to the Databricks documentation](https://docs.databricks.com/en/dev-tools/cli/authentication.html) for a complete guide on how to authenticate your CLI. The key steps are:
@@ -86,7 +86,7 @@
If you are not using the `databricks-iris` starter to create a Kedro project, **and** you are working with a version of Kedro **earlier than 0.19.0**, then you should [disable file-based logging](https://docs.kedro.org/en/0.18.14/logging/logging.html#disable-file-based-logging) to prevent Kedro from attempting to write to the read-only file system.
```

## Create the Databricks Asset Bundles using `kedro-databricks`

[vale] warning, line 89, column 1: [Kedro.headings] 'Create the Databricks Asset Bundles using ****************' should use sentence-style capitalization.

`kedro-databricks` is a wrapper around the `databricks` CLI. It's the simplest way to get started without getting stuck with configuration.
1. Install `kedro-databricks`:
@@ -111,14 +111,14 @@

This command reads the configuration from `conf/databricks.yml` (if it exists) and generates the Databricks job configuration inside a `resource` folder.

### Running a Databricks Job Using an Existing Cluster

[vale] warning, line 114, column 5: [Kedro.headings] 'Running a Databricks Job Using an Existing Cluster' should use sentence-style capitalization.

By default, Databricks creates a new job cluster for each job. However, there are instances where you might prefer to use an existing cluster, such as:

[vale] warning, line 116, column 64: [Kedro.toowordy] 'However' is too wordy

1. Lack of permissions to create a new cluster.
2. The need for a quick start with an all-purpose cluster.

[vale] warning, line 119, column 19: [Kedro.words] Use '' instead of 'quick'.

While it is generally [**not recommended** to utilise **all-purpose compute** for running jobs](https://docs.databricks.com/en/jobs/compute.html#should-all-purpose-compute-ever-be-used-for-jobs), it is feasible to configure a Databricks job for testing purposes.

[vale] warning, line 121, column 47: [Kedro.toowordy] 'utilise' is too wordy
[vale] warning, line 121, column 203: [Kedro.toowordy] 'feasible' is too wordy

To begin, you need to determine the `cluster_id`. Navigate to the `Compute` tab and select the `View JSON` option.

@@ -126,7 +126,7 @@
![Find cluster ID through UI](../../meta/images/databricks_cluster_id1.png)

You will see the cluster configuration in JSON format. Copy the `cluster_id`:
![cluster_id in the JSON view](../../meta/images/databricks_cluster_id2.png)
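
If you save or paste the JSON view into a shell variable, you can pull out the `cluster_id` value instead of copying it by hand. This is a rough sketch only: the JSON below is a made-up, heavily shortened stand-in for the real cluster configuration, and the ID value is hypothetical.

```bash
# Hypothetical, shortened cluster configuration as shown in the JSON view.
cluster_json='{
  "cluster_id": "0123-456789-abcdefgh",
  "cluster_name": "demo-cluster"
}'

# Print only the quoted value that follows the "cluster_id" key.
cluster_id=$(printf '%s\n' "$cluster_json" |
  sed -n 's/.*"cluster_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')

echo "$cluster_id"
# prints 0123-456789-abcdefgh
```

The `sed` pattern assumes the key and its value sit on the same line, which holds for pretty-printed JSON like the sample above. For anything more complex, a proper JSON parser is the safer choice.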

[vale] warning, line 129, column 3: [Kedro.Spellings] Did you really mean 'cluster_id'?

Next, update `conf/databricks.yml`
```diff
@@ -140,7 +140,7 @@
```
kedro databricks bundle --overwrite
```
## Deploy Databricks Job using Databricks Asset Bundles

[vale] warning, line 143, column 4: [Kedro.headings] 'Deploy Databricks Job using Databricks Asset Bundles' should use sentence-style capitalization.

Once you have all the resources generated, deploy the Databricks Asset Bundles to Databricks:

@@ -148,7 +148,7 @@
kedro databricks deploy
```

You should see output similar to:

[vale] warning, line 151, column 23: [Kedro.toowordy] 'similar to' is too wordy

```
Uploading databrick_iris-0.1-py3-none-any.whl...
@@ -162,7 +162,7 @@

There are two options to run Databricks Jobs:

### Run Databricks Job with `databricks` CLI

[vale] warning, line 165, column 1: [Kedro.headings] 'Run Databricks Job with ********** CLI' should use sentence-style capitalization.

```bash
databricks bundle run
@@ -181,7 +181,6 @@

Copy that URL into your browser or go to the `Jobs Run` UI to see the run status.
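
The bundle, deploy, and run commands from this guide can be chained in a small wrapper script. The sketch below is an illustration, not part of `kedro-databricks`: the `run` and `deploy_and_run` helper names and the `DRY_RUN` switch are assumptions introduced here. `databricks bundle run` is copied as it appears above and, depending on your bundle, may need the job's resource key as an argument.

```bash
# DRY_RUN defaults to 1, so the commands are printed instead of executed.
# Set DRY_RUN=0 to execute them for real.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Regenerate the bundle resources, deploy them, then trigger a run.
# The && chain stops at the first command that fails.
deploy_and_run() {
  run kedro databricks bundle --overwrite &&
    run kedro databricks deploy &&
    run databricks bundle run
}

deploy_and_run
```

Keeping dry-run as the default means the script can be inspected safely before it touches a workspace.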

### Run Databricks Job with Databricks UI

[vale] warning, line 184, column 5: [Kedro.headings] 'Run Databricks Job with Databricks UI' should use sentence-style capitalization.
Alternatively, you can go to the `Workflow` tab and select the desired job to run directly:

[vale] warning, line 185, column 1: [Kedro.weaselwords] 'Alternatively' is a weasel word!
[vale] warning, line 185, column 1: [Kedro.toowordy] 'Alternatively' is too wordy
-![alt text](../../meta/images/databricks-job-run.png)
-```
+![Run deployed Databricks Job with Databricks UI](../../meta/images/databricks-job-run.png)