Commit

style
Signed-off-by: Nok <[email protected]>
noklam committed Oct 29, 2024
1 parent db2c0b3 commit 8eec4ba
Showing 1 changed file with 78 additions and 43 deletions.
@@ -1,10 +1,27 @@
Your new documentation style is generally consistent with the existing Kedro documentation. However, I can suggest a few minor adjustments to improve consistency and readability:

Check warning on line 1 in docs/source/deployment/databricks/databricks_ide_development_workflow.md

GitHub Actions / vale

[Kedro.toowordy] 'However' is too wordy

Check warning on line 1 in docs/source/deployment/databricks/databricks_ide_development_workflow.md

GitHub Actions / vale

[Kedro.pronouns] Avoid first-person singular pronouns such as 'I'.

Check warning on line 1 in docs/source/deployment/databricks/databricks_ide_development_workflow.md

GitHub Actions / vale

[Kedro.weaselwords] 'few' is a weasel word!

1. Use of headers: Your use of headers is good, but consider using more level 2 (##) and level 3 (###) headers to break up the content further.

2. Code blocks: You're correctly using triple backticks for code blocks, which is good. Consider adding the language identifier after the opening backticks for syntax highlighting, e.g., ```python for Python code.

Check warning on line 5 in docs/source/deployment/databricks/databricks_ide_development_workflow.md

GitHub Actions / vale

[Kedro.weaselwords] 'correctly' is a weasel word!

Check warning on line 5 in docs/source/deployment/databricks/databricks_ide_development_workflow.md

GitHub Actions / vale

[Kedro.abbreviations] Use 'for example' instead of abbreviations like 'e.g.,'.

3. Notes: The note block format is correct, but consider using a more consistent style throughout the documentation.

4. Lists: Your use of unordered and ordered lists is appropriate.

5. Links: Internal links are correctly formatted.

Check warning on line 11 in docs/source/deployment/databricks/databricks_ide_development_workflow.md

GitHub Actions / vale

[Kedro.weaselwords] 'correctly' is a weasel word!

Here's a slightly revised version of your documentation with these adjustments:

```markdown
# Use Databricks Asset Bundles to deploy a Kedro project

```{note}
The `dbx` package is deprecated by Databricks, and dbx workflow documentation is moved to a [new page](./databricks_dbx_workflow.md).
```

This guide demonstrates a wokrflow for developing Kedro Project on Databricks using Databricks Asset Bundles. You will learn how to develop your project using local environment, then use `kedro-databricks` and Databricks Asset Bundle `to package your code for running pipeline on Databricks.
This guide demonstrates a workflow for developing a Kedro Project on Databricks using Databricks Asset Bundles. You will learn how to develop your project using a local environment, then use `kedro-databricks` and Databricks Asset Bundle to package your code for running pipelines on Databricks.

## Benefits of local development

By working in your local environment, you can take advantage of features within an IDE that are not available on Databricks notebooks:

@@ -14,73 +31,91 @@ By working in your local environment, you can take advantage of features within

To set up these features, look for instructions specific to your IDE (for instance, [VS Code](https://code.visualstudio.com/docs/python/linting)).

```{note}
If you prefer to develop projects in notebooks rather than in an IDE, you should follow our guide on [how to develop a Kedro project within a Databricks workspace](./databricks_notebooks_development_workflow.md) instead.


```

## What this page covers

The main steps in this tutorial are as follows:

- [Use Databricks Asset Bundles to deploy a Kedro project](#use-databricks-asset-bundles-to-deploy-a-kedro-project)
- [What this page covers](#what-this-page-covers)
- [Prerequisites](#prerequisites)
- [Set up your project](#set-up-your-project)
- [Note your Databricks username and host](#note-your-databricks-username-and-host)
- [Install Kedro and databricks CLI in a new virtual environment](#install-kedro-and-databricks-cli-in-a-new-virtual-environment)
- [Authenticate the Databricks CLI](#authenticate-the-databricks-cli)
- [Create a new Kedro Project](#create-a-new-kedro-project)
- [Create the Datanrickls Asset Bundles using `kedro-databricks`](#create-the-datanrickls-asset-bundles-using-kedro-databricks)
- [Deploy Databricks Job using Databricks Asset Bundles](#deploy-databricks-job-using-databricks-asset-bundles)
- [Run Databricks Job with `databricks` CLI](#run-databricks-job-with-databricks-cli)
- [Prerequisites](#prerequisites)
- [Set up your project](#set-up-your-project)
- [Create the Databricks Asset Bundles](#create-the-databricks-asset-bundles-using-kedro-databricks)
- [Deploy Databricks Job](#deploy-databricks-job-using-databricks-asset-bundles)
- [Run Databricks Job](#how-to-run-the-deployed-job)

## Prerequisites

- An active [Databricks deployment](https://docs.databricks.com/getting-started/index.html).
- A [Databricks cluster](https://docs.databricks.com/clusters/configure.html) configured with a recent version (>= 11.3 is recommended) of the Databricks runtime.
- [Conda installed](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) on your local machine in order to create a virtual environment with a specific version of Python (>= 3.9 is required). If you have Python >= 3.9 installed, you can use other software to create a virtual environment.
- [Conda installed](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) on your local machine to create a virtual environment with Python >= 3.9.

## Set up your project

### Note your Databricks username and host
### Install Kedro and databricks CLI in a new virtual environment

### Install Kedro and Databricks CLI in a new virtual environment

### Authenticate the Databricks CLI

### Create a new Kedro Project
### Create the Databricks Asset Bundles using `kedro-databricks`
`kedro-databricks` is a wrapper around `databricks` CLI. It is the simplest way to get started without getting stuck with configuration. To find more about Databricks Asset Bundles, customisation, you can read [What are Databricks Asset Bundles](https://docs.databricks.com/en/dev-tools/bundles/index.html)

Install `kedro-databricks` with:
`pip install kedro-databricks`
Then run this command:
`kedro databricks init`
## Create the Databricks Asset Bundles using `kedro-databricks`

Check warning on line 64 in docs/source/deployment/databricks/databricks_ide_development_workflow.md

GitHub Actions / vale

[Kedro.headings] 'Create the Databricks Asset Bundles using ****************' should use sentence-style capitalization.

`kedro-databricks` is a wrapper around the `databricks` CLI. It's the simplest way to get started without getting stuck with configuration. To learn more about Databricks Asset Bundles and customization, read [What are Databricks Asset Bundles](https://docs.databricks.com/en/dev-tools/bundles/index.html).

Check warning on line 66 in docs/source/deployment/databricks/databricks_ide_development_workflow.md

GitHub Actions / vale

[Kedro.ukspelling] In general, use UK English spelling instead of 'customization'.

1. Install `kedro-databricks`:

This will generate a `databricks.yml` sitting inside the `conf` folder. By default it sets the resource, i.e. cluster type you need. Optionally, you can override these configurations.
```bash
pip install kedro-databricks
```

2. Initialize the Databricks configuration:

Check warning on line 74 in docs/source/deployment/databricks/databricks_ide_development_workflow.md

GitHub Actions / vale

[Kedro.ukspelling] In general, use UK English spelling instead of 'Initialize'.

To create Databricks Asset Bundles, run:
`kedro databricks bundle`
```bash
kedro databricks init
```

If `conf/databricks.yml` exist, it will read the configuration and override the corresponding keys. This generates the Databricks job configuration inside a `resource` folder.
This generates a `databricks.yml` file in the `conf` folder, which sets the default cluster type. You can override these configurations if needed.

### Deploy Databricks Job using Databricks Asset Bundles
Once you have all the resource generated, run this command to deploy Databriks Asset Bundles to Databricks:
`kedro databricks deploy`
3. Create Databricks Asset Bundles:

```bash
kedro databricks bundle
```

You should see something similar:
> Uploading databrick_iris-0.1-py3-none-any.whl...
> Uploading bundle files to /Workspace/Users/xxxxxxx.com/.bundle/databrick_iris/local/files...
> Deploying resources...
> Updating deployment state...
> Deployment complete!
This command reads the configuration from `conf/databricks.yml` (if it exists) and generates the Databricks job configuration inside a `resource` folder.
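
The three steps above can be chained in a small script. This is a minimal sketch, not part of `kedro-databricks`: the helper names `run_step` and `bundle_project` and the `DRY_RUN` variable are assumptions made for illustration. It defaults to a dry run that only prints each command, so you can review the sequence before executing anything.

```bash
#!/usr/bin/env bash
# Sketch only: run_step, bundle_project, and DRY_RUN are hypothetical names,
# not part of kedro-databricks or the databricks CLI.
set -eu

# Print the command; execute it only when DRY_RUN is explicitly set to 0.
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    return 0
  fi
  "$@"
}

# The install, init, and bundle commands, in the order used on this page.
bundle_project() {
  run_step pip install kedro-databricks
  run_step kedro databricks init
  run_step kedro databricks bundle
}

# Dry run by default: prints the three commands without running them.
bundle_project
```

To execute the commands for real, run the script with `DRY_RUN=0` from the root of your Kedro project, inside the virtual environment where Kedro is installed.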

## Deploy Databricks Job using Databricks Asset Bundles

Check warning on line 90 in docs/source/deployment/databricks/databricks_ide_development_workflow.md

GitHub Actions / vale

[Kedro.headings] 'Deploy Databricks Job using Databricks Asset Bundles' should use sentence-style capitalization.

Once you have all the resources generated, deploy the Databricks Asset Bundles to Databricks:

```bash
kedro databricks deploy
```

Once you see the `Deployment complete` log, you can now run your pipelines on Databricks!
You should see output similar to:

Check warning on line 98 in docs/source/deployment/databricks/databricks_ide_development_workflow.md

GitHub Actions / vale

[Kedro.toowordy] 'similar to' is too wordy

```
Uploading databrick_iris-0.1-py3-none-any.whl...
Uploading bundle files to /Workspace/Users/xxxxxxx.com/.bundle/databrick_iris/local/files...
Deploying resources...
Updating deployment state...
Deployment complete!
```
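
If you script the deployment, you may want to confirm that the log reports completion before you run the job. The following is a minimal sketch under stated assumptions: the function name `deployment_complete` and the log file name `deploy.log` are hypothetical, and the check only looks for the `Deployment complete` text shown in the sample output above.

```bash
# Sketch only: deployment_complete and deploy.log are hypothetical names.
# Succeeds only if the captured deploy log contains the completion message.
deployment_complete() {
  grep -q "Deployment complete" "$1"
}

# Possible usage (commented out so nothing is deployed by accident):
# kedro databricks deploy 2>&1 | tee deploy.log
# deployment_complete deploy.log && echo "Ready to run the job"
```

Matching on log text is fragile if the message wording changes, so treat this as a convenience check and not a guarantee that the deployment succeeded.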

## How to run the Deployed job?

#### How to run the Deployed job?
There are two options to run Databricks Jobs:
1. Use the `databricks` CLI
2. Use Databricks UI
#### Run Databricks Job with `databricks` CLI
`databricks bundle run`

### Run Databricks Job with `databricks` CLI

Check warning on line 112 in docs/source/deployment/databricks/databricks_ide_development_workflow.md

GitHub Actions / vale

[Kedro.headings] 'Run Databricks Job with ********** CLI' should use sentence-style capitalization.

#### Run Databricks Job with Databricks UI
[add images later]
```bash
databricks bundle run
```

### Run Databricks Job with Databricks UI

Check warning on line 118 in docs/source/deployment/databricks/databricks_ide_development_workflow.md

GitHub Actions / vale

[Kedro.headings] 'Run Databricks Job with Databricks UI' should use sentence-style capitalization.

[add images later]
```
