awsdb-workspace-provisioner

Sample Provisioning Project for AWS Databricks E2 Workspace

Project Structure

  • dbx_ws_provisioner.py: Controller script that provisions a Databricks AWS E2 workspace and its required AWS infrastructure end-to-end in a single pass.
  • dbx_ws_utils.py: Utility interface whose primary purpose is to interact with AWS CloudFormation to deploy stacks (a minimal sketch of this pattern follows the list).
  • dbx_ws_stack_processor.py: Processor interface whose primary purpose is to turn AWS CloudFormation stack outputs into the input data needed by the Workspace Accounts APIs.
  • dbx_ws_accounts_api.py: API interface whose primary purpose is to create the objects required for a Databricks E2 workspace.
  • commons_params.json: A set of common parameters shared across the infrastructure components and workspace objects.
  • cf_templates: CloudFormation templates used by the provisioning script to create the necessary networking infrastructure in an existing VPC with an existing NAT gateway, a restricted IAM role required by Databricks, a DBFS root S3 bucket for the workspace, and a BYOK KMS key for the workspace notebooks.
  • cf_template_params: Base parameters for the above CloudFormation templates.
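
The CloudFormation-facing pieces (dbx_ws_utils.py and dbx_ws_stack_processor.py) boil down to deploying a template and reading its outputs back. The snippet below is only a minimal sketch of that pattern with boto3, not the project's actual code; the stack name, template path, parameter file, and region are illustrative placeholders.

```python
import json
import boto3

CF = boto3.client("cloudformation", region_name="us-east-1")  # region is an assumption

def deploy_stack(stack_name, template_path, parameters):
    """Create a CloudFormation stack from a local template and block until it completes."""
    with open(template_path) as f:
        CF.create_stack(
            StackName=stack_name,
            TemplateBody=f.read(),
            Parameters=parameters,  # [{"ParameterKey": ..., "ParameterValue": ...}, ...]
            Capabilities=["CAPABILITY_NAMED_IAM"],  # required when a template creates IAM roles
        )
    CF.get_waiter("stack_create_complete").wait(StackName=stack_name)

def get_stack_outputs(stack_name):
    """Flatten a stack's Outputs section into a dict for the downstream Accounts API calls."""
    stack = CF.describe_stacks(StackName=stack_name)["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}

if __name__ == "__main__":
    # Hypothetical file names; the project's templates and base parameters live under
    # cf_templates/ and cf_template_params/ respectively.
    with open("cf_template_params/iam_role_params.json") as f:
        iam_params = json.load(f)
    deploy_stack("dbx-cross-account-role", "cf_templates/iam_role.yaml", iam_params)
    print(get_stack_outputs("dbx-cross-account-role"))  # e.g. {"CrossAccountRoleArn": "arn:aws:iam::..."}
```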

Flow of the Script

  • Create the necessary networking infrastructure in an existing VPC, using CloudFormation
  • Create the cross-account IAM role required by Databricks, using CloudFormation (this step uses some of the output values from the first step)
  • Create the DBFS root S3 bucket for the Databricks workspace, using CloudFormation
  • Create the BYOK KMS key for the Databricks workspace notebooks, using CloudFormation
  • Create the Databricks workspace credentials object (using the above IAM role ARN)
  • Create the Databricks workspace storage configuration object (using the above S3 bucket name)
  • Create the Databricks workspace network object (using references to the above networking infrastructure)
  • Create the Databricks workspace customer-managed key object (using the above KMS key ARN and alias)
  • Finally, create the Databricks workspace using references to the above credentials, storage configuration, network, and customer-managed key objects, and wait until the workspace has been provisioned. A rough sketch of these Account API calls follows the list.
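
To give a rough idea of the Accounts API half of the flow, the sketch below registers the cross-account IAM role as a credentials object, creates the workspace, and polls its status until provisioning finishes. It calls the E2 Account API endpoints directly with requests rather than going through the databricks-cli fork this project actually uses, and the account ID, object names, and authentication details are placeholders you would replace.

```python
import time
import requests

ACCOUNT_ID = "<databricks-master-account-id>"   # provided by Databricks for the E2 preview
BASE_URL = f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}"
AUTH = ("<account-admin-email>", "<password>")  # assumption: HTTP basic auth for the Account API

def create_credentials(role_arn):
    """Register the cross-account IAM role as a workspace credentials object."""
    resp = requests.post(f"{BASE_URL}/credentials", auth=AUTH, json={
        "credentials_name": "sample-credentials",
        "aws_credentials": {"sts_role": {"role_arn": role_arn}},
    })
    resp.raise_for_status()
    return resp.json()["credentials_id"]

def create_workspace(credentials_id, storage_configuration_id, network_id, region):
    """Create the E2 workspace and poll until it leaves the provisioning state."""
    resp = requests.post(f"{BASE_URL}/workspaces", auth=AUTH, json={
        "workspace_name": "sample-e2-workspace",
        "aws_region": region,
        "credentials_id": credentials_id,
        "storage_configuration_id": storage_configuration_id,
        "network_id": network_id,
    })
    resp.raise_for_status()
    workspace_id = resp.json()["workspace_id"]
    while True:
        status = requests.get(f"{BASE_URL}/workspaces/{workspace_id}",
                              auth=AUTH).json()["workspace_status"]
        if status not in ("PROVISIONING", "NOT_PROVISIONED"):
            return workspace_id, status  # e.g. RUNNING on success, FAILED otherwise
        time.sleep(30)  # provisioning typically takes a few minutes
```

The storage configuration, network, and customer-managed key objects are created the same way against their respective endpoints, and their IDs are passed into the workspace payload alongside the credentials ID.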

Running the Project

  • Clone the repo
  • pip install boto3 or conda install boto3 - this also installs botocore if it is not already present
  • pip install git+git://github.com/abhinavg6/databricks-cli.git - this is a fork synced from the main Databricks CLI and contains the preview E2 account API
  • Make sure the relevant AWS user credentials exist in the home directory at ~/.aws/credentials
  • Provide the relevant parameter values for the CloudFormation templates in the *params.json files as per your environment. See this template repo for updated templates.
  • Provide the relevant master parameter values in common_params.json as per your environment (a quick sanity check for the AWS credentials and parameter file is sketched after this list)
  • If you're changing the template structure or using a different template altogether, just make sure the relevant parameters and output values are referenced in the scripts
  • Execute as python dbx_ws_provisioner.py
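
Before running the provisioner, it can help to confirm which AWS identity boto3 will pick up and that the master parameter file parses. The check below is a small illustrative sketch, not part of the project; the parameter file name follows the step above (common_params.json) and the keys printed are whatever your file contains.

```python
import json
import boto3

# Confirm which AWS identity boto3 will use (read from ~/.aws/credentials by default)
identity = boto3.client("sts").get_caller_identity()
print(f"Provisioning as {identity['Arn']} in account {identity['Account']}")

# Load the master parameter file and list its top-level keys
with open("common_params.json") as f:
    common_params = json.load(f)
print("Common parameters:", ", ".join(sorted(common_params)))
```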

Note: Databricks E2 on AWS is currently a private-preview capability that requires Databricks to create a master account ID and whitelist the relevant operations before E2 workspaces can be created. Please reach out to your Databricks account team before using this sample solution.