From 025ce5a1196401d646773739a8f9ec27af2f4303 Mon Sep 17 00:00:00 2001 From: Jason Ng Date: Mon, 30 Oct 2023 15:47:42 -0700 Subject: [PATCH 1/5] add databricks-workspace module --- databricks-workspace-e2/CHANGELOG.md | 69 ++++++ databricks-workspace-e2/README.md | 116 +++++++++ databricks-workspace-e2/aws_iam_role.tf | 282 ++++++++++++++++++++++ databricks-workspace-e2/bucket.tf | 33 +++ databricks-workspace-e2/main.tf | 48 ++++ databricks-workspace-e2/outputs.tf | 14 ++ databricks-workspace-e2/security_group.tf | 30 +++ databricks-workspace-e2/variables.tf | 64 +++++ databricks-workspace-e2/versions.tf | 11 + 9 files changed, 667 insertions(+) create mode 100644 databricks-workspace-e2/CHANGELOG.md create mode 100644 databricks-workspace-e2/README.md create mode 100644 databricks-workspace-e2/aws_iam_role.tf create mode 100644 databricks-workspace-e2/bucket.tf create mode 100644 databricks-workspace-e2/main.tf create mode 100644 databricks-workspace-e2/outputs.tf create mode 100644 databricks-workspace-e2/security_group.tf create mode 100644 databricks-workspace-e2/variables.tf create mode 100644 databricks-workspace-e2/versions.tf diff --git a/databricks-workspace-e2/CHANGELOG.md b/databricks-workspace-e2/CHANGELOG.md new file mode 100644 index 00000000..90538c55 --- /dev/null +++ b/databricks-workspace-e2/CHANGELOG.md @@ -0,0 +1,69 @@ +# Changelog + +## [2.4.0](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v2.3.1...databricks-workspace-e2-v2.4.0) (2023-10-17) + + +### Features + +* CDI-2030 Allow overriding databricks workspace name ([#8547](https://github.com/chanzuckerberg/shared-infra/issues/8547)) ([4ec1c7f](https://github.com/chanzuckerberg/shared-infra/commit/4ec1c7f65a2b4dbe3fa24764099602ab98912773)) + +## [2.3.1](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v2.3.0...databricks-workspace-e2-v2.3.1) (2023-10-09) + + +### Bug Fixes + +* CDI-2022 - Update all bucket modules ([#8529](https://github.com/chanzuckerberg/shared-infra/issues/8529)) ([bd25e9d](https://github.com/chanzuckerberg/shared-infra/commit/bd25e9d2a61cbcced27f020665ab0b567f1ad485)) + +## [2.3.0](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v2.2.1...databricks-workspace-e2-v2.3.0) (2023-06-27) + + +### Features + +* CDI-1583: Databricks IAM access to read/write logs ([#7980](https://github.com/chanzuckerberg/shared-infra/issues/7980)) ([081669c](https://github.com/chanzuckerberg/shared-infra/commit/081669c6eb41f047488b7663417b2bdacb189b23)) +* CDI-1603 Set up new databricks cluster log mounts ([#7985](https://github.com/chanzuckerberg/shared-infra/issues/7985)) ([da402a6](https://github.com/chanzuckerberg/shared-infra/commit/da402a67ade93e74ef179937d323b53aa5813f04)) + +## [2.2.1](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v2.2.0...databricks-workspace-e2-v2.2.1) (2023-06-14) + + +### Bug Fixes + +* databricks-workspace-e2: Bump aws-s3-private-bucket version for … ([#7949](https://github.com/chanzuckerberg/shared-infra/issues/7949)) ([9ac0903](https://github.com/chanzuckerberg/shared-infra/commit/9ac0903dc8360ed9e431326a9a7e0f8bf870d2b5)) + +## [2.2.0](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v2.1.0...databricks-workspace-e2-v2.2.0) (2023-06-13) + + +### Features + +* Expose bucket object ownership flag for databricks-workspace-e2 ([#7932](https://github.com/chanzuckerberg/shared-infra/issues/7932)) ([d974d0f](https://github.com/chanzuckerberg/shared-infra/commit/d974d0f7458b2daa08898eed0d28ea8bb906da78)) + +## [2.1.0](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v2.0.0...databricks-workspace-e2-v2.1.0) (2023-06-12) + + +### Features + +* Expose bucket object ownership attribute (via version bump) ([#7928](https://github.com/chanzuckerberg/shared-infra/issues/7928)) ([8fd0f31](https://github.com/chanzuckerberg/shared-infra/commit/8fd0f31217c33baa512777e3c21e232b5ccec3a4)) + +## [2.0.0](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v1.1.0...databricks-workspace-e2-v2.0.0) (2023-04-28) + + +### ⚠ BREAKING CHANGES + +* k8s-core major version bump (#7726) + +### Features + +* k8s-core major version bump ([#7726](https://github.com/chanzuckerberg/shared-infra/issues/7726)) ([1c44772](https://github.com/chanzuckerberg/shared-infra/commit/1c4477285cf5a26411a73396bb631eea39a67e6b)) + +## [1.1.0](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v1.0.0...databricks-workspace-e2-v1.1.0) (2023-03-23) + + +### Features + +* bump all shared-infra to 1.3.0 ([#7514](https://github.com/chanzuckerberg/shared-infra/issues/7514)) ([c56e63e](https://github.com/chanzuckerberg/shared-infra/commit/c56e63eac215442570762e62f27bab222f1837cb)) + +## 1.0.0 (2023-01-31) + + +### Features + +* CDI-1019 trigger databricks-workspace-ec2 versioning ([#7136](https://github.com/chanzuckerberg/shared-infra/issues/7136)) ([f7b791d](https://github.com/chanzuckerberg/shared-infra/commit/f7b791d73caf5aaf1febd3aa2bf4488a04d32a37)) diff --git a/databricks-workspace-e2/README.md b/databricks-workspace-e2/README.md new file mode 100644 index 00000000..46c1934d --- /dev/null +++ b/databricks-workspace-e2/README.md @@ -0,0 +1,116 @@ +## Databricks Multi-workspace +1. In `terraform/accounts/databricks-network/databricks.tf`, add a section in `networks` to specify the base CIDR block for the new workspace. Then, to specify the subnets, add a section similar to `meta_workspace_subnets`. +2. In `terraform/accounts/databricks-network/outputs.tf`, add the output for the cidr blocks you just created. +3. To setup this new workspace in TFE, here are the [docs](https://czi.atlassian.net/wiki/spaces/SI/pages/1786741987/Terraform+Enterprise#TerraformEnterprise-Migratinganewworkspaceinapre-configuredrepotoTFE) to do so. +4. After you configure TFE, in `fogg.yml`, for the env you're working in, add a `databricks-workspace` component. It will look like this: +```yaml + databricks-workspace: + backend: + host_name: si.prod.tfe.czi.technology + kind: remote + organization: shared-infra + extra_vars: + databricks_external_id: value-for-external-id +``` +To get the `databricks_external_id`, ask the Databricks point of contact (Taha Syed has been our POC for TLP and Meta: taha@databricks.com). It is [this value](https://databrickslabs.github.io/terraform-provider-databricks/resources/mws_workspaces/). + +5. Run `fogg apply` at the root. +6. In the newly created component, create a file called `cloud-env.tf` and copy what you see in `terraform/envs/meta-prod/databricks-workspace/cloud-env.tf`. Replace the values for `database_subnet_cidrs`, `private_subnet_cidrs`, `public_subnet_cidrs`, and `vpc_cidr` with the output that you created in `outputs.tf`. +7. At minimum, you'll need something like this in `main.tf` +```terraform + module databricks-workspace { + source = "../../../modules/databricks-workspace-e2" + databricks_external_id = var.databricks_external_id + vpc_id = module.aws-env.vpc_id + private_subnets = module.aws-env.private_subnets + project = var.project + env = var.env + service = var.component + owner = var.owner + } +``` +where the `databricks_external_id` is value you specified in the `databricks-workspace` component in `fogg.yml`. + +8. Add a new file called `provider.tf`. To authenticate to Databricks, we use `basic_auth` with environment variables `DATABRICKS_USERNAME` and `DATABRICKS_PASSWORD`. We store these as environment variables in the TFE workspace you're working in. They have to be named exactly those words to have the `basic_auth` pick them up. This is the default provider for Databricks: +```terraform + provider "databricks" { + version = "v0.2.3" + host = "https://accounts.cloud.databricks.com" + basic_auth {} + } +``` +You'll also need this, if you want to authenticate per workspace: +```terraform + provider "databricks" { + alias = "within_workspace" + version = "v0.2.3" + host = module.databricks-workspace.workspace_url + basic_auth {} + } +``` +To get the URL of the deployment, it'll be the `deployment_name` which is `name = "${var.project}-${var.env}-${var.component}"` which is the local in the `databricks-workspace` module appended by `.cloud.databricks.com`. + +## References +* [Here](https://databrickslabs.github.io/terraform-provider-databricks/overview/) is the provider docs. + + +## Requirements + +| Name | Version | +|------|---------| +| [terraform](#requirement\_terraform) | >= 0.13 | + +## Providers + +| Name | Version | +|------|---------| +| [aws](#provider\_aws) | n/a | +| [databricks](#provider\_databricks) | n/a | + +## Modules + +| Name | Source | Version | +|------|--------|---------| +| [databricks\_bucket](#module\_databricks\_bucket) | github.com/chanzuckerberg/cztack//aws-s3-private-bucket | v0.60.1 | + +## Resources + +| Name | Type | +|------|------| +| [aws_iam_role.databricks](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource | +| [aws_iam_role_policy.policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource | +| [aws_security_group.databricks](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/security_group) | resource | +| [databricks_mws_credentials.databricks](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_credentials) | resource | +| [databricks_mws_networks.networking](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_networks) | resource | +| [databricks_mws_storage_configurations.databricks](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_storage_configurations) | resource | +| [databricks_mws_workspaces.databricks](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_workspaces) | resource | +| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source | +| [aws_iam_policy_document.databricks-s3](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | +| [aws_iam_policy_document.databricks-setup-assume-role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | +| [aws_iam_policy_document.policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | +| [aws_region.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/region) | data source | + +## Inputs + +| Name | Description | Type | Default | Required | +|------|-------------|------|---------|:--------:| +| [audit\_log\_bucket\_name](#input\_audit\_log\_bucket\_name) | Name of bucket to write cluster logs to - also where the audit logs go, too | `string` | `"czi-audit-logs"` | no | +| [databricks\_external\_id](#input\_databricks\_external\_id) | The ID of a Databricks root account. | `string` | n/a | yes | +| [env](#input\_env) | The environment / stage. Aka staging, dev, prod. | `string` | n/a | yes | +| [object\_ownership](#input\_object\_ownership) | Set default owner of all objects within bucket (e.g., bucket vs. object owner) | `string` | `null` | no | +| [owner](#input\_owner) | n/a | `string` | n/a | yes | +| [passable\_role\_arn](#input\_passable\_role\_arn) | A role to allow the cross-account role to pass to other accounts | `string` | `""` | no | +| [private\_subnets](#input\_private\_subnets) | List of private subnets. | `list(string)` | n/a | yes | +| [project](#input\_project) | A high level name, typically the name of the site. | `string` | n/a | yes | +| [service](#input\_service) | The service. Aka databricks-workspace. | `string` | n/a | yes | +| [vpc\_id](#input\_vpc\_id) | ID of the VPC. | `string` | n/a | yes | +| [workspace\_name\_override](#input\_workspace\_name\_override) | Override the workspace name. If not set, the workspace name will be set to the project, env, and service. | `string` | `null` | no | + +## Outputs + +| Name | Description | +|------|-------------| +| [role\_arn](#output\_role\_arn) | ARN of the AWS IAM role. | +| [workspace\_id](#output\_workspace\_id) | ID of the workspace. | +| [workspace\_url](#output\_workspace\_url) | Url of the deployed workspace. | + diff --git a/databricks-workspace-e2/aws_iam_role.tf b/databricks-workspace-e2/aws_iam_role.tf new file mode 100644 index 00000000..6dd00cfd --- /dev/null +++ b/databricks-workspace-e2/aws_iam_role.tf @@ -0,0 +1,282 @@ +locals { + cluster_log_bucket_prefix = "databricks-cluster-logs" +} + +data "aws_iam_policy_document" "databricks-setup-assume-role" { + statement { + principals { + type = "AWS" + identifiers = ["arn:aws:iam::${local.databricks_aws_account}:root"] + } + + actions = ["sts:AssumeRole"] + condition { + test = "StringLike" + variable = "sts:ExternalId" + values = [var.databricks_external_id] + } + } +} + +resource "aws_iam_role" "databricks" { + name = local.name + assume_role_policy = data.aws_iam_policy_document.databricks-setup-assume-role.json + tags = local.tags +} + +data "aws_iam_policy_document" "policy" { + statement { + sid = "NonResourceBasedPermissions" + actions = [ + "ec2:CancelSpotInstanceRequests", + "ec2:DescribeAvailabilityZones", + "ec2:DescribeIamInstanceProfileAssociations", + "ec2:DescribeInstanceStatus", + "ec2:DescribeInstances", + "ec2:DescribeInternetGateways", + "ec2:DescribeNatGateways", + "ec2:DescribeNetworkAcls", + "ec2:DescribePlacementGroups", + "ec2:DescribePrefixLists", + "ec2:DescribeReservedInstancesOfferings", + "ec2:DescribeRouteTables", + "ec2:DescribeSecurityGroups", + "ec2:DescribeSpotInstanceRequests", + "ec2:DescribeSpotPriceHistory", + "ec2:DescribeSubnets", + "ec2:DescribeVolumes", + "ec2:DescribeVpcAttribute", + "ec2:DescribeVpcs", + "ec2:CreatePlacementGroup", + "ec2:DeletePlacementGroup", + "ec2:CreateKeyPair", + "ec2:DeleteKeyPair", + "ec2:CreateTags", + "ec2:DeleteTags", + "ec2:RequestSpotInstances", + ] + resources = ["*"] + effect = "Allow" + } + + statement { + effect = "Allow" + actions = ["iam:PassRole"] + resources = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/databricks/*"] + } + + dynamic "statement" { + for_each = length(var.passable_role_arn) > 0 ? [1] : [] + + content { + actions = [ + "iam:PassRole" + ] + resources = [ + var.passable_role_arn + ] + } + } + + statement { + sid = "InstancePoolsSupport" + actions = [ + "ec2:AssociateIamInstanceProfile", + "ec2:DisassociateIamInstanceProfile", + "ec2:ReplaceIamInstanceProfileAssociation", + ] + + resources = ["${local.ec2_arn_base}:instance/*"] + + condition { + test = "StringEquals" + variable = "ec2:ResourceTag/Vendor" + values = ["Databricks"] + } + } + + statement { + sid = "AllowEc2RunInstancePerTag" + actions = [ + "ec2:RunInstances", + ] + + resources = [ + "${local.ec2_arn_base}:instance/*", + "${local.ec2_arn_base}:volume/*", + ] + + condition { + test = "StringEquals" + variable = "aws:RequestTag/Vendor" + values = ["Databricks"] + } + } + + statement { + sid = "AllowEc2RunInstanceImagePerTag" + actions = [ + "ec2:RunInstances", + ] + + resources = [ + "${local.ec2_arn_base}:image/*", + ] + + condition { + test = "StringEquals" + variable = "aws:ResourceTag/Vendor" + values = ["Databricks"] + } + } + + statement { + sid = "AllowEc2RunInstancePerVPCid" + actions = [ + "ec2:RunInstances", + ] + + resources = [ + "${local.ec2_arn_base}:network-interface/*", + "${local.ec2_arn_base}:subnet/*", + "${local.ec2_arn_base}:security-group/*", + ] + + condition { + test = "StringEquals" + variable = "ec2:vpc" + values = ["${local.ec2_arn_base}:vpc/${var.vpc_id}"] + } + } + + statement { + sid = "AllowEc2RunInstanceOtherResources" + actions = [ + "ec2:RunInstances", + ] + + not_resources = [ + "${local.ec2_arn_base}:image/*", + "${local.ec2_arn_base}:network-interface/*", + "${local.ec2_arn_base}:subnet/*", + "${local.ec2_arn_base}:security-group/*", + "${local.ec2_arn_base}:volume/*", + "${local.ec2_arn_base}:instance/*" + ] + } + + statement { + sid = "EC2TerminateInstancesTag" + actions = [ + "ec2:TerminateInstances", + ] + + resources = [ + "${local.ec2_arn_base}:instance/*", + ] + + condition { + test = "StringEquals" + variable = "ec2:ResourceTag/Vendor" + values = ["Databricks"] + } + } + + statement { + sid = "EC2AttachDetachVolumeTag" + actions = [ + "ec2:AttachVolume", + "ec2:DetachVolume", + ] + + resources = [ + "${local.ec2_arn_base}:instance/*", + "${local.ec2_arn_base}:volume/*", + ] + + condition { + test = "StringEquals" + variable = "ec2:ResourceTag/Vendor" + values = ["Databricks"] + } + } + + statement { + sid = "EC2CreateVolumeByTag" + actions = [ + "ec2:CreateVolume", + ] + + resources = [ + "${local.ec2_arn_base}:volume/*", + ] + + condition { + test = "StringEquals" + variable = "aws:RequestTag/Vendor" + values = ["Databricks"] + } + } + + statement { + sid = "EC2DeleteVolumeByTag" + actions = [ + "ec2:DeleteVolume", + ] + + resources = [ + "${local.ec2_arn_base}:volume/*", + ] + + condition { + test = "StringEquals" + variable = "ec2:ResourceTag/Vendor" + values = ["Databricks"] + } + } + + statement { + actions = [ + "iam:CreateServiceLinkedRole", + "iam:PutRolePolicy", + ] + + resources = [ + "arn:aws:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot", + ] + + condition { + test = "StringLike" + variable = "iam:AWSServiceName" + values = ["spot.amazonaws.com"] + } + + effect = "Allow" + } + + statement { + sid = "VpcNonresourceSpecificActions" + actions = [ + "ec2:AuthorizeSecurityGroupEgress", + "ec2:AuthorizeSecurityGroupIngress", + "ec2:RevokeSecurityGroupEgress", + "ec2:RevokeSecurityGroupIngress", + ] + + resources = [ + "${local.ec2_arn_base}:security-group/${aws_security_group.databricks.id}", + ] + + condition { + test = "StringEquals" + variable = "ec2:vpc" + values = ["${local.ec2_arn_base}:vpc/${var.vpc_id}"] + } + } +} + +resource "aws_iam_role_policy" "policy" { + name = "extras" + role = aws_iam_role.databricks.id + policy = data.aws_iam_policy_document.policy.json +} diff --git a/databricks-workspace-e2/bucket.tf b/databricks-workspace-e2/bucket.tf new file mode 100644 index 00000000..b2095387 --- /dev/null +++ b/databricks-workspace-e2/bucket.tf @@ -0,0 +1,33 @@ +data "aws_iam_policy_document" "databricks-s3" { + statement { + sid = "grant databricks access" + effect = "Allow" + principals { + type = "AWS" + identifiers = ["arn:aws:iam::${local.databricks_aws_account}:root"] + } + actions = [ + "s3:GetObject", + "s3:GetObjectVersion", + "s3:PutObject", + "s3:DeleteObject", + "s3:ListBucket", + "s3:GetBucketLocation", + ] + resources = [ + "arn:aws:s3:::${local.name}/*", + "arn:aws:s3:::${local.name}", + ] + } +} + +module "databricks_bucket" { + source = "github.com/chanzuckerberg/cztack//aws-s3-private-bucket?ref=v0.60.1" + bucket_name = local.name + bucket_policy = data.aws_iam_policy_document.databricks-s3.json + project = var.project + env = var.env + service = var.service + owner = var.owner + object_ownership = var.object_ownership +} diff --git a/databricks-workspace-e2/main.tf b/databricks-workspace-e2/main.tf new file mode 100644 index 00000000..e0393532 --- /dev/null +++ b/databricks-workspace-e2/main.tf @@ -0,0 +1,48 @@ +// https://docs.databricks.com/administration-guide/multiworkspace/iam-role.html#language-Your%C2%A0VPC,%C2%A0custom +locals { + databricks_aws_account = "414351767826" # Databricks' own AWS account, not CZI's. See https://docs.databricks.com/en/administration-guide/account-settings-e2/credentials.html#step-1-create-a-cross-account-iam-role + ec2_arn_base = "arn:aws:ec2:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}" + name = coalesce(var.workspace_name_override, "${var.project}-${var.env}-${var.service}") + security_group_ids = [aws_security_group.databricks.id] + tags = { + project = var.project + env = var.env + service = var.service + owner = var.owner + managedBy = "terraform" + } +} + +data "aws_caller_identity" "current" {} + +data "aws_region" "current" {} + +resource "databricks_mws_networks" "networking" { + account_id = var.databricks_external_id + network_name = local.name + vpc_id = var.vpc_id + subnet_ids = var.private_subnets + security_group_ids = local.security_group_ids +} + +resource "databricks_mws_storage_configurations" "databricks" { + account_id = var.databricks_external_id + storage_configuration_name = local.name + bucket_name = module.databricks_bucket.id +} + +resource "databricks_mws_credentials" "databricks" { + account_id = var.databricks_external_id + credentials_name = local.name + role_arn = aws_iam_role.databricks.arn +} + +resource "databricks_mws_workspaces" "databricks" { + account_id = var.databricks_external_id + workspace_name = local.name + deployment_name = local.name + aws_region = data.aws_region.current.name + credentials_id = databricks_mws_credentials.databricks.credentials_id + storage_configuration_id = databricks_mws_storage_configurations.databricks.storage_configuration_id + network_id = databricks_mws_networks.networking.network_id +} diff --git a/databricks-workspace-e2/outputs.tf b/databricks-workspace-e2/outputs.tf new file mode 100644 index 00000000..fb972c47 --- /dev/null +++ b/databricks-workspace-e2/outputs.tf @@ -0,0 +1,14 @@ +output "workspace_id" { + description = "ID of the workspace." + value = databricks_mws_workspaces.databricks.workspace_id +} + +output "workspace_url" { + description = "Url of the deployed workspace." + value = databricks_mws_workspaces.databricks.workspace_url +} + +output "role_arn" { + description = "ARN of the AWS IAM role." + value = aws_iam_role.databricks.arn +} diff --git a/databricks-workspace-e2/security_group.tf b/databricks-workspace-e2/security_group.tf new file mode 100644 index 00000000..3a2c9cf9 --- /dev/null +++ b/databricks-workspace-e2/security_group.tf @@ -0,0 +1,30 @@ +resource "aws_security_group" "databricks" { + name = local.name + description = "self tcp and udp all ports and all outbound" + vpc_id = var.vpc_id + + ingress { + description = "self tcp all ports" + from_port = 0 + to_port = 65535 + protocol = "tcp" + self = true + } + + ingress { + description = "self udp all ports" + from_port = 0 + to_port = 65535 + protocol = "udp" + self = true + } + + egress { + from_port = 0 + to_port = 0 + protocol = "-1" + cidr_blocks = ["0.0.0.0/0"] + } + + tags = local.tags +} diff --git a/databricks-workspace-e2/variables.tf b/databricks-workspace-e2/variables.tf new file mode 100644 index 00000000..a1cea0fb --- /dev/null +++ b/databricks-workspace-e2/variables.tf @@ -0,0 +1,64 @@ +variable "vpc_id" { + description = "ID of the VPC." + type = string +} + +variable "private_subnets" { + description = "List of private subnets." + type = list(string) +} + +variable "databricks_external_id" { + description = "The ID of a Databricks root account." + type = string +} + +variable "project" { + description = "A high level name, typically the name of the site." + type = string +} + +variable "env" { + description = "The environment / stage. Aka staging, dev, prod." + type = string +} + +variable "service" { + description = "The service. Aka databricks-workspace." + type = string +} + +variable "owner" { + type = string +} + +variable "passable_role_arn" { + description = "A role to allow the cross-account role to pass to other accounts" + type = string + default = "" +} + +# check if argument is null or is in list (2nd parameter of contains() cannot be null) +variable "object_ownership" { + type = string + default = null + description = "Set default owner of all objects within bucket (e.g., bucket vs. object owner)" + + validation { + condition = var.object_ownership == null ? true : contains(["BucketOwnerEnforced", "BucketOwnerPreferred", "ObjectWriter"], var.object_ownership) + error_message = "Valid values for var.object_ownership are ('BucketOwnerEnforced', 'BucketOwnerPreferred', 'ObjectWriter')." + + } +} + +variable "audit_log_bucket_name" { + type = string + default = "czi-audit-logs" + description = "Name of bucket to write cluster logs to - also where the audit logs go, too" +} + +variable "workspace_name_override" { + type = string + default = null + description = "Override the workspace name. If not set, the workspace name will be set to the project, env, and service." +} \ No newline at end of file diff --git a/databricks-workspace-e2/versions.tf b/databricks-workspace-e2/versions.tf new file mode 100644 index 00000000..159e8002 --- /dev/null +++ b/databricks-workspace-e2/versions.tf @@ -0,0 +1,11 @@ +terraform { + required_providers { + aws = { + source = "hashicorp/aws" + } + databricks = { + source = "databricks/databricks" + } + } + required_version = ">= 1.3.0" +} From f1918f504de9372dbff142f3c02e742976b95e12 Mon Sep 17 00:00:00 2001 From: Jason Ng Date: Mon, 30 Oct 2023 16:15:01 -0700 Subject: [PATCH 2/5] remove changelog --- databricks-workspace-e2/CHANGELOG.md | 69 ---------------------------- 1 file changed, 69 deletions(-) delete mode 100644 databricks-workspace-e2/CHANGELOG.md diff --git a/databricks-workspace-e2/CHANGELOG.md b/databricks-workspace-e2/CHANGELOG.md deleted file mode 100644 index 90538c55..00000000 --- a/databricks-workspace-e2/CHANGELOG.md +++ /dev/null @@ -1,69 +0,0 @@ -# Changelog - -## [2.4.0](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v2.3.1...databricks-workspace-e2-v2.4.0) (2023-10-17) - - -### Features - -* CDI-2030 Allow overriding databricks workspace name ([#8547](https://github.com/chanzuckerberg/shared-infra/issues/8547)) ([4ec1c7f](https://github.com/chanzuckerberg/shared-infra/commit/4ec1c7f65a2b4dbe3fa24764099602ab98912773)) - -## [2.3.1](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v2.3.0...databricks-workspace-e2-v2.3.1) (2023-10-09) - - -### Bug Fixes - -* CDI-2022 - Update all bucket modules ([#8529](https://github.com/chanzuckerberg/shared-infra/issues/8529)) ([bd25e9d](https://github.com/chanzuckerberg/shared-infra/commit/bd25e9d2a61cbcced27f020665ab0b567f1ad485)) - -## [2.3.0](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v2.2.1...databricks-workspace-e2-v2.3.0) (2023-06-27) - - -### Features - -* CDI-1583: Databricks IAM access to read/write logs ([#7980](https://github.com/chanzuckerberg/shared-infra/issues/7980)) ([081669c](https://github.com/chanzuckerberg/shared-infra/commit/081669c6eb41f047488b7663417b2bdacb189b23)) -* CDI-1603 Set up new databricks cluster log mounts ([#7985](https://github.com/chanzuckerberg/shared-infra/issues/7985)) ([da402a6](https://github.com/chanzuckerberg/shared-infra/commit/da402a67ade93e74ef179937d323b53aa5813f04)) - -## [2.2.1](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v2.2.0...databricks-workspace-e2-v2.2.1) (2023-06-14) - - -### Bug Fixes - -* databricks-workspace-e2: Bump aws-s3-private-bucket version for … ([#7949](https://github.com/chanzuckerberg/shared-infra/issues/7949)) ([9ac0903](https://github.com/chanzuckerberg/shared-infra/commit/9ac0903dc8360ed9e431326a9a7e0f8bf870d2b5)) - -## [2.2.0](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v2.1.0...databricks-workspace-e2-v2.2.0) (2023-06-13) - - -### Features - -* Expose bucket object ownership flag for databricks-workspace-e2 ([#7932](https://github.com/chanzuckerberg/shared-infra/issues/7932)) ([d974d0f](https://github.com/chanzuckerberg/shared-infra/commit/d974d0f7458b2daa08898eed0d28ea8bb906da78)) - -## [2.1.0](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v2.0.0...databricks-workspace-e2-v2.1.0) (2023-06-12) - - -### Features - -* Expose bucket object ownership attribute (via version bump) ([#7928](https://github.com/chanzuckerberg/shared-infra/issues/7928)) ([8fd0f31](https://github.com/chanzuckerberg/shared-infra/commit/8fd0f31217c33baa512777e3c21e232b5ccec3a4)) - -## [2.0.0](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v1.1.0...databricks-workspace-e2-v2.0.0) (2023-04-28) - - -### ⚠ BREAKING CHANGES - -* k8s-core major version bump (#7726) - -### Features - -* k8s-core major version bump ([#7726](https://github.com/chanzuckerberg/shared-infra/issues/7726)) ([1c44772](https://github.com/chanzuckerberg/shared-infra/commit/1c4477285cf5a26411a73396bb631eea39a67e6b)) - -## [1.1.0](https://github.com/chanzuckerberg/shared-infra/compare/databricks-workspace-e2-v1.0.0...databricks-workspace-e2-v1.1.0) (2023-03-23) - - -### Features - -* bump all shared-infra to 1.3.0 ([#7514](https://github.com/chanzuckerberg/shared-infra/issues/7514)) ([c56e63e](https://github.com/chanzuckerberg/shared-infra/commit/c56e63eac215442570762e62f27bab222f1837cb)) - -## 1.0.0 (2023-01-31) - - -### Features - -* CDI-1019 trigger databricks-workspace-ec2 versioning ([#7136](https://github.com/chanzuckerberg/shared-infra/issues/7136)) ([f7b791d](https://github.com/chanzuckerberg/shared-infra/commit/f7b791d73caf5aaf1febd3aa2bf4488a04d32a37)) From 96ae2a36e4eaa309c41b125ebf3a35437ad36546 Mon Sep 17 00:00:00 2001 From: Jason Ng Date: Mon, 30 Oct 2023 16:18:08 -0700 Subject: [PATCH 3/5] remove default --- databricks-workspace-e2/variables.tf | 1 - 1 file changed, 1 deletion(-) diff --git a/databricks-workspace-e2/variables.tf b/databricks-workspace-e2/variables.tf index a1cea0fb..7eb3f9d2 100644 --- a/databricks-workspace-e2/variables.tf +++ b/databricks-workspace-e2/variables.tf @@ -53,7 +53,6 @@ variable "object_ownership" { variable "audit_log_bucket_name" { type = string - default = "czi-audit-logs" description = "Name of bucket to write cluster logs to - also where the audit logs go, too" } From 0183835e2b448233724d4ed6923b5583f5b202e2 Mon Sep 17 00:00:00 2001 From: Jason Ng Date: Mon, 30 Oct 2023 16:19:18 -0700 Subject: [PATCH 4/5] remove czi contact --- databricks-workspace-e2/README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/databricks-workspace-e2/README.md b/databricks-workspace-e2/README.md index 46c1934d..930296c9 100644 --- a/databricks-workspace-e2/README.md +++ b/databricks-workspace-e2/README.md @@ -12,11 +12,11 @@ extra_vars: databricks_external_id: value-for-external-id ``` -To get the `databricks_external_id`, ask the Databricks point of contact (Taha Syed has been our POC for TLP and Meta: taha@databricks.com). It is [this value](https://databrickslabs.github.io/terraform-provider-databricks/resources/mws_workspaces/). +To get the `databricks_external_id`, ask the Databricks point of contact. It is [this value](https://databrickslabs.github.io/terraform-provider-databricks/resources/mws_workspaces/). -5. Run `fogg apply` at the root. -6. In the newly created component, create a file called `cloud-env.tf` and copy what you see in `terraform/envs/meta-prod/databricks-workspace/cloud-env.tf`. Replace the values for `database_subnet_cidrs`, `private_subnet_cidrs`, `public_subnet_cidrs`, and `vpc_cidr` with the output that you created in `outputs.tf`. -7. At minimum, you'll need something like this in `main.tf` +1. Run `fogg apply` at the root. +2. In the newly created component, create a file called `cloud-env.tf` and copy what you see in `terraform/envs/meta-prod/databricks-workspace/cloud-env.tf`. Replace the values for `database_subnet_cidrs`, `private_subnet_cidrs`, `public_subnet_cidrs`, and `vpc_cidr` with the output that you created in `outputs.tf`. +3. At minimum, you'll need something like this in `main.tf` ```terraform module databricks-workspace { source = "../../../modules/databricks-workspace-e2" @@ -31,7 +31,7 @@ To get the `databricks_external_id`, ask the Databricks point of contact (Taha S ``` where the `databricks_external_id` is value you specified in the `databricks-workspace` component in `fogg.yml`. -8. Add a new file called `provider.tf`. To authenticate to Databricks, we use `basic_auth` with environment variables `DATABRICKS_USERNAME` and `DATABRICKS_PASSWORD`. We store these as environment variables in the TFE workspace you're working in. They have to be named exactly those words to have the `basic_auth` pick them up. This is the default provider for Databricks: +1. Add a new file called `provider.tf`. To authenticate to Databricks, we use `basic_auth` with environment variables `DATABRICKS_USERNAME` and `DATABRICKS_PASSWORD`. We store these as environment variables in the TFE workspace you're working in. They have to be named exactly those words to have the `basic_auth` pick them up. This is the default provider for Databricks: ```terraform provider "databricks" { version = "v0.2.3" From 759854d53b2f02a21df5a0de8c7516872bb44afc Mon Sep 17 00:00:00 2001 From: Jason Ng Date: Mon, 30 Oct 2023 16:19:53 -0700 Subject: [PATCH 5/5] clean up Readme --- databricks-workspace-e2/README.md | 52 ------------------------------- 1 file changed, 52 deletions(-) diff --git a/databricks-workspace-e2/README.md b/databricks-workspace-e2/README.md index 930296c9..9f879b4f 100644 --- a/databricks-workspace-e2/README.md +++ b/databricks-workspace-e2/README.md @@ -1,55 +1,3 @@ -## Databricks Multi-workspace -1. In `terraform/accounts/databricks-network/databricks.tf`, add a section in `networks` to specify the base CIDR block for the new workspace. Then, to specify the subnets, add a section similar to `meta_workspace_subnets`. -2. In `terraform/accounts/databricks-network/outputs.tf`, add the output for the cidr blocks you just created. -3. To setup this new workspace in TFE, here are the [docs](https://czi.atlassian.net/wiki/spaces/SI/pages/1786741987/Terraform+Enterprise#TerraformEnterprise-Migratinganewworkspaceinapre-configuredrepotoTFE) to do so. -4. After you configure TFE, in `fogg.yml`, for the env you're working in, add a `databricks-workspace` component. It will look like this: -```yaml - databricks-workspace: - backend: - host_name: si.prod.tfe.czi.technology - kind: remote - organization: shared-infra - extra_vars: - databricks_external_id: value-for-external-id -``` -To get the `databricks_external_id`, ask the Databricks point of contact. It is [this value](https://databrickslabs.github.io/terraform-provider-databricks/resources/mws_workspaces/). - -1. Run `fogg apply` at the root. -2. In the newly created component, create a file called `cloud-env.tf` and copy what you see in `terraform/envs/meta-prod/databricks-workspace/cloud-env.tf`. Replace the values for `database_subnet_cidrs`, `private_subnet_cidrs`, `public_subnet_cidrs`, and `vpc_cidr` with the output that you created in `outputs.tf`. -3. At minimum, you'll need something like this in `main.tf` -```terraform - module databricks-workspace { - source = "../../../modules/databricks-workspace-e2" - databricks_external_id = var.databricks_external_id - vpc_id = module.aws-env.vpc_id - private_subnets = module.aws-env.private_subnets - project = var.project - env = var.env - service = var.component - owner = var.owner - } -``` -where the `databricks_external_id` is value you specified in the `databricks-workspace` component in `fogg.yml`. - -1. Add a new file called `provider.tf`. To authenticate to Databricks, we use `basic_auth` with environment variables `DATABRICKS_USERNAME` and `DATABRICKS_PASSWORD`. We store these as environment variables in the TFE workspace you're working in. They have to be named exactly those words to have the `basic_auth` pick them up. This is the default provider for Databricks: -```terraform - provider "databricks" { - version = "v0.2.3" - host = "https://accounts.cloud.databricks.com" - basic_auth {} - } -``` -You'll also need this, if you want to authenticate per workspace: -```terraform - provider "databricks" { - alias = "within_workspace" - version = "v0.2.3" - host = module.databricks-workspace.workspace_url - basic_auth {} - } -``` -To get the URL of the deployment, it'll be the `deployment_name` which is `name = "${var.project}-${var.env}-${var.component}"` which is the local in the `databricks-workspace` module appended by `.cloud.databricks.com`. - ## References * [Here](https://databrickslabs.github.io/terraform-provider-databricks/overview/) is the provider docs.