This module creates a partition of TPU nodesets. TPUs are Google's custom-developed, application-specific integrated circuits (ASICs) used to accelerate machine learning workloads.

The following code snippet creates a TPU partition with the following attributes:
- The TPU nodeset module is connected to a `network` module.
- The TPU nodeset is of type `v2-8` with TensorFlow version `2.10.0`; the supported configurations are listed at https://cloud.google.com/tpu/docs/supported-tpu-configurations.
- The TPU VMs are preemptible.
- `preserve_tpu` is set to false. This means suspended VMs will be deleted.
- The partition module uses this `tpu_nodeset` module, and the partition can be accessed as the `tpu` partition.
```yaml
  - id: tpu_nodeset
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.10.0
      disable_public_ips: false
      preemptible: true
      preserve_tpu: false

  - id: tpu_partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [tpu_nodeset]
    settings:
      partition_name: tpu
```
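
As a point of comparison, the sketch below (not part of the example above) shows how the same nodeset inputs could be used to provision static, reserved TPU VMs instead of bursting with preemptible ones. The module id `tpu_nodeset_static` and the node counts are illustrative choices, not recommendations.

```yaml
  # Illustrative sketch only: a statically provisioned, reserved TPU nodeset.
  - id: tpu_nodeset_static
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.14.0        # module default, per the inputs table below
      node_count_static: 2      # create two worker nodes up front (example value)
      node_count_dynamic_max: 0 # no auto-scaled workers
      preemptible: false
      reserved: true            # place the TPU VMs under a reservation
      preserve_tpu: false
```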
## Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.3 |

## Providers

No providers.

## Modules

No modules.

## Resources

No resources.
## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| accelerator_config | Nodeset accelerator config, see https://cloud.google.com/tpu/docs/supported-tpu-configurations for details. | `object({…})` | `{…}` | no |
| data_disks | The data disks to include in the TPU node. | `list(string)` | `[]` | no |
| disable_public_ips | DEPRECATED: Use `enable_public_ips` instead. | `bool` | `null` | no |
| docker_image | The GCP Container Registry docker image to use in the TPU VMs; it defaults to `gcr.io/schedmd-slurm-public/tpu:slurm-gcp-6-8-tf-<var.tf_version>`. | `string` | `null` | no |
| enable_public_ips | If set to true, the node group VMs will have a random public IP assigned to them. Ignored if `access_config` is set. | `bool` | `false` | no |
| name | Name of the nodeset. Automatically populated by the module id if not set. If setting manually, ensure a unique value across all nodesets. | `string` | n/a | yes |
| network_storage | An array of network attached storage mounts to be configured on nodes. | `list(object({…}))` | `[]` | no |
| node_count_dynamic_max | Maximum number of auto-scaling worker nodes allowed in this partition. For larger TPU machines, multiple worker nodes are required per machine (1 for every 8 cores). See https://cloud.google.com/tpu/docs/v4#large-topologies for more information about these machine types. | `number` | `0` | no |
| node_count_static | Number of worker nodes to be statically created. For larger TPU machines, multiple worker nodes are required per machine (1 for every 8 cores). See https://cloud.google.com/tpu/docs/v4#large-topologies for more information about these machine types. | `number` | `0` | no |
| node_type | Specify a node type to base the VM configuration upon. | `string` | `""` | no |
| preemptible | Should use preemptibles to burst. | `bool` | `false` | no |
| preserve_tpu | Specify whether TPU VMs are preserved on suspend. If set to true, the VM is stopped on suspend; if false, it is deleted. | `bool` | `false` | no |
| project_id | Project ID to create resources in. | `string` | n/a | yes |
| reserved | Specify whether TPU VMs in this nodeset are created under a reservation. | `bool` | `false` | no |
| service_account | DEPRECATED: Use `service_account_email` and `service_account_scopes` instead. | `object({…})` | `null` | no |
| service_account_email | Service account e-mail address to attach to the TPU VM. | `string` | `null` | no |
| service_account_scopes | Scopes to attach to the TPU VM. | `set(string)` | `[…]` | no |
| subnetwork_self_link | The name of the subnetwork to attach the TPU VM of this nodeset to. | `string` | n/a | yes |
| tf_version | Nodeset TensorFlow version, see https://cloud.google.com/tpu/docs/supported-tpu-configurations#tpu_vm for details. | `string` | `"2.14.0"` | no |
| zone | Zone in which to create compute VMs. TPU partitions can only specify a single zone. | `string` | n/a | yes |
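
Because `disable_public_ips` and `service_account` are deprecated, blueprints should use `enable_public_ips` and `service_account_email`/`service_account_scopes` instead. A minimal sketch, assuming a placeholder service account e-mail and the broad `cloud-platform` scope as example values:

```yaml
  # Illustrative sketch only: the service account e-mail and scope are placeholders.
  - id: tpu_nodeset
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      enable_public_ips: true   # replaces the deprecated disable_public_ips: false
      service_account_email: tpu-nodeset-sa@example-project.iam.gserviceaccount.com
      service_account_scopes:   # replaces the deprecated service_account block
        - https://www.googleapis.com/auth/cloud-platform
```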
## Outputs

| Name | Description |
|------|-------------|
| nodeset_tpu | Details of the TPU nodeset. Typically used as input to `schedmd-slurm-gcp-v6-partition`. |