This module is used to create a Kubernetes job template file. The job template file can be submitted as is or used as a template for further customization. Add the `instructions` output to a blueprint (as shown below) to get instructions on how to use `kubectl` to submit the job.
This module is designed to use one or more `gke-node-pool` modules. The job will be configured to run on any of the specified node pools.
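As a minimal sketch, a job template that may be scheduled on either of two node pools lists both pools in its `use` field. The module IDs `cpu_pool` and `highmem_pool` are hypothetical; they assume two `gke-node-pool` modules are defined elsewhere in the same blueprint.

```yaml
  # Hypothetical IDs: cpu_pool and highmem_pool are assumed to be
  # gke-node-pool modules defined elsewhere in this blueprint.
  - id: job-template
    source: modules/compute/gke-job-template
    use: [cpu_pool, highmem_pool]
    outputs: [instructions]
```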
NOTE: This is an experimental module and the functionality and documentation will likely be updated in the near future. This module has only been tested in limited capacity.
The following example creates a GKE job template file.
```yaml
  - id: job-template
    source: modules/compute/gke-job-template
    use: [compute_pool]
    settings:
      node_count: 3
    outputs: [instructions]
```
Also see a full GKE example blueprint.
This module natively supports:

- Filestore as a shared file system between pods/nodes.
- Pod-level ephemeral storage options:
  - memory-backed emptyDir
  - local SSD-backed emptyDir
  - SSD persistent disk-backed ephemeral volume
  - balanced persistent disk-backed ephemeral volume
See the storage-gke.yaml blueprint and the associated documentation for examples of how to use Filestore and ephemeral storage with this module.
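The fragment below is a rough sketch of requesting two ephemeral volumes through the `ephemeral_volumes` setting; `storage-gke.yaml` remains the reference example. The `type` values and the `size_gb` field come from the input description in the table below. The `mount_path` field name is an assumption (the object schema is truncated in the inputs table), and the paths and sizes are hypothetical.

```yaml
  - id: job-template
    source: modules/compute/gke-job-template
    use: [compute_pool]
    settings:
      ephemeral_volumes:
      # type values: memory, local-ssd, pd-balanced, pd-ssd.
      # mount_path is an assumed field name; paths and sizes are hypothetical.
      - type: memory
        mount_path: /scratch-mem
        size_gb: 10
      - type: pd-ssd
        mount_path: /scratch-pd
        size_gb: 100
    outputs: [instructions]
```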
When one or more `gke-node-pool` modules are referenced with the `use` field, the requested resources will be populated to achieve 1 pod per node packing while still leaving some headroom for required system pods. This behavior can be overridden by specifying the desired CPU requirement with the `requested_cpu_per_pod` setting.
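A minimal sketch of overriding the whole-node packing: setting `requested_cpu_per_pod` makes each pod request that amount of CPU instead of the allocatable CPU populated from the node pool. The value `2` and `node_count: 4` are illustrative only.

```yaml
  - id: job-template
    source: modules/compute/gke-job-template
    use: [compute_pool]
    settings:
      node_count: 4
      # Overrides allocatable_cpu_per_node (populated via `use`);
      # the value 2 is illustrative only.
      requested_cpu_per_pod: 2
    outputs: [instructions]
```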
Copyright 2023 Google LLC
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Name | Version |
---|---|
terraform | >= 1.2 |
local | >= 2.0.0 |
random | ~> 3.0 |
Name | Version |
---|---|
local | >= 2.0.0 |
random | ~> 3.0 |
No modules.
Name | Type |
---|---|
local_file.job_template | resource |
random_id.resource_name_suffix | resource |
Name | Description | Type | Default | Required |
---|---|---|---|---|
allocatable_cpu_per_node | The allocatable CPU per node. Used to claim whole nodes. Generally populated from `gke-node-pool` via the `use` field. | `list(number)` | `[` | no |
allocatable_gpu_per_node | The allocatable GPU per node. Used to claim whole nodes. Generally populated from `gke-node-pool` via the `use` field. | `list(number)` | `[` | no |
backoff_limit | Controls the number of retries before considering a Job as failed. Set to zero for shared fate. | `number` | `0` | no |
command | The command and arguments for the container that runs in the Pod. The command field corresponds to entrypoint in some container runtimes. | `list(string)` | `[` | no |
completion_mode | Sets the value of `completionMode` on the job. The default uses indexed jobs. See documentation for more information. | `string` | `"Indexed"` | no |
ephemeral_volumes | Will create an emptyDir or ephemeral volume that is backed by the specified type: `memory`, `local-ssd`, `pd-balanced`, `pd-ssd`. `size_gb` is provided in GiB. | `list(object({` | `[]` | no |
has_gpu | Indicates that the job should request nodes with GPUs. Typically supplied by a `gke-node-pool` module. | `list(bool)` | `[` | no |
image | The container image the job should use. | `string` | `"debian"` | no |
k8s_service_account_name | Kubernetes service account to run the job as. If null, no service account is specified. | `string` | `null` | no |
labels | Labels to add to the GKE job template. Key-value pairs. | `map(string)` | n/a | yes |
machine_family | The machine family to use in the node selector (example: `n2`). If null, machine family will not be used as selection criteria. | `string` | `null` | no |
name | The name of the job. | `string` | `"my-job"` | no |
node_count | How many nodes the job should run in parallel. | `number` | `1` | no |
node_pool_name | A list of node pool names on which to run the job. Can be populated via the `use` field. | `list(string)` | `[]` | no |
node_selectors | A list of node selectors to use to place the job. | `list(object({` | `[]` | no |
persistent_volume_claims | A list of objects, each describing a k8s PVC that is to be used and mounted on the job. Generally supplied by the `gke-persistent-volume` module. | `list(object({` | `[]` | no |
random_name_sufix | Appends a random suffix to the job name to avoid clashes. | `bool` | `true` | no |
requested_cpu_per_pod | The requested CPU per pod. If null, `allocatable_cpu_per_node` will be used to claim whole nodes. If provided, it will override `allocatable_cpu_per_node`. | `number` | `-1` | no |
requested_gpu_per_pod | The requested GPU per pod. If null, `allocatable_gpu_per_node` will be used to claim whole nodes. If provided, it will override `allocatable_gpu_per_node`. | `number` | `-1` | no |
restart_policy | Job restart policy. Only a RestartPolicy equal to `Never` or `OnFailure` is allowed. | `string` | `"Never"` | no |
security_context | The security options the container should be run with. More info: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ | `list(object({` | `[]` | no |
tolerations | Tolerations allow the scheduler to schedule pods with matching taints. Generally populated from `gke-node-pool` via the `use` field. | `list(object({` | `[` | no |
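As a rough sketch of combining several of the inputs above, the fragment below customizes the container and retry behavior. The job name, image, and command are hypothetical; `NonIndexed` is the Kubernetes alternative to the default `Indexed` completion mode.

```yaml
  - id: job-template
    source: modules/compute/gke-job-template
    use: [compute_pool]
    settings:
      name: hello-job              # hypothetical job name
      image: python:3.11-slim      # hypothetical image; the default is "debian"
      command: [python3, "-c", "print('hello')"]
      node_count: 2
      backoff_limit: 2             # allow two retries instead of shared fate (0)
      restart_policy: OnFailure    # must be Never or OnFailure
      completion_mode: NonIndexed  # Kubernetes completionMode; default is "Indexed"
    outputs: [instructions]
```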
Name | Description |
---|---|
instructions | Instructions for submitting the GKE job. |