Merge pull request #3437 from mohitchaurasia91/release-branch
[Cherry Pick] Update README and related test setup for GKE managed parallelstore blueprint
nick-stroud authored Dec 19, 2024
2 parents 8bb384f + dbc0740 commit 4d362f6
Showing 6 changed files with 34 additions and 10 deletions.
24 changes: 24 additions & 0 deletions examples/README.md
@@ -1518,6 +1518,30 @@ cleaned up when the job is deleted.

[storage-gke.yaml]: ../examples/storage-gke.yaml

### [gke-managed-parallelstore.yaml] ![core-badge] ![experimental-badge]

This blueprint shows how to use managed parallelstore storage options with GKE in the toolkit.

The blueprint contains the following:

* A K8s Job that uses a managed parallelstore storage volume option.
* A K8s Job that demonstrates an ML training workload performing disk operations on managed Parallelstore storage.

> **Warning**: In this example blueprint, when the storage type `Parallelstore` is specified in the `gke-storage` module,
> the lifecycle of the Parallelstore instance is managed by the blueprint.
> On a `gcluster destroy` operation, the Parallelstore storage that was created is also destroyed.
>
> [!Note]
> The Kubernetes API server will only allow requests from authorized networks.
> The `gke-cluster` module needs access to the Kubernetes API server
> to create a Persistent Volume and a Persistent Volume Claim. **You must use
> the `authorized_cidr` variable to supply an authorized network which contains
> the IP address of the machine deploying the blueprint, for example
> `--vars authorized_cidr=<your-ip-address>/32`.** You can use a service like
> [whatismyip.com](https://whatismyip.com) to determine your IP address.

[gke-managed-parallelstore.yaml]: ../examples/gke-managed-parallelstore.yaml
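
As a minimal sketch of the `authorized_cidr` note above, the following Python snippet builds the single-host CIDR and the matching `--vars` argument from an IPv4 address. The helper names `authorized_cidr_for` and `vars_flag` are hypothetical and not part of the toolkit, and `203.0.113.7` is a documentation-only placeholder address; substitute the public IP of the machine deploying the blueprint.

```python
import ipaddress


def authorized_cidr_for(ip: str) -> str:
    """Return a /32 CIDR block covering exactly one IPv4 address.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    # IPv4Address raises ValueError on malformed input, so typos fail early.
    addr = ipaddress.IPv4Address(ip)
    return f"{addr}/32"


def vars_flag(ip: str) -> str:
    """Format the --vars argument described in the note above."""
    return f"--vars authorized_cidr={authorized_cidr_for(ip)}"


if __name__ == "__main__":
    # 203.0.113.7 is a placeholder from the documentation address range.
    print(vars_flag("203.0.113.7"))
```

Validating the address before deploying avoids passing a malformed value through to the cluster's authorized networks configuration.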

### [gke-a3-megagpu.yaml] ![core-badge] ![experimental-badge]

This blueprint shows how to provision a GKE cluster with A3 Mega machines in the toolkit.
@@ -12,10 +12,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.
---
-blueprint_name: gke-storage-parallelstore
+blueprint_name: gke-managed-parallelstore
vars:
project_id: ## Set GCP Project ID Here ##
-deployment_name: gke-storage-ps
+deployment_name: gke-storage-managed-ps
region: us-central1
zone: us-central1-c
# Cidr block containing the IP of the machine calling terraform.
2 changes: 1 addition & 1 deletion modules/file-system/gke-storage/README.md
@@ -39,7 +39,7 @@ then use them in a `gke-job-template` to dynamically provision the resource.
```
See example
-[gke-storage-parallelstore.yaml](../../../examples/README.md#gke-storage-parallelstoreyaml--) blueprint
+[gke-managed-parallelstore.yaml](../../../examples/README.md#gke-managed-parallelstoreyaml--) blueprint
for a complete example.
### Authorized Network
@@ -27,7 +27,7 @@ timeout: 14400s # 4hr

steps:
## Test GKE
-- id: gke-storage-parallelstore
+- id: gke-managed-parallelstore
name: us-central1-docker.pkg.dev/$PROJECT_ID/hpc-toolkit-repo/test-runner
entrypoint: /bin/bash
env:
@@ -40,7 +40,7 @@ steps:
cd /workspace && make
BUILD_ID_FULL=$BUILD_ID
BUILD_ID_SHORT=$${BUILD_ID_FULL:0:6}
-SG_EXAMPLE=examples/gke-storage-parallelstore.yaml
+SG_EXAMPLE=examples/gke-managed-parallelstore.yaml
# adding vm to act as remote node
echo ' - id: remote-node' >> $${SG_EXAMPLE}
@@ -58,4 +58,4 @@ steps:
ansible-playbook tools/cloud-build/daily-tests/ansible_playbooks/base-integration-test.yml \
--user=sa_106486320838376751393 --extra-vars="project=${PROJECT_ID} build=$${BUILD_ID_SHORT}" \
---extra-vars="@tools/cloud-build/daily-tests/tests/gke-storage-parallelstore.yml"
+--extra-vars="@tools/cloud-build/daily-tests/tests/gke-managed-parallelstore.yml"
@@ -12,16 +12,16 @@
# See the License for the specific language governing permissions and
# limitations under the License.
---
-test_name: gke-storage-parallelstore
-deployment_name: gke-storage-parallelstore-{{ build }}
+test_name: gke-managed-parallelstore
+deployment_name: gke-managed-parallelstore-{{ build }}
zone: us-central1-a # for remote node
region: us-central1
workspace: /workspace
-blueprint_yaml: "{{ workspace }}/examples/gke-storage-parallelstore.yaml"
+blueprint_yaml: "{{ workspace }}/examples/gke-managed-parallelstore.yaml"
network: "{{ deployment_name }}-net"
remote_node: "{{ deployment_name }}-0"
post_deploy_tests:
-- test-validation/test-gke-storage-parallelstore.yml
+- test-validation/test-gke-managed-parallelstore.yml
custom_vars:
project: "{{ project }}"
cli_deployment_vars: