Sync changes from develop #3278

Merged

annuay-google (Contributor) commented Nov 18, 2024

Back merge of most recent release

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

mr0re1 and others added 30 commits November 1, 2024 23:36
SlurmGCP. Add `set -e` to prolog mux
SlurmGCP. Escape GCP error reasons that may cause malformed CLI args
…warning

Fix formatting of Docker config warning
Filestore deletion protection ensures that instances are not
unintentionally deleted. A typical lifecycle for a user will look like:

1. Deploy a blueprint with deletion protection enabled
2. Disable deletion protection in blueprint
3. Redeploy blueprint
4. Destroy deployment

In particular, enabling Filestore deletion protection does not prevent
Terraform from destroying other resources. So a `gcluster destroy`
command will destroy all resources except the Filestore and its
dependencies.
This is a temporary workaround to prevent failures while we debug the issue.
…able_lustre

Don't install Lustre in A3 tests
…eletion

Add support for Filestore deletion protection
…ate_docs

Remove CentOS 7 from list of supported images
…health

add gpu health check in prolog and epilog
…g-latest

Fix a bug where try was hiding extraction of gpu driver version
SlurmGCP. Don't use remote module to create controller instance
ankitkinra and others added 20 commits November 14, 2024 12:26
…t-version-2

Fix the gpu_installation_config default for case where no customer input
add firewall to allow tcp traffic for parallelstore
…m-node-pool-disk

Allow specifying GKE's system node pool disk properties
…-doc-update

Add documentation on opportunistic GCP maintenance support in Slurm
…revert-3232-develop

Revert "update a3 machines local ssd to use nvme instead of scsi for better performance"
…_dockerfile

Adds Cluster Toolkit Dockerfile for backend integration with XPK
…l-kueue

add support for kueue v0.9.0 to enable TAS
…ud_ops

Add cluster and hostname as cloud ops labels
Add support for kueue v0.9.0 to enable TAS
annuay-google changed the base branch from experimental to develop November 18, 2024 21:03
annuay-google changed the base branch from develop to experimental November 18, 2024 21:04
annuay-google merged commit 5a82958 into GoogleCloudPlatform:experimental Nov 18, 2024
5 of 55 checks passed
annuay-google deleted the exp-copy-annuay branch November 18, 2024 22:52