Add multi-mount parallelstore support #3256
Merged
harshthakkar01 added the release-improvements label (Added to release notes under the "Improvements" heading.) Nov 13, 2024
harshthakkar01 force-pushed the ps-fix-2 branch 4 times, most recently from 727d580 to f607f80, November 15, 2024 06:50
harshthakkar01 changed the title from "Update mount parallelstore script to support multiple parallelstore" to "Add multi-mount parallelstore support" Nov 15, 2024
harshthakkar01 force-pushed the ps-fix-2 branch from f607f80 to 3f01fcb, November 15, 2024 07:18
harshthakkar01 force-pushed the ps-fix-2 branch from 3f01fcb to 02c7962, November 15, 2024 20:07
wiktorn reviewed Nov 16, 2024
tpdownes requested changes Nov 18, 2024
I would like to know a bit more about what this PR is attempting to do in supporting multiple Parallelstore instances. In the meantime, the changes I suggest will improve reliability.
wiktorn reviewed Nov 30, 2024
modules/file-system/pre-existing-network-storage/scripts/mount-daos.sh (outdated, resolved)
tpdownes added a commit to harshthakkar01/hpc-toolkit that referenced this pull request Dec 3, 2024
TESTED:
- simple Debian and Ubuntu VMs with one NIC

TODO:
- rewrite find command to address 2 gVNIC?
- fix quoting of ignored interfaces

TESTED:
- simple Debian and Ubuntu VMs with one NIC
- a3-megagpu-8g Ubuntu and HPC Rocky 8
In addition to the standard tests I tested against this blueprint:

---
blueprint_name: test-ps

vars:
  deployment_name: test-ps
  project_id: hpc-toolkit-gsc
  region: us-central1
  zone: us-central1-c
  parallelstore_ips: "[10.80.175.133,10.80.175.132,10.80.175.130]"

deployment_groups:
- group: primary
  modules:
  - id: network
    source: modules/network/pre-existing-vpc
    settings:
      network_name: a3mega-sys-net
      subnetwork_name: a3mega-sys-subnet

  - id: gpunet
    source: modules/network/pre-existing-vpc
    settings:
      network_name: a3mega-cluster-dev-gpunet-0
      subnetwork_name: a3mega-cluster-dev-gpunet-0-subnet

  - id: parallelstore-rwx
    source: modules/file-system/pre-existing-network-storage
    settings:
      fs_type: daos
      remote_mount: $(vars.parallelstore_ips)
      local_mount: /parallelstore/rwx
      mount_options: disable-caching,thread-count=26,eq-count=13,multi-user

  - id: parallelstore-rwo
    source: modules/file-system/pre-existing-network-storage
    settings:
      fs_type: daos
      remote_mount: $(vars.parallelstore_ips)
      local_mount: /parallelstore/rwo
      mount_options: disable-wb-cache,thread-count=26,eq-count=13,multi-user

  - id: vm
    source: modules/compute/vm-instance
    use:
    - parallelstore-rwo
    - parallelstore-rwx
    settings:
      machine_type: n2-standard-8
      name_prefix: id
      disk_type: pd-ssd
      network_interfaces:
      - network: null
        subnetwork: $(network.subnetwork_self_link)
        subnetwork_project: null
        network_ip: null
        stack_type: null
        access_config: []
        ipv6_access_config: []
        alias_ip_range: []
        queue_count: null
        nic_type: GVNIC
      - network: null
        subnetwork: $(gpunet.subnetwork_self_link)
        subnetwork_project: null
        network_ip: null
        stack_type: null
        access_config: []
        ipv6_access_config: []
        alias_ip_range: []
        queue_count: null
        nic_type: GVNIC

and observed the expected outcome.
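The test blueprint mounts the same set of Parallelstore access points at two local paths with different options. As a rough illustration only, the sketch below shows two steps a mount script for such a setup might plausibly need: flattening the bracketed `parallelstore_ips` string into a plain list, and deriving a distinct identifier from each `local_mount` so that the two mounts do not collide. The helper names `normalize_ips` and `mount_id` are hypothetical and are not taken from `mount-daos.sh`; the actual script may work quite differently.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; NOT the contents of mount-daos.sh.

# Turn "[10.0.0.1,10.0.0.2]" into "10.0.0.1 10.0.0.2".
normalize_ips() {
  # Delete brackets and stray spaces, then turn commas into spaces.
  printf '%s\n' "$1" | tr -d '[] ' | tr ',' ' '
}

# Turn "/parallelstore/rwx" into "parallelstore-rwx" so that each
# local mount point gets its own distinct name.
mount_id() {
  # Strip a leading and trailing slash, then replace remaining slashes.
  printf '%s\n' "$1" | sed -e 's#^/##' -e 's#/$##' -e 's#/#-#g'
}

ips=$(normalize_ips "[10.80.175.133,10.80.175.132,10.80.175.130]")
for mount in /parallelstore/rwx /parallelstore/rwo; do
  echo "$(mount_id "$mount"): $ips"
done
```

Keying the identifier on the local mount path rather than on the access points is the design choice that matters here, since both mounts in the blueprint share the same `remote_mount` value.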
tpdownes approved these changes Dec 5, 2024
tpdownes added a commit to tpdownes/hpc-toolkit that referenced this pull request Dec 5, 2024
tpdownes added a commit to tpdownes/hpc-toolkit that referenced this pull request Dec 6, 2024
tpdownes added a commit to tpdownes/hpc-toolkit that referenced this pull request Dec 6, 2024
tpdownes added a commit to tpdownes/hpc-toolkit that referenced this pull request Dec 6, 2024
cdunbar13 pushed a commit to cdunbar13/cluster-toolkit that referenced this pull request Dec 18, 2024

TESTED:
- simple Debian and Ubuntu VMs with one NIC
- a3-megagpu-8g Ubuntu and HPC Rocky 8

cdunbar13 pushed a commit to cdunbar13/cluster-toolkit that referenced this pull request Dec 18, 2024
This PR,
Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.