Use templates in pre-existing-storage
wiktorn committed Nov 16, 2024
1 parent 63e84c6 commit 87f6475
Showing 11 changed files with 132 additions and 60 deletions.
28 changes: 26 additions & 2 deletions modules/file-system/parallelstore/README.md
@@ -94,6 +94,30 @@ Here you can replace `import_gcs_bucket_uri` with the uri of sub folder within G
bucket and `import_destination_path` with local directory within parallelstore
instance.

### Additional configuration for DAOS agent and dfuse
Use `daos_agent_config` to provide additional configuration for `daos_agent`, for example:

```yaml
- id: parallelstorefs
source: modules/file-system/parallelstore
settings:
daos_agent_config: |
credential_config:
cache_expiration: 1m
```

Use `dfuse_environment` to provide additional environment variables for the `dfuse` process, for example:

```yaml
- id: parallelstorefs
source: modules/file-system/parallelstore
settings:
dfuse_environment:
D_LOG_FILE: /tmp/client.log
D_APPEND_PID_TO_LOG: 1
D_LOG_MASK: debug
```

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
Copyright 2024 Google LLC

@@ -151,8 +175,8 @@ No modules.
| <a name="input_local_mount"></a> [local\_mount](#input\_local\_mount) | The mount point where the contents of the device may be accessed after mounting. | `string` | `"/parallelstore"` | no |
| <a name="input_mount_options"></a> [mount\_options](#input\_mount\_options) | Options describing various aspects of the parallelstore instance. | `string` | `"disable-wb-cache,thread-count=16,eq-count=8"` | no |
| <a name="input_name"></a> [name](#input\_name) | Name of parallelstore instance. | `string` | `null` | no |
| <a name="input_network_id"></a> [network\_id](#input\_network\_id) | The ID of the GCE VPC network to which the instance is connected given in the format:<br>`projects/<project_id>/global/networks/<network_name>`" | `string` | n/a | yes |
| <a name="input_private_vpc_connection_peering"></a> [private\_vpc\_connection\_peering](#input\_private\_vpc\_connection\_peering) | The name of the VPC Network peering connection.<br>If using new VPC, please use community/modules/network/private-service-access to create private-service-access and<br>If using existing VPC with private-service-access enabled, set this manually." | `string` | n/a | yes |
| <a name="input_network_id"></a> [network\_id](#input\_network\_id) | The ID of the GCE VPC network to which the instance is connected given in the format:<br/>`projects/<project_id>/global/networks/<network_name>`" | `string` | n/a | yes |
| <a name="input_private_vpc_connection_peering"></a> [private\_vpc\_connection\_peering](#input\_private\_vpc\_connection\_peering) | The name of the VPC Network peering connection.<br/>If using new VPC, please use community/modules/network/private-service-access to create private-service-access and<br/>If using existing VPC with private-service-access enabled, set this manually." | `string` | n/a | yes |
| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | Project in which the HPC deployment will be created. | `string` | n/a | yes |
| <a name="input_size_gb"></a> [size\_gb](#input\_size\_gb) | Storage size of the parallelstore instance in GB. | `number` | `12000` | no |
| <a name="input_zone"></a> [zone](#input\_zone) | Location for parallelstore instance. | `string` | n/a | yes |
2 changes: 1 addition & 1 deletion modules/file-system/parallelstore/main.tf
@@ -42,7 +42,7 @@ locals {
local_mount = var.local_mount
mount_options = join(" ", [for opt in split(",", var.mount_options) : "--${opt}"])
})
"destination" = "mount_daos.sh"
"destination" = "mount_filesystem${replace(var.local_mount, "/", "_")}.sh"
}
}

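As a rough illustration of the new `destination` value: Terraform's `replace(var.local_mount, "/", "_")` swaps every slash for an underscore, so each mount point gets its own runner script name instead of the shared `mount_daos.sh`. The shell sketch below only mimics that substitution; `runner_destination` is a hypothetical helper name for this illustration and is not part of the module.

```sh
#!/bin/sh
# Hypothetical helper mirroring the Terraform expression:
#   "mount_filesystem${replace(var.local_mount, "/", "_")}.sh"
runner_destination() {
  # tr replaces every "/" in the mount path with "_"
  suffix=$(printf '%s' "$1" | tr '/' '_')
  printf 'mount_filesystem%s.sh\n' "$suffix"
}

runner_destination /parallelstore   # prints mount_filesystem_parallelstore.sh
runner_destination /mnt/scratch     # prints mount_filesystem_mnt_scratch.sh
```

Two instances mounted at different paths therefore no longer write to the same runner file name.
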
@@ -55,6 +55,7 @@ EOF
if [ -x /usr/bin/google_disable_automatic_updates ]; then
/usr/bin/google_disable_automatic_updates
fi
dnf clean all
dnf makecache

# 2) Install daos-client
33 changes: 17 additions & 16 deletions modules/file-system/parallelstore/templates/mount-daos.sh.tftpl
@@ -25,24 +25,25 @@ daos_config=/etc/daos/daos_agent.yml
# rewrite $daos_config from scratch
mv $daos_config $${daos_config}.orig

exclude_fabric_ifaces=""
# Get interface names with "s0f0" suffix
if ifconfig -a | grep 's0f0'; then
sof0_interfaces=$(ifconfig -a | grep 's0f0:' | awk '{print $1}' | tr ':' '\n' | grep -v '^$' | awk '!a[$0]++' | sed 's/^/"/g' | sed 's/$/"/g' | paste -sd, -)

# Append the sof0_interfaces to the existing list
exclude_fabric_ifaces="lo,$sof0_interfaces"
fi

cat > $daos_config <<EOF
access_points: ${access_points}
transport_config:
allow_insecure: true
$exclude_fabric_ifaces
${daos_agent_config}
EOF

# Get interface names with "s0f0" suffix
if ifconfig -a | grep 's0f0'; then
sof0_interfaces=$(ifconfig -a | grep 's0f0:' | awk '{print $1}' | tr ':' '\n' | grep -v '^$' | awk '!a[$0]++' | sed 's/^/"/g' | sed 's/$/"/g' | paste -sd, -)

# Append the sof0_interfaces to the existing list
exclude_fabric_ifaces="lo,$sof0_interfaces"

# Update the file with the new list
sed -i "s/#.*exclude_fabric_ifaces: \[.*/exclude_fabric_ifaces: [$exclude_fabric_ifaces]/" $daos_config
fi

# Start service
if { [ "$OS_ID" = "rocky" ] || [ "$OS_ID" = "rhel" ]; } && { [ "$OS_VERSION_MAJOR" = "8" ] || [ "$OS_VERSION_MAJOR" = "9" ]; }; then
@@ -55,7 +56,7 @@ if { [ "$OS_ID" = "rocky" ] || [ "$OS_ID" = "rhel" ]; } && { [ "$OS_VERSION_MAJO
systemctl start daos_agent.service
elif { [ "$OS_ID" = "ubuntu" ] && [ "$OS_VERSION" = "22.04" ]; } || { [ "$OS_ID" = "debian" ] && [ "$OS_VERSION_MAJOR" = "12" ]; }; then
mkdir -p /var/run/daos_agent
daos_agent -o /etc/daos/daos_agent.yml >/dev/null 2>&1 &
daos_agent -o $daos_config >/dev/null 2>&1 &
else
echo "Unsupported operating system $OS_ID $OS_VERSION. This script only supports Rocky Linux 8, Redhat 8, Redhat 9, Ubuntu 22.04, and Debian 12."
exit 1
@@ -73,13 +74,13 @@ sed -i "s/#.*user_allow_other/user_allow_other/g" $fuse_config
ulimit -n 1048576

# Store the mounting logic in a variable
mount_command="if mountpoint -q '$local_mount'; then fusermount3 -u '$local_mount'; fi; for i in {1..10}; do /bin/dfuse -m '${local_mount}' --pool default-pool --container default-container --multi-user ${mount_options} --foreground && break; sleep 1; done"
mount_command="if mountpoint -q '${local_mount}'; then fusermount3 -z -u '${local_mount}'; fi; for i in {1..10}; do /bin/dfuse -m '${local_mount}' --pool default-pool --container default-container --multi-user ${mount_options} --foreground && break; sleep 1; done"

# Construct the service name with the local_mount suffix
service_name="mount_parallelstore_${local_mount//\//_}.service"
service_name="mount_parallelstore_${replace(local_mount, "/", "_")}.service"

# --- Begin: Add systemd service creation ---
cat >/usr/lib/systemd/system/"${service_name}" <<EOF
cat > "/usr/lib/systemd/system/$${service_name}" <<EOF
[Unit]
Description=DAOS Mount Service
After=network-online.target daos_agent.service
@@ -91,7 +92,7 @@ Group=root
Restart=always
RestartSec=1
ExecStart=/bin/bash -c "$mount_command"
ExecStop=fusermount3 -z -u '$local_mount'
ExecStop=fusermount3 -z -u '${local_mount}'
%{ for env_key, env_value in dfuse_environment ~}
Environment="${env_key}=${env_value}"
%{ endfor ~}
@@ -100,8 +101,8 @@ Environment="${env_key}=${env_value}"
WantedBy=multi-user.target
EOF

systemctl enable "${service_name}"
systemctl start "${service_name}"
systemctl enable "$${service_name}"
systemctl start "$${service_name}"
# --- End: Add systemd service creation ---

exit 0
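
The `%{ for env_key, env_value in dfuse_environment ~}` directive in the unit file above emits one `Environment=` line per map entry when Terraform renders the template. A minimal shell sketch of the lines it would produce for the README's example values follows. `render_environment` is a hypothetical helper; the real rendering is done by Terraform's `templatefile`, not by shell. The arguments are passed in sorted key order because Terraform iterates maps in lexical key order.

```sh
#!/bin/sh
# Hypothetical helper: print one systemd Environment= line per KEY=VALUE argument,
# mimicking what the template loop renders for each dfuse_environment entry.
render_environment() {
  for pair in "$@"; do
    printf 'Environment="%s"\n' "$pair"
  done
}

render_environment D_APPEND_PID_TO_LOG=1 D_LOG_FILE=/tmp/client.log D_LOG_MASK=debug
# prints:
# Environment="D_APPEND_PID_TO_LOG=1"
# Environment="D_LOG_FILE=/tmp/client.log"
# Environment="D_LOG_MASK=debug"
```
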
33 changes: 33 additions & 0 deletions modules/file-system/pre-existing-network-storage/README.md
@@ -74,6 +74,38 @@ for `parallelstore` instance.
mount_options: disable-wb-cache,thread-count=16,eq-count=8
```

Parallelstore supports additional options for its mount points under the `parallelstore_options` setting.
Use `daos_agent_config` to provide additional configuration for `daos_agent`, for example:

```yaml
- id: parallelstorefs
source: modules/file-system/pre-existing-network-storage
settings:
fs_type: daos
remote_mount: "[10.246.99.2,10.246.99.3,10.246.99.4]"
mount_options: disable-wb-cache,thread-count=16,eq-count=8
parallelstore_options:
daos_agent_config: |
credential_config:
cache_expiration: 1m
```

Use `dfuse_environment` to provide additional environment variables for the `dfuse` process, for example:

```yaml
- id: parallelstorefs
source: modules/file-system/pre-existing-network-storage
settings:
fs_type: daos
remote_mount: "[10.246.99.2,10.246.99.3,10.246.99.4]"
mount_options: disable-wb-cache,thread-count=16,eq-count=8
parallelstore_options:
dfuse_environment:
D_LOG_FILE: /tmp/client.log
D_APPEND_PID_TO_LOG: 1
D_LOG_MASK: debug
```

### Mounting

For the `fs_type` listed below, this module will provide `client_install_runner`
@@ -126,6 +158,7 @@ No resources.
| <a name="input_fs_type"></a> [fs\_type](#input\_fs\_type) | Type of file system to be mounted (e.g., nfs, lustre) | `string` | `"nfs"` | no |
| <a name="input_local_mount"></a> [local\_mount](#input\_local\_mount) | The mount point where the contents of the device may be accessed after mounting. | `string` | `"/mnt"` | no |
| <a name="input_mount_options"></a> [mount\_options](#input\_mount\_options) | Options describing various aspects of the file system. Consider adding setting to 'defaults,\_netdev,implicit\_dirs' when using gcsfuse. | `string` | `"defaults,_netdev"` | no |
| <a name="input_parallelstore_options"></a> [parallelstore\_options](#input\_parallelstore\_options) | Parallelstore specific options | <pre>object({<br/> daos_agent_config = optional(string, "")<br/> dfuse_environment = optional(map(string), {})<br/> })</pre> | `{}` | no |
| <a name="input_remote_mount"></a> [remote\_mount](#input\_remote\_mount) | Remote FS name or export. This is the exported directory for nfs, fs name for lustre, and bucket name (without gs://) for gcsfuse. | `string` | n/a | yes |
| <a name="input_server_ip"></a> [server\_ip](#input\_server\_ip) | The device name as supplied to fs-tab, excluding remote fs-name(for nfs, that is the server IP, for lustre <MGS NID>[:<MGS NID>]). This can be omitted for gcsfuse. | `string` | `""` | no |

12 changes: 9 additions & 3 deletions modules/file-system/pre-existing-network-storage/outputs.tf
@@ -83,9 +83,15 @@ locals {
}

mount_runner_daos = {
"type" = "shell"
"content" = file("${path.module}/scripts/mount-daos.sh")
"args" = "--access_points=\"${var.remote_mount}\" --local_mount=\"${var.local_mount}\" --mount_options=\"${var.mount_options}\""
"type" = "shell"
"content" = templatefile("${path.module}/templates/mount-daos.sh.tftpl", {
access_points = var.remote_mount
daos_agent_config = var.parallelstore_options.daos_agent_config
dfuse_environment = var.parallelstore_options.dfuse_environment
local_mount = var.local_mount
# avoid passing "--" as mount option to dfuse
mount_options = length(var.mount_options) == 0 ? "" : join(" ", [for opt in split(",", var.mount_options) : "--${opt}"])
})
"destination" = "mount_filesystem${replace(var.local_mount, "/", "_")}.sh"
}

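The `mount_options` expression above moves into Terraform the formatting that the old script did with `--${mount_options//,/ --}`, and adds a guard for the empty string. A small shell sketch of the same transformation is shown below for illustration; `format_dfuse_options` is a hypothetical name, not a function in the module.

```sh
#!/bin/sh
# Hypothetical helper mirroring the Terraform expression:
#   length(var.mount_options) == 0 ? "" :
#     join(" ", [for opt in split(",", var.mount_options) : "--${opt}"])
format_dfuse_options() {
  # An empty option string must stay empty; otherwise dfuse would receive a bare "--".
  if [ -z "$1" ]; then
    return 0
  fi
  # Prefix the first option with "--", then turn each comma into " --".
  printf '%s\n' "--$1" | sed 's/,/ --/g'
}

format_dfuse_options 'disable-wb-cache,thread-count=16,eq-count=8'
# prints: --disable-wb-cache --thread-count=16 --eq-count=8
format_dfuse_options ''
# prints nothing
```
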
@@ -55,6 +55,7 @@ EOF
if [ -x /usr/bin/google_disable_automatic_updates ]; then
/usr/bin/google_disable_automatic_updates
fi
dnf clean all
dnf makecache

# 2) Install daos-client
@@ -19,59 +19,52 @@ OS_ID=$(awk -F '=' '/^ID=/ {print $2}' /etc/os-release | sed -e 's/"//g')
OS_VERSION=$(awk -F '=' '/VERSION_ID/ {print $2}' /etc/os-release | sed -e 's/"//g')
OS_VERSION_MAJOR=$(awk -F '=' '/VERSION_ID/ {print $2}' /etc/os-release | sed -e 's/"//g' -e 's/\..*$//')

# Parse local_mount, mount_options from argument.
# Format mount-options string to be compatible to dfuse mount command.
# e.g. "disable-wb-cache,eq-count=8" --> --disable-wb-cache --eq-count=8.
for arg in "$@"; do
if [[ $arg == --access_points=* ]]; then
access_points="${arg#*=}"
fi
if [[ $arg == --local_mount=* ]]; then
local_mount="${arg#*=}"
fi
if [[ $arg == --mount_options=* ]]; then
mount_options="${arg#*=}"
mount_options="--${mount_options//,/ --}"
fi
done

# Edit agent config
daos_config=/etc/daos/daos_agent.yml
sed -i "s/#.*transport_config/transport_config/g" $daos_config
sed -i "s/#.*allow_insecure:.*false/ allow_insecure: true/g" $daos_config
sed -i "s/.*access_points.*/access_points: $access_points/g" $daos_config

# rewrite $daos_config from scratch
mv $daos_config $${daos_config}.orig

exclude_fabric_ifaces=""
# Get interface names with "s0f0" suffix
if ifconfig -a | grep 's0f0'; then
sof0_interfaces=$(ifconfig -a | grep 's0f0:' | awk '{print $1}' | tr ':' '\n' | grep -v '^$' | awk '!a[$0]++' | sed 's/^/"/g' | sed 's/$/"/g' | paste -sd, -)

# Append the sof0_interfaces to the existing list
exclude_fabric_ifaces="lo,$sof0_interfaces"

# Update the file with the new list
sed -i "s/#.*exclude_fabric_ifaces: \[.*/exclude_fabric_ifaces: [$exclude_fabric_ifaces]/" $daos_config
fi

cat > $daos_config <<EOF
access_points: ${access_points}
transport_config:
allow_insecure: true
$exclude_fabric_ifaces
${daos_agent_config}
EOF



# Start service
if { [ "${OS_ID}" = "rocky" ] || [ "${OS_ID}" = "rhel" ]; } && { [ "${OS_VERSION_MAJOR}" = "8" ] || [ "${OS_VERSION_MAJOR}" = "9" ]; }; then
if { [ "$OS_ID" = "rocky" ] || [ "$OS_ID" = "rhel" ]; } && { [ "$OS_VERSION_MAJOR" = "8" ] || [ "$OS_VERSION_MAJOR" = "9" ]; }; then
# TODO: Update script to change default log destination folder, after daos_agent user is supported in debian and ubuntu.
# Move agent log destination from /tmp/ (default) to /var/log/daos_agent/
mkdir -p /var/log/daos_agent
chown daos_agent:daos_agent /var/log/daos_agent
sed -i "s/#.*log_file:.*/log_file: \/var\/log\/daos_agent\/daos_agent.log/g" $daos_config
echo "log_file: /var/log/daos_agent/daos_agent.log" >> $daos_config
systemctl enable daos_agent.service
systemctl start daos_agent.service
elif { [ "${OS_ID}" = "ubuntu" ] && [ "${OS_VERSION}" = "22.04" ]; } || { [ "${OS_ID}" = "debian" ] && [ "${OS_VERSION_MAJOR}" = "12" ]; }; then
elif { [ "$OS_ID" = "ubuntu" ] && [ "$OS_VERSION" = "22.04" ]; } || { [ "$OS_ID" = "debian" ] && [ "$OS_VERSION_MAJOR" = "12" ]; }; then
mkdir -p /var/run/daos_agent
daos_agent -o /etc/daos/daos_agent.yml >/dev/null 2>&1 &
daos_agent -o $daos_config >/dev/null 2>&1 &
else
echo "Unsupported operating system ${OS_ID} ${OS_VERSION}. This script only supports Rocky Linux 8, Redhat 8, Redhat 9, Ubuntu 22.04, and Debian 12."
echo "Unsupported operating system $OS_ID $OS_VERSION. This script only supports Rocky Linux 8, Redhat 8, Redhat 9, Ubuntu 22.04, and Debian 12."
exit 1
fi

# Mount parallelstore instance to client vm.
mkdir -p "$local_mount"
chmod 777 "$local_mount"
mkdir -p "${local_mount}"
chmod 777 "${local_mount}"

# Mount container for multi-user.
fuse_config=/etc/fuse.conf
@@ -81,13 +74,13 @@ sed -i "s/#.*user_allow_other/user_allow_other/g" $fuse_config
ulimit -n 1048576

# Store the mounting logic in a variable
mount_command="if mountpoint -q '$local_mount'; then fusermount3 -u '$local_mount'; fi; for i in {1..10}; do /bin/dfuse -m '$local_mount' --pool default-pool --container default-container --multi-user $mount_options --foreground && break; sleep 1; done"
mount_command="if mountpoint -q '${local_mount}'; then fusermount3 -z -u '${local_mount}'; fi; for i in {1..10}; do /bin/dfuse -m '${local_mount}' --pool default-pool --container default-container --multi-user ${mount_options} --foreground && break; sleep 1; done"

# Construct the service name with the local_mount suffix
service_name="mount_parallelstore_${local_mount//\//_}.service"
service_name="mount_parallelstore_${replace(local_mount, "/", "_")}.service"

# --- Begin: Add systemd service creation ---
cat >/usr/lib/systemd/system/"${service_name}" <<EOF
cat > "/usr/lib/systemd/system/$${service_name}" <<EOF
[Unit]
Description=DAOS Mount Service
After=network-online.target daos_agent.service
@@ -99,14 +92,17 @@ Group=root
Restart=always
RestartSec=1
ExecStart=/bin/bash -c "$mount_command"
ExecStop=fusermount3 -u '$local_mount'
ExecStop=fusermount3 -z -u '${local_mount}'
%{ for env_key, env_value in dfuse_environment ~}
Environment="${env_key}=${env_value}"
%{ endfor ~}
[Install]
WantedBy=multi-user.target
EOF

systemctl enable "${service_name}"
systemctl start "${service_name}"
systemctl enable "$${service_name}"
systemctl start "$${service_name}"
# --- End: Add systemd service creation ---

exit 0
10 changes: 10 additions & 0 deletions modules/file-system/pre-existing-network-storage/variables.tf
@@ -41,4 +41,14 @@ variable "mount_options" {
description = "Options describing various aspects of the file system. Consider adding setting to 'defaults,_netdev,implicit_dirs' when using gcsfuse."
type = string
default = "defaults,_netdev"
nullable = false
}

variable "parallelstore_options" {
description = "Parallelstore specific options"
type = object({
daos_agent_config = optional(string, "")
dfuse_environment = optional(map(string), {})
})
default = {}
}
@@ -31,7 +31,7 @@ stdlib::runner() {
stdlib::info "=== start executing runner: $object ==="
case "$1" in
ansible-local) stdlib::run_playbook "$destpath/$filename" "$args";;
shell) chmod u+x /$destpath/$filename && ./$destpath/$filename $args;;
shell) chmod u+x /$destpath/$filename && $destpath/$filename $args;;
esac

exit_code=$?
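
The commit does not state why the leading `./` was dropped from the `shell)` branch. One plausible reading, assuming `destpath` is an absolute directory, is that `./$destpath/$filename` resolves relative to the current working directory and so only works when the runner is started from `/`. The sketch below demonstrates that difference with a throwaway script; all paths and names in it are made up for the demonstration.

```sh
#!/bin/sh
# Assumption for this sketch: destpath is an absolute directory.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "runner ok"\n' > "$demo_dir/runner.sh"
chmod u+x "$demo_dir/runner.sh"

destpath=$demo_dir
filename=runner.sh

# Any working directory other than / shows the difference.
cd "$demo_dir" || exit 1

# Old form: "./" turns the absolute path into one relative to the working directory.
old_result=$("./$destpath/$filename" 2>/dev/null || echo "not found")
# New form: the absolute path is used as-is.
new_result=$("$destpath/$filename")

echo "old: $old_result"   # prints old: not found
echo "new: $new_result"   # prints new: runner ok

cd / && rm -rf "$demo_dir"
```
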
4 changes: 2 additions & 2 deletions tools/duplicate-diff.py
@@ -83,8 +83,8 @@
"modules/file-system/pre-existing-network-storage/scripts/install-daos-client.sh",
],
[
"modules/file-system/parallelstore/scripts/mount-daos.sh",
"modules/file-system/pre-existing-network-storage/scripts/mount-daos.sh",
"modules/file-system/parallelstore/templates/mount-daos.sh.tftpl",
"modules/file-system/pre-existing-network-storage/templates/mount-daos.sh.tftpl",
],
[
"modules/compute/vm-instance/compute_image.tf"