Skip to content

Commit

Permalink
feat(runners): add retry logic to default install and start script fo…
Browse files Browse the repository at this point in the history
…r dnf operations (#3787)

## Background
---
This is a continuation of work done in #3748 

There seems to be a race condition some where with the user-data script
and the EC2 starting up and locking RPM. This issue is seen elsewhere,
not necessarily with this repo's user-data, see the following:

[Amazon Linux 2023 - issue with installing packages with
cloud-init](https://repost.aws/questions/QU_tj7NQl6ReKoG53zzEqYOw/amazon-linux-2023-issue-with-installing-packages-with-cloud-init)

[dnf/yum both fails while being executed on instance bootstrap on Amazon
Linux
2023](https://repost.aws/questions/QUgNz4VGCFSC2TYekM-6GiDQ/dnf-yum-both-fails-while-being-executed-on-instance-bootstrap-on-amazon-linux-2023)

Also,
https://github.com/philips-labs/terraform-aws-github-runner/issues/3741

## Changes Made
---
Added a loop to retry if the rpm lock file is found which sleeps for 5
seconds then retries again with a total of 5 iterations. This logic is
now added to the `user-data.sh` for the `upgrade-minimal` operation and
installation of docker, cloudwatch-agent, and curl.

## Testing Done
---
In progress. I'd like to open this up to review while testing.

Co-authored-by: Niek Palm <[email protected]>
  • Loading branch information
jkruse14 and npalm authored Mar 14, 2024
1 parent 8b843ad commit 6a8e1f0
Showing 1 changed file with 35 additions and 4 deletions.
39 changes: 35 additions & 4 deletions modules/runners/templates/user-data.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,22 @@
#!/bin/bash -e

install_with_retry() {
max_attempts=5
attempt_count=0
success=false
while [ $success = false ] && [ $attempt_count -le $max_attempts ]; do
echo "Attempting $attempt_count/$max_attempts: Installing $*"
dnf install -y $*
if [ $? -eq 0 ]; then
success=true
else
echo "Failed to install $1 - retrying"
attempt_count=$(( attempt_count + 1 ))
sleep 5
fi
done
}

exec > >(tee /var/log/user-data.log | logger -t user-data -s 2>/dev/console) 2>&1

# AWS suggest to create a log for debug purpose based on https://aws.amazon.com/premiumsupport/knowledge-center/ec2-linux-log-user-data/
Expand All @@ -15,15 +32,29 @@ set -x

${pre_install}

dnf upgrade-minimal -y
max_attempts=5
attempt_count=0
success=false
while [ $success = false ] && [ $attempt_count -le $max_attempts ]; do
echo "Attempting $attempt_count/$max_attempts: upgrade-minimal"
dnf upgrade-minimal -y
if [ $? -eq 0 ]; then
success=true
else
echo "Failed to run `dnf upgrad-minimal -y` - retrying"
attempt_count=$(( attempt_count + 1 ))
sleep 5
fi
done

# Install docker
dnf install -y docker
install_with_retry docker

service docker start
usermod -a -G docker ec2-user

dnf install -y amazon-cloudwatch-agent jq git
dnf install -y --allowerasing curl
install_with_retry amazon-cloudwatch-agent jq git
install_with_retry --allowerasing curl

user_name=ec2-user

Expand Down

0 comments on commit 6a8e1f0

Please sign in to comment.