Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SKY-1304] AWS throws InvalidGroup.Duplicate when user submits a lot of parallel launch #4584

Merged
merged 5 commits into from
Jan 21, 2025

Conversation

weih1121
Copy link
Contributor

@weih1121 weih1121 commented Jan 17, 2025

Fix SKY 1304
Fixes #4578
Root cause:

Solution:

  • When InvalidGroup.Duplicate exception occurs during security groups creating, query security again and return the created else raise created but not found exception.

Test:

D 01-17 17:49:33 config.py:88] Creating or updating security groups...
W 01-17 17:49:34 config.py:593] sky-sg-hong-4292 already exists when creating.
I 01-17 17:49:37 config.py:598] Found existing security group sky-sg-hong-4292 [id=sg-0f255f9fecda37b1f]
I 01-17 17:49:37 config.py:103] Security groups created or updated in 3.44550 seconds.

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

@zpoint
Copy link
Collaborator

zpoint commented Jan 17, 2025

/quicktest-core

@Michaelvll Michaelvll requested a review from cblmemo January 17, 2025 20:53
Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @weih1121 for fixing this! LGTM. Left some nits.

sky/provision/aws/config.py Outdated Show resolved Hide resolved
sky/provision/aws/config.py Outdated Show resolved Hide resolved
@Michaelvll
Copy link
Collaborator

/smoke-test aws

@weih1121
Copy link
Contributor Author

/smoke-test aws

@weih1121 weih1121 requested a review from cblmemo January 18, 2025 09:10
@weih1121
Copy link
Contributor Author

@cblmemo refactor to make it cleaner, it'd be better if you can help review again thanks~

@weih1121
Copy link
Contributor Author

/smoke-test aws

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM. I took a quick look and seems like there are only one smoke test failed. Could you double check it is irrelevant to this PR (e.g. also failed on master) and maybe submit an issue for this? After that it should be ready to go!

cc @Michaelvll to double check here.

@weih1121
Copy link
Contributor Author

Thanks @cblmemo for the review, the smoke test fix PR is work in progress.

@cblmemo
Copy link
Collaborator

cblmemo commented Jan 21, 2025

Thanks @weih1121 again for contributing to skypilot! Looks great to me ;) merging now!

@cblmemo cblmemo merged commit 8cf6c86 into skypilot-org:master Jan 21, 2025
18 of 19 checks passed
@weih1121 weih1121 deleted the dev/hong/dup-sg branch January 21, 2025 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AWS] AWS throws InvalidGroup.Duplicate when user submits a lot of parallel launch
4 participants