Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change added USER_HASH in models name to full digits (8) instead of half (4) to prevent potential hash conflict #4592

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

andylizf
Copy link
Contributor

@andylizf andylizf commented Jan 18, 2025

Resolves #4143

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

Uploading image.png…

@andylizf
Copy link
Contributor Author

After a thorough investigation, I found changing cluster_name_on_cloud in _provision changes the handle and cluster config yaml, which further affects sky launch/start for restarting the existing clusters (INIT/STOP) with already generated cluster_name_on_cloud in the existing cluster config yaml.

Luckily, I found that #1235 had already solved this backward compatibility issue by restoring the old cluster_name_on_cloud before generating a new cluster_name_on_cloud to overwrite it. Thus, we can promise that the leakage never happens by losing the old closter_name_on_cloud.

@andylizf
Copy link
Contributor Author

@cblmemo PTAL, thanks!

@andylizf andylizf marked this pull request as ready for review January 19, 2025 07:37
Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this @andylizf ! This looks good to me. cc @Michaelvll for a look here to make sure every thing looks good to you

@@ -182,7 +182,7 @@ def make_cluster_name_on_cloud(display_name: str,
f'on the cloud, we convert it to {cluster_name_on_cloud}.')
user_hash = ''
if add_user_hash:
user_hash = get_user_hash()[:USER_HASH_LENGTH_IN_CLUSTER_NAME]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this constant still in use after this deletion? if not we can remove it.

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Thanks @andylizf! Can we see if there is any manual backward compatibility tests other than the script to make sure this does not break any existing cluster, jobs, or services?

@@ -182,7 +182,7 @@ def make_cluster_name_on_cloud(display_name: str,
f'on the cloud, we convert it to {cluster_name_on_cloud}.')
user_hash = ''
if add_user_hash:
user_hash = get_user_hash()[:USER_HASH_LENGTH_IN_CLUSTER_NAME]
user_hash = get_user_hash()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: we can further increase the character space for the user hash from hex to base36, i.e. update the common_utils.generate_user_hash to further encode the user_hash with common_utils.base36_encode.

@Michaelvll Michaelvll added this to the v0.8.0 milestone Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG][AWS] Serving a new controller can result in terminating previously served models.
3 participants