Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Serve] Simplify endpoint logic in sky.serve.controller and improve input validation #4043

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

andylizf
Copy link
Collaborator

@andylizf andylizf commented Oct 6, 2024

Fixes #4041

Simplify endpoint logic in sky/serve/controller.py and improve input validation.

… input validation

Fixes skypilot-org#4041

Simplify endpoint logic in `sky/serve/controller.py` and improve input validation.
@andylizf
Copy link
Collaborator Author

andylizf commented Oct 7, 2024

@cblmemo Do you think we should go further and merge the RequestTimestamp class into schemas.py?
On the one hand, they are completely different, one for sending and the other for receiving and validating.
On the other hand, RequestTimestamp is very shallow and I think it would be nice if we merge them together.

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactoring @andylizf ! This is awesome. Left some discussions :)

sky/serve/controller.py Outdated Show resolved Hide resolved
sky/serve/controller.py Show resolved Hide resolved
sky/serve/serve_utils.py Outdated Show resolved Hide resolved
sky/serve/schemas.py Outdated Show resolved Hide resolved
sky/serve/autoscalers.py Outdated Show resolved Hide resolved
sky/serve/controller.py Outdated Show resolved Hide resolved
sky/serve/controller.py Outdated Show resolved Hide resolved
sky/serve/schemas.py Outdated Show resolved Hide resolved
andylizf and others added 2 commits October 11, 2024 17:34
Co-authored-by: Tian Xia <[email protected]>
@andylizf andylizf force-pushed the simplify-endpoint-logic branch from 8085288 to af226a7 Compare October 12, 2024 02:25
@andylizf
Copy link
Collaborator Author

All clear.

@cblmemo
Copy link
Collaborator

cblmemo commented Oct 16, 2024

Thanks for adding this @andylizf ! I tried this today and got the following error. could you help take a look on what is happening here?

$ ssu examples/serve/http_server/task.yaml -n http --cloud aws       
Service from YAML spec: examples/serve/http_server/task.yaml
Service Spec:
Readiness probe method:           GET /health
Readiness initial delay seconds:  20
Readiness probe timeout seconds:  15
Replica autoscaling policy:       Fixed 2 replicas
Spot Policy:                      No spot fallback policy

Each replica will use the following resources (estimated):
I 10-16 09:52:42 optimizer.py:719] == Optimizer ==
I 10-16 09:52:42 optimizer.py:730] Target: minimizing cost
I 10-16 09:52:42 optimizer.py:742] Estimated cost: $0.1 / hour
I 10-16 09:52:42 optimizer.py:742] 
I 10-16 09:52:42 optimizer.py:867] Considered resources (1 node):
I 10-16 09:52:42 optimizer.py:937] ----------------------------------------------------------------------------------------
I 10-16 09:52:42 optimizer.py:937]  CLOUD   INSTANCE    vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN   
I 10-16 09:52:42 optimizer.py:937] ----------------------------------------------------------------------------------------
I 10-16 09:52:42 optimizer.py:937]  AWS     m6i.large   2       8         -              us-east-1     0.10I 10-16 09:52:42 optimizer.py:937] ----------------------------------------------------------------------------------------
I 10-16 09:52:42 optimizer.py:937] 
Launching a new service 'http'. Proceed? [Y/n]: 
I 10-16 09:52:44 controller_utils.py:621] Translating workdir to SkyPilot Storage...
I 10-16 09:52:44 controller_utils.py:646] Workdir 'examples/serve/http_server' will be synced to cloud storage 'skypilot-workdir-memory-65aa0832'.
I 10-16 09:52:44 controller_utils.py:719] Uploading sources to cloud storage. See: sky storage ls
I 10-16 09:52:44 storage.py:1448] Created S3 bucket 'skypilot-workdir-memory-65aa0832' in us-east-1
Launching controller for 'http'...
I 10-16 09:52:48 cloud_vm_ray_backend.py:1315] To view detailed progress: tail -n100 -f /home/memory/sky_logs/sky-2024-10-16-09-52-45-985724/provision.log
I 10-16 09:52:48 cloud_vm_ray_backend.py:1226] Cluster 'sky-serve-controller-402b1bba' (status: STOPPED) was previously launched in AWS us-east-1. Relaunching in that region.
I 10-16 09:52:49 provisioner.py:65] Launching on AWS us-east-1 (us-east-1a)
I 10-16 09:53:20 provisioner.py:452] Successfully provisioned or found existing instance.
I 10-16 09:53:46 provisioner.py:554] Successfully provisioned cluster: sky-serve-controller-402b1bba
I 10-16 09:53:46 cloud_vm_ray_backend.py:4457] Processing file mounts.
I 10-16 09:53:47 cloud_vm_ray_backend.py:4483] To view detailed progress: tail -n100 -f ~/sky_logs/sky-2024-10-16-09-52-45-985724/file_mounts.log
I 10-16 09:53:47 backend_utils.py:1343] Syncing (to 1 node): /tmp/service-task-http-53hdmem2 -> ~/.sky/serve/http/task.yaml.tmp
I 10-16 09:53:47 backend_utils.py:1343] Syncing (to 1 node): /tmp/tmp7qdsyda1 -> ~/.sky/serve/http/config.yaml
I 10-16 09:53:48 backend_utils.py:1343] Syncing (to 1 node): /home/memory/.sky/catalogs/v5/aws/az_mappings-6d23a727.csv -> ~/.sky/catalogs/v5/aws/az_mappings-6d23a727.csv
I 10-16 09:53:48 backend_utils.py:1343] Syncing (to 1 node): /home/memory/.sky/catalogs/v5/azure/images.csv -> ~/.sky/catalogs/v5/azure/images.csv
I 10-16 09:53:49 backend_utils.py:1343] Syncing (to 1 node): /home/memory/.sky/catalogs/v5/azure/vms.csv -> ~/.sky/catalogs/v5/azure/vms.csv
I 10-16 09:53:50 backend_utils.py:1343] Syncing (to 1 node): /home/memory/.sky/catalogs/v5/gcp/vms.csv -> ~/.sky/catalogs/v5/gcp/vms.csv
I 10-16 09:53:50 cloud_vm_ray_backend.py:3216] Running setup on 1 node.
Check & install cloud dependencies on controller: Done for 8 clouds.  
I 10-16 09:54:01 cloud_vm_ray_backend.py:3229] Setup completed.
I 10-16 09:54:06 cloud_vm_ray_backend.py:3333] Job submitted with Job ID: 2

E 10-16 09:55:06 subprocess_utils.py:85] ValueError: Failed to register service 'http' on the SkyServe controller. Reason:
E 10-16 09:55:06 subprocess_utils.py:85] Traceback (most recent call last):
E 10-16 09:55:06 subprocess_utils.py:85]   File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
E 10-16 09:55:06 subprocess_utils.py:85]     return _run_code(code, main_globals, None,
E 10-16 09:55:06 subprocess_utils.py:85]   File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
E 10-16 09:55:06 subprocess_utils.py:85]     exec(code, run_globals)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/sky/serve/service.py", line 23, in <module>
E 10-16 09:55:06 subprocess_utils.py:85]     from sky.serve import controller
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/sky/serve/controller.py", line 17, in <module>
E 10-16 09:55:06 subprocess_utils.py:85]     from sky.serve import autoscalers
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/sky/serve/autoscalers.py", line 12, in <module>
E 10-16 09:55:06 subprocess_utils.py:85]     from sky.serve import schemas
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/sky/serve/schemas.py", line 42, in <module>
E 10-16 09:55:06 subprocess_utils.py:85]     class LoadBalancerRequest(pydantic.BaseModel):
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py", line 224, in __new__
E 10-16 09:55:06 subprocess_utils.py:85]     complete_model_class(
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py", line 587, in complete_model_class
E 10-16 09:55:06 subprocess_utils.py:85]     schema = gen_schema.clean_schema(schema)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py", line 594, in clean_schema
E 10-16 09:55:06 subprocess_utils.py:85]     schema = _discriminated_union.apply_discriminators(schema)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 58, in apply_discriminators
E 10-16 09:55:06 subprocess_utils.py:85]     return _core_utils.walk_core_schema(schema, inner)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 416, in walk_core_schema
E 10-16 09:55:06 subprocess_utils.py:85]     return f(schema.copy(), _dispatch)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 48, in inner
E 10-16 09:55:06 subprocess_utils.py:85]     s = recurse(s, inner)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 197, in walk
E 10-16 09:55:06 subprocess_utils.py:85]     return f(schema, self._walk)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 48, in inner
E 10-16 09:55:06 subprocess_utils.py:85]     s = recurse(s, inner)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 200, in _walk
E 10-16 09:55:06 subprocess_utils.py:85]     schema = self._schema_type_to_method[schema['type']](schema.copy(), f)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 240, in handle_definitions_schema
E 10-16 09:55:06 subprocess_utils.py:85]     new_inner_schema = self.walk(schema['schema'], f)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 197, in walk
E 10-16 09:55:06 subprocess_utils.py:85]     return f(schema, self._walk)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 48, in inner
E 10-16 09:55:06 subprocess_utils.py:85]     s = recurse(s, inner)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 200, in _walk
E 10-16 09:55:06 subprocess_utils.py:85]     schema = self._schema_type_to_method[schema['type']](schema.copy(), f)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 209, in _handle_other_schemas
E 10-16 09:55:06 subprocess_utils.py:85]     schema['schema'] = self.walk(sub_schema, f)  # type: ignore
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 197, in walk
E 10-16 09:55:06 subprocess_utils.py:85]     return f(schema, self._walk)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 48, in inner
E 10-16 09:55:06 subprocess_utils.py:85]     s = recurse(s, inner)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 200, in _walk
E 10-16 09:55:06 subprocess_utils.py:85]     schema = self._schema_type_to_method[schema['type']](schema.copy(), f)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 339, in handle_model_fields_schema
E 10-16 09:55:06 subprocess_utils.py:85]     replaced_field['schema'] = self.walk(v['schema'], f)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 197, in walk
E 10-16 09:55:06 subprocess_utils.py:85]     return f(schema, self._walk)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 55, in inner
E 10-16 09:55:06 subprocess_utils.py:85]     s = apply_discriminator(s, discriminator, global_definitions)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 97, in apply_discriminator
E 10-16 09:55:06 subprocess_utils.py:85]     return _ApplyInferredDiscriminator(discriminator, definitions or {}).apply(schema)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 191, in apply
E 10-16 09:55:06 subprocess_utils.py:85]     schema = self._apply_to_root(schema)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 227, in _apply_to_root
E 10-16 09:55:06 subprocess_utils.py:85]     self._handle_choice(choice)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 302, in _handle_choice
E 10-16 09:55:06 subprocess_utils.py:85]     inferred_discriminator_values = self._infer_discriminator_values_for_choice(choice, source_name=None)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 381, in _infer_discriminator_values_for_choice
E 10-16 09:55:06 subprocess_utils.py:85]     return self._infer_discriminator_values_for_choice(self.definitions[schema_ref], source_name=source_name)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 363, in _infer_discriminator_values_for_choice
E 10-16 09:55:06 subprocess_utils.py:85]     return self._infer_discriminator_values_for_choice(choice['schema'], source_name=choice['cls'].__name__)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 369, in _infer_discriminator_values_for_choice
E 10-16 09:55:06 subprocess_utils.py:85]     return self._infer_discriminator_values_for_model_choice(choice, source_name=source_name)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 411, in _infer_discriminator_values_for_model_choice
E 10-16 09:55:06 subprocess_utils.py:85]     return self._infer_discriminator_values_for_field(field, source)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 443, in _infer_discriminator_values_for_field
E 10-16 09:55:06 subprocess_utils.py:85]     return self._infer_discriminator_values_for_inner_schema(field['schema'], source)
E 10-16 09:55:06 subprocess_utils.py:85]   File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 482, in _infer_discriminator_values_for_inner_schema
E 10-16 09:55:06 subprocess_utils.py:85]     raise PydanticUserError(
E 10-16 09:55:06 subprocess_utils.py:85] pydantic.errors.PydanticUserError: Model 'RequestsAggregator' needs field 'type' to be of type `Literal`
E 10-16 09:55:06 subprocess_utils.py:85] 
E 10-16 09:55:06 subprocess_utils.py:85] For further information visit https://errors.pydantic.dev/2.9/u/discriminator-needs-literal
E 10-16 09:55:06 subprocess_utils.py:85] 
E 10-16 09:55:06 subprocess_utils.py:85] 
RuntimeError: Failed to spin up the service. Please check the logs above for more details.

The first reason is we want there be a `RequestsAggregator` base class,
so that we can use it both for sending and receiving request information.
pydantic requires a literal type field for the base class, which causes
a conflict.

The second reason is to maintain backwards compatibility. Since when using
discriminator union, we need to pass a `type` field to the Pydantic model.
@andylizf andylizf force-pushed the simplify-endpoint-logic branch from f954ec5 to 5ae6649 Compare October 17, 2024 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Serve] Simplify endpoint logic in sky.serve.controller and improve input validation
2 participants