-
Notifications
You must be signed in to change notification settings - Fork 563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Serve] Simplify endpoint logic in sky.serve.controller
and improve input validation
#4043
base: master
Are you sure you want to change the base?
Conversation
… input validation Fixes skypilot-org#4041 Simplify endpoint logic in `sky/serve/controller.py` and improve input validation.
@cblmemo Do you think we should go further and merge the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the refactoring @andylizf ! This is awesome. Left some discussions :)
Co-authored-by: Tian Xia <[email protected]>
Co-authored-by: Tian Xia <[email protected]>
8085288
to
af226a7
Compare
All clear. |
Thanks for adding this @andylizf ! I tried this today and got the following error. could you help take a look on what is happening here? $ ssu examples/serve/http_server/task.yaml -n http --cloud aws
Service from YAML spec: examples/serve/http_server/task.yaml
Service Spec:
Readiness probe method: GET /health
Readiness initial delay seconds: 20
Readiness probe timeout seconds: 15
Replica autoscaling policy: Fixed 2 replicas
Spot Policy: No spot fallback policy
Each replica will use the following resources (estimated):
I 10-16 09:52:42 optimizer.py:719] == Optimizer ==
I 10-16 09:52:42 optimizer.py:730] Target: minimizing cost
I 10-16 09:52:42 optimizer.py:742] Estimated cost: $0.1 / hour
I 10-16 09:52:42 optimizer.py:742]
I 10-16 09:52:42 optimizer.py:867] Considered resources (1 node):
I 10-16 09:52:42 optimizer.py:937] ----------------------------------------------------------------------------------------
I 10-16 09:52:42 optimizer.py:937] CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN
I 10-16 09:52:42 optimizer.py:937] ----------------------------------------------------------------------------------------
I 10-16 09:52:42 optimizer.py:937] AWS m6i.large 2 8 - us-east-1 0.10 ✔
I 10-16 09:52:42 optimizer.py:937] ----------------------------------------------------------------------------------------
I 10-16 09:52:42 optimizer.py:937]
Launching a new service 'http'. Proceed? [Y/n]:
I 10-16 09:52:44 controller_utils.py:621] Translating workdir to SkyPilot Storage...
I 10-16 09:52:44 controller_utils.py:646] Workdir 'examples/serve/http_server' will be synced to cloud storage 'skypilot-workdir-memory-65aa0832'.
I 10-16 09:52:44 controller_utils.py:719] Uploading sources to cloud storage. See: sky storage ls
I 10-16 09:52:44 storage.py:1448] Created S3 bucket 'skypilot-workdir-memory-65aa0832' in us-east-1
Launching controller for 'http'...
I 10-16 09:52:48 cloud_vm_ray_backend.py:1315] To view detailed progress: tail -n100 -f /home/memory/sky_logs/sky-2024-10-16-09-52-45-985724/provision.log
I 10-16 09:52:48 cloud_vm_ray_backend.py:1226] Cluster 'sky-serve-controller-402b1bba' (status: STOPPED) was previously launched in AWS us-east-1. Relaunching in that region.
I 10-16 09:52:49 provisioner.py:65] Launching on AWS us-east-1 (us-east-1a)
I 10-16 09:53:20 provisioner.py:452] Successfully provisioned or found existing instance.
I 10-16 09:53:46 provisioner.py:554] Successfully provisioned cluster: sky-serve-controller-402b1bba
I 10-16 09:53:46 cloud_vm_ray_backend.py:4457] Processing file mounts.
I 10-16 09:53:47 cloud_vm_ray_backend.py:4483] To view detailed progress: tail -n100 -f ~/sky_logs/sky-2024-10-16-09-52-45-985724/file_mounts.log
I 10-16 09:53:47 backend_utils.py:1343] Syncing (to 1 node): /tmp/service-task-http-53hdmem2 -> ~/.sky/serve/http/task.yaml.tmp
I 10-16 09:53:47 backend_utils.py:1343] Syncing (to 1 node): /tmp/tmp7qdsyda1 -> ~/.sky/serve/http/config.yaml
I 10-16 09:53:48 backend_utils.py:1343] Syncing (to 1 node): /home/memory/.sky/catalogs/v5/aws/az_mappings-6d23a727.csv -> ~/.sky/catalogs/v5/aws/az_mappings-6d23a727.csv
I 10-16 09:53:48 backend_utils.py:1343] Syncing (to 1 node): /home/memory/.sky/catalogs/v5/azure/images.csv -> ~/.sky/catalogs/v5/azure/images.csv
I 10-16 09:53:49 backend_utils.py:1343] Syncing (to 1 node): /home/memory/.sky/catalogs/v5/azure/vms.csv -> ~/.sky/catalogs/v5/azure/vms.csv
I 10-16 09:53:50 backend_utils.py:1343] Syncing (to 1 node): /home/memory/.sky/catalogs/v5/gcp/vms.csv -> ~/.sky/catalogs/v5/gcp/vms.csv
I 10-16 09:53:50 cloud_vm_ray_backend.py:3216] Running setup on 1 node.
Check & install cloud dependencies on controller: Done for 8 clouds.
I 10-16 09:54:01 cloud_vm_ray_backend.py:3229] Setup completed.
I 10-16 09:54:06 cloud_vm_ray_backend.py:3333] Job submitted with Job ID: 2
E 10-16 09:55:06 subprocess_utils.py:85] ValueError: Failed to register service 'http' on the SkyServe controller. Reason:
E 10-16 09:55:06 subprocess_utils.py:85] Traceback (most recent call last):
E 10-16 09:55:06 subprocess_utils.py:85] File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
E 10-16 09:55:06 subprocess_utils.py:85] return _run_code(code, main_globals, None,
E 10-16 09:55:06 subprocess_utils.py:85] File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
E 10-16 09:55:06 subprocess_utils.py:85] exec(code, run_globals)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/sky/serve/service.py", line 23, in <module>
E 10-16 09:55:06 subprocess_utils.py:85] from sky.serve import controller
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/sky/serve/controller.py", line 17, in <module>
E 10-16 09:55:06 subprocess_utils.py:85] from sky.serve import autoscalers
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/sky/serve/autoscalers.py", line 12, in <module>
E 10-16 09:55:06 subprocess_utils.py:85] from sky.serve import schemas
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/sky/serve/schemas.py", line 42, in <module>
E 10-16 09:55:06 subprocess_utils.py:85] class LoadBalancerRequest(pydantic.BaseModel):
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py", line 224, in __new__
E 10-16 09:55:06 subprocess_utils.py:85] complete_model_class(
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py", line 587, in complete_model_class
E 10-16 09:55:06 subprocess_utils.py:85] schema = gen_schema.clean_schema(schema)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_generate_schema.py", line 594, in clean_schema
E 10-16 09:55:06 subprocess_utils.py:85] schema = _discriminated_union.apply_discriminators(schema)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 58, in apply_discriminators
E 10-16 09:55:06 subprocess_utils.py:85] return _core_utils.walk_core_schema(schema, inner)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 416, in walk_core_schema
E 10-16 09:55:06 subprocess_utils.py:85] return f(schema.copy(), _dispatch)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 48, in inner
E 10-16 09:55:06 subprocess_utils.py:85] s = recurse(s, inner)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 197, in walk
E 10-16 09:55:06 subprocess_utils.py:85] return f(schema, self._walk)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 48, in inner
E 10-16 09:55:06 subprocess_utils.py:85] s = recurse(s, inner)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 200, in _walk
E 10-16 09:55:06 subprocess_utils.py:85] schema = self._schema_type_to_method[schema['type']](schema.copy(), f)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 240, in handle_definitions_schema
E 10-16 09:55:06 subprocess_utils.py:85] new_inner_schema = self.walk(schema['schema'], f)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 197, in walk
E 10-16 09:55:06 subprocess_utils.py:85] return f(schema, self._walk)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 48, in inner
E 10-16 09:55:06 subprocess_utils.py:85] s = recurse(s, inner)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 200, in _walk
E 10-16 09:55:06 subprocess_utils.py:85] schema = self._schema_type_to_method[schema['type']](schema.copy(), f)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 209, in _handle_other_schemas
E 10-16 09:55:06 subprocess_utils.py:85] schema['schema'] = self.walk(sub_schema, f) # type: ignore
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 197, in walk
E 10-16 09:55:06 subprocess_utils.py:85] return f(schema, self._walk)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 48, in inner
E 10-16 09:55:06 subprocess_utils.py:85] s = recurse(s, inner)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 200, in _walk
E 10-16 09:55:06 subprocess_utils.py:85] schema = self._schema_type_to_method[schema['type']](schema.copy(), f)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 339, in handle_model_fields_schema
E 10-16 09:55:06 subprocess_utils.py:85] replaced_field['schema'] = self.walk(v['schema'], f)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_core_utils.py", line 197, in walk
E 10-16 09:55:06 subprocess_utils.py:85] return f(schema, self._walk)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 55, in inner
E 10-16 09:55:06 subprocess_utils.py:85] s = apply_discriminator(s, discriminator, global_definitions)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 97, in apply_discriminator
E 10-16 09:55:06 subprocess_utils.py:85] return _ApplyInferredDiscriminator(discriminator, definitions or {}).apply(schema)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 191, in apply
E 10-16 09:55:06 subprocess_utils.py:85] schema = self._apply_to_root(schema)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 227, in _apply_to_root
E 10-16 09:55:06 subprocess_utils.py:85] self._handle_choice(choice)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 302, in _handle_choice
E 10-16 09:55:06 subprocess_utils.py:85] inferred_discriminator_values = self._infer_discriminator_values_for_choice(choice, source_name=None)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 381, in _infer_discriminator_values_for_choice
E 10-16 09:55:06 subprocess_utils.py:85] return self._infer_discriminator_values_for_choice(self.definitions[schema_ref], source_name=source_name)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 363, in _infer_discriminator_values_for_choice
E 10-16 09:55:06 subprocess_utils.py:85] return self._infer_discriminator_values_for_choice(choice['schema'], source_name=choice['cls'].__name__)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 369, in _infer_discriminator_values_for_choice
E 10-16 09:55:06 subprocess_utils.py:85] return self._infer_discriminator_values_for_model_choice(choice, source_name=source_name)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 411, in _infer_discriminator_values_for_model_choice
E 10-16 09:55:06 subprocess_utils.py:85] return self._infer_discriminator_values_for_field(field, source)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 443, in _infer_discriminator_values_for_field
E 10-16 09:55:06 subprocess_utils.py:85] return self._infer_discriminator_values_for_inner_schema(field['schema'], source)
E 10-16 09:55:06 subprocess_utils.py:85] File "/home/ubuntu/skypilot-runtime/lib/python3.10/site-packages/pydantic/_internal/_discriminated_union.py", line 482, in _infer_discriminator_values_for_inner_schema
E 10-16 09:55:06 subprocess_utils.py:85] raise PydanticUserError(
E 10-16 09:55:06 subprocess_utils.py:85] pydantic.errors.PydanticUserError: Model 'RequestsAggregator' needs field 'type' to be of type `Literal`
E 10-16 09:55:06 subprocess_utils.py:85]
E 10-16 09:55:06 subprocess_utils.py:85] For further information visit https://errors.pydantic.dev/2.9/u/discriminator-needs-literal
E 10-16 09:55:06 subprocess_utils.py:85]
E 10-16 09:55:06 subprocess_utils.py:85]
RuntimeError: Failed to spin up the service. Please check the logs above for more details. |
The first reason is we want there be a `RequestsAggregator` base class, so that we can use it both for sending and receiving request information. pydantic requires a literal type field for the base class, which causes a conflict. The second reason is to maintain backwards compatibility. Since when using discriminator union, we need to pass a `type` field to the Pydantic model.
f954ec5
to
5ae6649
Compare
Fixes #4041
Simplify endpoint logic in
sky/serve/controller.py
and improve input validation.