
Piotrm api server #30

Closed
wants to merge 52 commits into from

Conversation

piotrm-nvidia
Contributor

No description provided.

Comment on lines +10 to +11
app = FastAPI()
create_app(connector, app)
Contributor

This logic will need to change a little, as the T2 OpenAI Frontend defines and wraps a FastAPI app inside the FastApiFrontend object. It uses a similar helper under the hood, but it also encapsulates the route definitions and schemas.
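For illustration only, the encapsulation the reviewer describes, where a frontend object builds and owns its app rather than receiving one from the caller, might be sketched like this. All names below are hypothetical stand-ins, not the actual T2 frontend API:

```python
class FrontendSketch:
    """Hypothetical sketch of a frontend that owns its app.

    Unlike `create_app(connector, app)`, the caller never constructs
    the app; routes and schemas are encapsulated inside the object.
    """

    def __init__(self):
        # Built internally; in the real frontend this would be a FastAPI
        # app with the OpenAI routes and schemas already registered.
        self.app = {"routes": ["/v1/chat/completions", "/v1/models"]}
        self.started = False

    def start(self):
        # In the real frontend this would hand self.app to uvicorn.
        self.started = True


frontend = FrontendSketch()
frontend.start()
```

The design point is that callers interact only with `frontend.start()`, so the app's construction details can change without touching call sites.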

Comment on lines 49 to 56
def start_server(self):
    """
    Launch the uvicorn server for the OpenAI endpoints.
    Note: uvicorn.Server.run() blocks the calling thread.
    """
    config = uvicorn.Config(self.app, host="0.0.0.0", port=8080, log_level="info")
    self.server = uvicorn.Server(config)
    self._logger.info("Starting uvicorn server for openai endpoints.")
    self.server.run()
Contributor

@rmccorm4 rmccorm4 Jan 17, 2025

This should instead do something like the following to use the unified OpenAI semantics - see the related comment here: https://github.com/triton-inference-server/triton-distributed/pull/30/files#r1920609943

frontend = FastApiFrontend()
frontend.start()

Base automatically changed from nnshah1-hello-world to main January 17, 2025 23:57
@piotrm-nvidia
Contributor Author

You can run the deployment for the API server:

python3 -m api_server_open_ai.deploy

The standard output log doesn't indicate much:

20:13:32 deployment.py:115[triton_distributed.worker.deployment] INFO: 

Starting Worker:

	Config:
	WorkerConfig(request_plane=<class 'triton_distributed.icp.nats_request_plane.NatsRequestPlane'>,
             data_plane=<function UcpDataPlane at 0x7f06c3930220>,
             request_plane_args=(['nats://localhost:4223'], {}),
             data_plane_args=([], {}),
             log_level=1,
             operators=[OperatorConfig(name='api_server',
                                       implementation='ApiServerOperator',
                                       repository=None,
                                       version=1,
                                       max_inflight_requests=1,
                                       parameters={},
                                       log_level=None)],
             triton_log_path=None,
             name='api_server.0',
             log_dir='/workspace/examples/api_server_open_ai/logs',
             metrics_port=50000)
	<SpawnProcess name='api_server.0' parent=2629 initial>

double free or corruption (out)

The example produces many logs in a subfolder relative to the source code location:

root@PDX01-M03-I41-DGX-A100-DL-1:/workspace# ls -lt examples/api_server_open_ai/logs
total 126
-rw-r--r-- 1 root dip 8240 Jan 21 20:13 api_server.0.35451cfd-d834-11ef-b334-5cff35fb93c3.2765.triton.log
-rw-r--r-- 1 root dip  457 Jan 21 20:13 api_server.0.35451cfd-d834-11ef-b334-5cff35fb93c3.2765.stderr.log
-rw-r--r-- 1 root dip 2065 Jan 21 20:13 api_server.0.35451cfd-d834-11ef-b334-5cff35fb93c3.2765.stdout.log
-rw-r--r-- 1 root dip 1560 Jan 21 20:13 nats_server.stderr.log

The logs again don't show much, except for api_server.0.35451cfd-d834-11ef-b334-5cff35fb93c3.2765.stdout.log:

20:13:45 worker.py:118[Triton Worker] ERROR: can't instantiate operator: api_server Empty module name
Traceback (most recent call last):
  File "/workspace/worker/src/python/triton_distributed/worker/worker.py", line 115, in _import_operators
    module = importlib.import_module(module_path.name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1384, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1298, in _sanity_check
ValueError: Empty module name
20:13:45 worker.py:262[Triton Worker] ERROR: Encountered an error in worker: Empty module name
Traceback (most recent call last):
  File "/workspace/worker/src/python/triton_distributed/worker/worker.py", line 256, in serve
    self._import_operators()
  File "/workspace/worker/src/python/triton_distributed/worker/worker.py", line 121, in _import_operators
    raise e
  File "/workspace/worker/src/python/triton_distributed/worker/worker.py", line 115, in _import_operators
    module = importlib.import_module(module_path.name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1384, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1298, in _sanity_check
ValueError: Empty module name
20:13:45 worker.py:265[Triton Worker] INFO: worker store: []
20:13:45 worker.py:266[Triton Worker] INFO: Worker stopped...
20:13:45 worker.py:267[Triton Worker] INFO: Hosted Operators: {} Requests Received: Counter() Responses Sent: Counter()
20:13:45 worker.py:330[Triton Worker] INFO: Stopping the event loop
20:13:45 worker.py:378[Triton Worker] INFO: Successfully shutdown worker.
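The root `ValueError` here is raised by `importlib` itself whenever it is handed an empty string, which suggests `module_path.name` resolved to an empty module name. A minimal repro, independent of the worker code:

```python
import importlib

try:
    importlib.import_module("")
except ValueError as exc:
    print(exc)  # -> Empty module name
```

This matches the `_sanity_check` frame in the traceback above, so the question becomes why the operator's module path is empty.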

The operator binding code:

    # define all your worker configs as before: encoder, decoder, etc.
    api_server_op = OperatorConfig(
        name="api_server",
        implementation="ApiServerOperator",  # matches the .py file's operator class
        max_inflight_requests=1,
    )

    api_server = WorkerConfig(operators=[api_server_op], name="api_server")

    deployment = Deployment(
        [
            (api_server, 1),
        ],
        initialize_request_plane=True,
        log_dir=args.log_dir,
        log_level=args.log_level,
    )

What is wrong here?
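One plausible reading, sketched under the assumption (not confirmed by the source) that the worker splits `implementation` into a module part and a class part: a bare class name like `"ApiServerOperator"` carries no module prefix, so the module part comes out empty, which would produce exactly the error above.

```python
# Hypothetical split, mirroring the usual "module.ClassName" convention.
implementation = "ApiServerOperator"
module_name, _, class_name = implementation.rpartition(".")

print(repr(module_name))  # -> '' (importing '' raises "Empty module name")
print(repr(class_name))   # -> 'ApiServerOperator'
```

As the follow-up comment shows, the actual fix was to put the operator's directory on PYTHONPATH, so the empty name likely came from the module path failing to resolve rather than from the class name itself.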

@piotrm-nvidia piotrm-nvidia marked this pull request as ready for review January 22, 2025 09:12
@piotrm-nvidia
Contributor Author

You need to modify the PYTHONPATH to point to the correct directories:

export PYTHONPATH=<YOUR REPO>/triton-distributed/worker/src/python:<YOUR REPO>/triton-distributed/icp/src/python:<YOUR REPO>/triton-distributed/examples:/opt/tritonserver/openai/openai_frontend:<YOUR REPO>/triton-distributed/examples/api_server_open_ai/api_endpoint

curl -X POST http://localhost:7777/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "llama",
      "messages": [{"role": "system", "content": "What is the capital of France?"}]
    }'

Failure:

INFO:     127.0.0.1:38658 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

@piotrm-nvidia
Contributor Author

A different approach was implemented in PR #46.
