
Piotrm api server #30

Closed
wants to merge 52 commits into from

Conversation

piotrm-nvidia
Contributor

No description provided.

Comment on lines +10 to +11
app = FastAPI()
create_app(connector, app)
Contributor

This logic will need to change a little, as the T2 OpenAI Frontend defines and wraps a FastAPI app inside the FastApiFrontend object. It uses a similar helper under the hood, but it also encapsulates the route definitions and schemas.
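For illustration only, the encapsulation the reviewer describes, where a frontend object builds and owns its app rather than receiving one from the caller, might be sketched like this. All names below are hypothetical stand-ins, not the actual T2 frontend API:

```python
class FrontendSketch:
    """Hypothetical sketch of a frontend that owns its app.

    Unlike `create_app(connector, app)`, the caller never constructs
    the app; routes and schemas are encapsulated inside the object.
    """

    def __init__(self):
        # Built internally; in the real frontend this would be a FastAPI
        # app with the OpenAI routes and schemas already registered.
        self.app = {"routes": ["/v1/chat/completions", "/v1/models"]}
        self.started = False

    def start(self):
        # In the real frontend this would hand self.app to uvicorn.
        self.started = True


frontend = FrontendSketch()
frontend.start()
```

The design point is that callers interact only with `frontend.start()`, so the app's construction details can change without touching call sites.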

Comment on lines 49 to 56
def start_server(self):
    """
    Launch the uvicorn server for the OpenAI endpoints.
    Note: uvicorn.Server.run() blocks the calling thread.
    """
    config = uvicorn.Config(self.app, host="0.0.0.0", port=8080, log_level="info")
    self.server = uvicorn.Server(config)
    self._logger.info("Starting uvicorn server for openai endpoints.")
    self.server.run()
Contributor

@rmccorm4 rmccorm4 Jan 17, 2025

This should instead do something like the following to use the unified OpenAI semantics - see the related comment here: https://github.com/triton-inference-server/triton-distributed/pull/30/files#r1920609943

frontend = FastApiFrontend()
frontend.start()

Base automatically changed from nnshah1-hello-world to main January 17, 2025 23:57
@piotrm-nvidia
Contributor Author

You can run the deployment for the API server:

python3 -m api_server_open_ai.deploy

The standard output log doesn't indicate much:

20:13:32 deployment.py:115[triton_distributed.worker.deployment] INFO: 

Starting Worker:

	Config:
	WorkerConfig(request_plane=<class 'triton_distributed.icp.nats_request_plane.NatsRequestPlane'>,
             data_plane=<function UcpDataPlane at 0x7f06c3930220>,
             request_plane_args=(['nats://localhost:4223'], {}),
             data_plane_args=([], {}),
             log_level=1,
             operators=[OperatorConfig(name='api_server',
                                       implementation='ApiServerOperator',
                                       repository=None,
                                       version=1,
                                       max_inflight_requests=1,
                                       parameters={},
                                       log_level=None)],
             triton_log_path=None,
             name='api_server.0',
             log_dir='/workspace/examples/api_server_open_ai/logs',
             metrics_port=50000)
	<SpawnProcess name='api_server.0' parent=2629 initial>

double free or corruption (out)

The example produces many logs in a subfolder relative to the source code location:

root@PDX01-M03-I41-DGX-A100-DL-1:/workspace# ls -lt examples/api_server_open_ai/logs
total 126
-rw-r--r-- 1 root dip 8240 Jan 21 20:13 api_server.0.35451cfd-d834-11ef-b334-5cff35fb93c3.2765.triton.log
-rw-r--r-- 1 root dip  457 Jan 21 20:13 api_server.0.35451cfd-d834-11ef-b334-5cff35fb93c3.2765.stderr.log
-rw-r--r-- 1 root dip 2065 Jan 21 20:13 api_server.0.35451cfd-d834-11ef-b334-5cff35fb93c3.2765.stdout.log
-rw-r--r-- 1 root dip 1560 Jan 21 20:13 nats_server.stderr.log

The logs again don't show much, except for api_server.0.35451cfd-d834-11ef-b334-5cff35fb93c3.2765.stdout.log:

20:13:45 worker.py:118[Triton Worker] ERROR: can't instantiate operator: api_server Empty module name
Traceback (most recent call last):
  File "/workspace/worker/src/python/triton_distributed/worker/worker.py", line 115, in _import_operators
    module = importlib.import_module(module_path.name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1384, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1298, in _sanity_check
ValueError: Empty module name
20:13:45 worker.py:262[Triton Worker] ERROR: Encountered an error in worker: Empty module name
Traceback (most recent call last):
  File "/workspace/worker/src/python/triton_distributed/worker/worker.py", line 256, in serve
    self._import_operators()
  File "/workspace/worker/src/python/triton_distributed/worker/worker.py", line 121, in _import_operators
    raise e
  File "/workspace/worker/src/python/triton_distributed/worker/worker.py", line 115, in _import_operators
    module = importlib.import_module(module_path.name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1384, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1298, in _sanity_check
ValueError: Empty module name
20:13:45 worker.py:265[Triton Worker] INFO: worker store: []
20:13:45 worker.py:266[Triton Worker] INFO: Worker stopped...
20:13:45 worker.py:267[Triton Worker] INFO: Hosted Operators: {} Requests Received: Counter() Responses Sent: Counter()
20:13:45 worker.py:330[Triton Worker] INFO: Stopping the event loop
20:13:45 worker.py:378[Triton Worker] INFO: Successfully shutdown worker.
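The root `ValueError` here is raised by `importlib` itself whenever it is handed an empty string, which suggests `module_path.name` resolved to an empty module name. A minimal repro, independent of the worker code:

```python
import importlib

try:
    importlib.import_module("")
except ValueError as exc:
    print(exc)  # -> Empty module name
```

This matches the `_sanity_check` frame in the traceback above, so the question becomes why the operator's module path is empty.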

The operator binding code:

    # define all your worker configs as before: encoder, decoder, etc.
    api_server_op = OperatorConfig(
        name="api_server",
        implementation="ApiServerOperator",  # matches the .py file's operator class
        max_inflight_requests=1,
    )

    api_server = WorkerConfig(operators=[api_server_op], name="api_server")

    deployment = Deployment(
        [
            (api_server, 1),
        ],
        initialize_request_plane=True,
        log_dir=args.log_dir,
        log_level=args.log_level,
    )

What is wrong here?
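One plausible reading, sketched under the assumption (not confirmed by the source) that the worker splits `implementation` into a module part and a class part: a bare class name like `"ApiServerOperator"` carries no module prefix, so the module part comes out empty, which would produce exactly the error above.

```python
# Hypothetical split, mirroring the usual "module.ClassName" convention.
implementation = "ApiServerOperator"
module_name, _, class_name = implementation.rpartition(".")

print(repr(module_name))  # -> '' (importing '' raises "Empty module name")
print(repr(class_name))   # -> 'ApiServerOperator'
```

As the follow-up comment shows, the actual fix was to put the operator's directory on PYTHONPATH, so the empty name likely came from the module path failing to resolve rather than from the class name itself.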

@piotrm-nvidia piotrm-nvidia marked this pull request as ready for review January 22, 2025 09:12
@piotrm-nvidia
Contributor Author

You need to modify the PYTHONPATH to point to the correct directories:

export PYTHONPATH=<YOUR REPO>/triton-distributed/worker/src/python:<YOUR REPO>/triton-distributed/icp/src/python:<YOUR REPO>/triton-distributed/examples:/opt/tritonserver/openai/openai_frontend:<YOUR REPO>/triton-distributed/examples/api_server_open_ai/api_endpoint

curl -X POST http://localhost:7777/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "llama",
      "messages": [{"role": "system", "content": "What is the capital of France?"}]
    }'

Failure:

INFO:     127.0.0.1:38658 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

@piotrm-nvidia
Contributor Author

A different approach was implemented in PR #46.
