[Windows] Support CPU shared memory (Client/Frontend) #7048

fpetrini15 · 2024-03-27T18:12:20Z

Goal: Support CPU shared memory between the server and client for Windows

Sub-goals: Modify L0_shared_memory to run on bare-metal Windows using only Python.

Client changes: triton-inference-server/client#551

Some things to note:

* Once I can verify that the Linux tests pass using only the Python script, I will remove test.sh.
* L0_shared_memory uses a graphdef model by default. I swapped it for a Python model so that the test is supported on both Windows and Linux. I still need to investigate how the graphdef model ends up in L0_shared_memory (it is not generated by a script) and remove it.
* Some of the default paths need to be modified to reflect the testing environment; this will be done pre-merge.
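To make the model swap concrete, here is a minimal sketch of how a test might assemble a Python-backend model repository from an existing model.py. The function name `build_model_repo`, the model name `simple`, and the version `1` are assumptions for illustration and are not taken from this PR. Only the `<repo>/<model>/<version>/model.py` layout follows Triton's model repository convention, and a real repository would typically also need a config.pbtxt, which is omitted here.

```python
import shutil
from pathlib import Path


def build_model_repo(model_source, repo_dir, model_name="simple", version="1"):
    """Copy a Python-backend model.py into <repo_dir>/<model_name>/<version>/.

    model_name and version are illustrative defaults, not values from the PR.
    """
    version_dir = Path(repo_dir) / model_name / version
    # pathlib handles separators, so the same code runs on Windows and Linux.
    version_dir.mkdir(parents=True, exist_ok=True)
    destination = version_dir / "model.py"
    shutil.copyfile(model_source, destination)
    return destination
```

Because everything goes through `pathlib` and `shutil`, no shell script is needed to prepare the repository on either platform.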

…6063) * Adding tests for bls * Added fixme, cleaned previous commit * Removed unused imports * Fixing commit tree: Refactor code, so that OTel tracer provider is initialized only once Added resource cmd option, testig Added docs * Clean up * Update docs/user_guide/trace.md Co-authored-by: Ryan McCormick <[email protected]> * Revision * Update doc * Clean up * Added ostream exporter to OpenTelemetry for testing purposes; refactored trace tests * Added opentelemetry trace collector set up to tests; refactored otel exporter tests to use OTel collector instead of netcat * Revising according to comments * Added comment regarding 'parent_span_id' * Added permalink * Adjusted test --------- Co-authored-by: Ryan McCormick <[email protected]>

…same time (#6150) * Add test when unload/load requests for same model received the same time * Add test_same_model_overlapping_load_unload * Use a load/unload stress test instead * Pre-merge test name update * Address pre-commit error * Revert "Address pre-commit error" This reverts commit 781cab1. * Record number of occurrence of each exception

* Initial commit * Cleanup using new standard formatting * QA test restructuring * Add newline to the end of test.sh * HTTP/GRCP protocol changed to pivot on ready status & error status. Log file name changed in qa test. * Fixing unhandled error memory leak * Handle index function memory leak fix

nv-kmcgill53

This review is a bit ramble-y but it's very tricky as well. You've done a great job so far, I'm teasing out the nuances so you can provide a good template of how to program for multiple OSes in a sustainable way.

docs/protocol/extension_shared_memory.md

qa/common/util.py

src/shared_memory_manager.cc

src/shared_memory_manager.h

src/shared_memory_manager.cc

src/shared_memory_manager.h

rmccorm4 · 2024-04-11T18:08:38Z

qa/L0_shared_memory/shared_memory_test.py

-            triton_client = httpclient.InferenceServerClient(_url, verbose=True)
+    # Custom setup method to allow passing of parameters
+    def _setUp(self, protocol, log_file_path):
+        self._tritonserver_ipaddr = os.environ.get("TRITONSERVER_IPADDR", "localhost")


Does this need to be configurable in practice? Do we expect to use shared memory for anything other than a co-located server on localhost?

TBD: Currently on the Windows testing side of things, it's passed in as a variable and is different from "localhost". Still trying to get a CI pipeline up to see the new behavior for this test in particular. Will remove if no issue.
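As a rough illustration of the configurable address discussed in this thread, the sketch below reads `TRITONSERVER_IPADDR` with a `localhost` fallback and builds a client URL from it. The helper name `server_url` and the `environ` parameter are hypothetical; the ports are Triton's default HTTP (8000) and gRPC (8001) ports, and the test in this PR may use different ones.

```python
import os


def server_url(protocol, environ=None):
    """Return "<host>:<port>" for the given protocol ("http" or "grpc").

    The host comes from TRITONSERVER_IPADDR and falls back to localhost,
    mirroring the os.environ.get(...) call in the diff above.
    """
    env = os.environ if environ is None else environ
    host = env.get("TRITONSERVER_IPADDR", "localhost")
    # Assumed default ports; adjust if the server is started with other ports.
    port = {"http": 8000, "grpc": 8001}[protocol]
    return f"{host}:{port}"


print(server_url("http", {}))  # localhost:8000
```

Keeping the fallback means a co-located Linux run needs no extra configuration, while a Windows CI job can still inject a different address.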

rmccorm4 · 2024-04-11T18:09:58Z

qa/L0_shared_memory/shared_memory_test.py

+        self._build_model_repo()
+        self._build_server_args()
+        self._shared_memory_test_server_log = open(log_file_path, "w")
+        self._server_process = util.run_server(


How does util.run_server interact with test.sh, which also starts the server? Is there a conflict or issue there?

I don't believe they should overlap. For this test my ultimate goal is to remove test.sh entirely.

Isn't this also getting run in the Linux case that runs test.sh? Or are there changes on the GitLab side to not run test.sh at all?

Ah I see your point. There are changes on the gitlab side such that test.sh will not run at all for Windows. I will attempt to change the Linux test case so that it also will not run test.sh
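One way to avoid a conflict between a Python-launched server and a shell-launched one is to let the test own the server's whole lifetime. The sketch below is not the `util.run_server` from this PR, whose signature is not shown here. It is a hypothetical context manager, `running_server`, built only on the standard library, which starts a command, redirects its output to a log file, and always stops the process when the block exits.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def running_server(cmd, log_file_path, shutdown_timeout=30):
    """Start cmd, send its output to a log file, and always stop it on exit."""
    with open(log_file_path, "w") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            yield proc
        finally:
            # terminate() works on both Windows and POSIX.
            proc.terminate()
            try:
                proc.wait(timeout=shutdown_timeout)
            except subprocess.TimeoutExpired:
                # Fall back to a hard kill if the process ignores terminate().
                proc.kill()
                proc.wait()
```

A test would wrap its body in `with running_server([...], "server.log") as proc:`, so a failed assertion cannot leave a stray server behind for the next test.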

rmccorm4 · 2024-04-11T18:13:01Z

qa/L0_shared_memory/shared_memory_test.py

+            backend_dir = "C:\\opt\\tritonserver\\backends"
+            model_dir = "C:\\opt\\tritonserver\\qa\\L0_shared_memory\\models"
+            self._server_executable = "C:\\opt\\tritonserver\\bin\\tritonserver.exe"


Probably more of a random note or follow-up, but I was under the impression something like pathlib.Path("/opt/tritonserver/backends") would translate to "C:\\opt\\tritonserver\\backends" for free when run on Windows. If so you could probably condense the cases to work for both.

Did you see otherwise?

No, I believe you are right. ATM they are set to my local path and were hardcoded for convenience. They need to be modified and will once I determine the CI environment.
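The path question above can be checked without a Windows machine, because pathlib's pure path classes only manipulate strings. The sketch below suggests that a POSIX-style rooted path gets its separators converted under Windows rules but does not gain a drive letter by itself, so reaching `C:\opt\...` relies on either the current drive at run time or an explicit drive. The variable names are illustrative only.

```python
from pathlib import PureWindowsPath

posix_style = "/opt/tritonserver/backends"

# Pure paths never touch the filesystem, so this runs the same on any OS.
rooted = PureWindowsPath(posix_style)
print(rooted)                # \opt\tritonserver\backends
print(repr(rooted.drive))    # ''
print(rooted.is_absolute())  # False (Windows paths need a drive to be absolute)

# Supplying the drive explicitly yields the fully qualified Windows path.
with_drive = PureWindowsPath("C:/", posix_style)
print(with_drive)            # C:\opt\tritonserver\backends
```

In practice the drive-less form is resolved against the current drive when the path is used, so a single POSIX-style literal can serve both platforms as long as the test runs from the drive where Triton is installed.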

rmccorm4 · 2024-04-11T18:25:36Z

qa/L0_shared_memory/shared_memory_test.py

+    # Constant members
+    shared_memory_test_client_log = Path(os.getcwd()) / "client.log"
+    model_dir_path = Path(os.getcwd()) / "models"
+    model_source_path = Path(os.getcwd()).parents[0] / "python_models/add_sub/model.py"


Future follow-up as we expand python utilities for CI testing, but might be nice to have some kind of utils.relative_path([path, to, thing]).

For example, maybe something like this:

model_dir_path = utils.relative_path("models")
model_source_path = utils.relative_path("..", "python_models", "add_sub", "model.py")
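A possible shape for the suggested helper is sketched below. No `relative_path` utility exists in this PR, so the name, the optional `base` parameter, and the choice to collapse `..` lexically are all assumptions made for illustration.

```python
import os
from pathlib import Path


def relative_path(*parts, base=None):
    """Join parts onto base (default: current working directory).

    ".." components are collapsed lexically with os.path.normpath, so the
    result matches Path(...).parents[0] / ... style expressions without
    touching the filesystem.
    """
    root = Path.cwd() if base is None else Path(base)
    return Path(os.path.normpath(root.joinpath(*parts)))


model_dir_path = relative_path("models")
model_source_path = relative_path("..", "python_models", "add_sub", "model.py")
```

With this, the three constants in the diff above would no longer need to repeat `Path(os.getcwd())`, and the `parents[0]` indexing becomes an ordinary `".."` component.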

nv-kmcgill53

LGTM. Great work on this!

src/shared_memory_manager.h

src/shared_memory_manager.cc

GuanLuo

Left some comments, can be addressed in the future PR that adds clean up logic

src/shared_memory_manager.cc

kthui and others added 30 commits July 24, 2023 14:03

Fix default instance name (#6097)

a58246f

Removing unused tests (#6085)

1d02ea6

Update post-23.07 release (#6103)

4b3c278

* Update README and versions for 2.36.0 / 23.07 * Update Dockerfile.win10.min * Fix formating issue * fix formating issue * Fix whitespaces * Fix whitespaces * Fix whitespaces

Improve asyncio testing (#6122)

46859db

Reduce instance count to 1 for python bls model loading test (#6130)

4de6d8a

* Reduce instance count to 1 for python bls model loading test * Add comment when calling unload

Fix queue test to expect exact number of failures (#6133)

28da6a6

* Fix queue test to expect exact number of failures * Increase the execution time to more accurately capture requests

Add CPU & GPU metrics in Grafana dashboard.json for K8s op prem deployment (fix #6047) (#6100)

01c70b9

Signed-off-by: Xiaodong Ye <[email protected]>

Test python environments 3.8-3.11 (#6109)

775d700

Add tests for python 3.8-3.11 for L0_python_backends

Improve L0_backend_python debugging (#6157)

1d97247

* Improve L0_backend_python debugging * Use utils function for artifacts collection

Add unreachable output test for reporting source of disconnectivity (#6149)

8a7c14c

Update 'main' to track development of 2.38.0 / 23.09 (#6163)

2591c87

Fix the versions in the doc (#6164)

1370ec7

Update docs with NVAIE messaging (#6162)

39fafff

Update docs with NVAIE messaging

Add sanity tests for parallel instance loading (#6126)

e87478e

Remove extra whitespace (#6174)

da392f4

Remove a test case that sanity checks input value of --shape CLI flag (#6140)

d126623

* Remove test checking for --shape option * Remove the entire test

Make assert failures clearer in L0_trt_plugin (#6166)

e6735e8

Add end-to-end CI test for decoupled model support (#6131) (#6184)

1bb0131

* Add end-to-end CI test for decoupled model support * Address feedback

Test preserve_ordering for oldest strategy sequence batcher (#6185)

d8fd125

added debugging guide (#5924)

c968045

* added debugging guide * Run pre-commit --------- Co-authored-by: David Yastremsky <[email protected]>

Add deadlock gdb section to debug guide (#6193)

ecfe017

Fix character escape in model repository documentation (#6197)

8cca875

Fix docs test (#6192)

5d2ea14

Add utility functions for array manipulation (#6203)

c527cfd

* Add utility functions for outlier removal * Fix functions * Add newline to end of file

Add gc collect to make sure gpu tensor is deallocated (#6205)

0cf0681

* Testing: add gc collect to make sure gpu tensor is deallocated * Address comment

Check for log error on failing to find explicit load model (#6204)

d8931b0

Set default shm size to 1MB for Python backend (#6209)

1423260

nv-kmcgill53 reviewed Apr 9, 2024

fpetrini15 requested a review from GuanLuo April 9, 2024 18:26

tanmayv25 and others added 13 commits April 9, 2024 12:51

Enable autodocs for python client library API documentation (#7082)

10f1c8d

* Eanable autodocs for python client library * Fixing the format and spelling

Updated vllm version (#7095)

5e20ef6

Disable Dynamic Log File (#7092)

52f97b5

* Deprecate dynamic log file * Update error message

Initial commit

b1663e9

Fix GPU case

14dacb3

Validate offset

a8914c4

Open shm file, don't create

3de58c9

Intermmediate commit: Major test restructuring.

be0d10e

Formatting

6d99076

Gitbot Fixes

ef2c735

Major software bloat refactor. Opaque shm file handle

35aa5cc

Fixes for Unix and handle-agnostic get restructure

9df15bd

Review comments. Passing ShmFile pointer instead of void

76f9f2b

fpetrini15 force-pushed the fpetrini-win-cpu-shm branch from 15f94bb to a5b6b7e Compare April 11, 2024 17:18

rmccorm4 and others added 3 commits April 11, 2024 10:50

Validate system shared memory region size when registering a region (#7093)

159b060

Decoupled Async Execute (#7062)

196caf0

* Add async execute decoupled test * Add decoupled bls async exec test * Enhance test with different durations for concurrent executes

Add trace mode and trace config entries in trace settings API (#7050)

5b739db

Add trace_mode and trace_config to getTraceSettingsAPI --------- Co-authored-by: Ryan McCormick <[email protected]>

rmccorm4 reviewed Apr 11, 2024

rmccorm4 mentioned this pull request Apr 11, 2024

Validate the memory requested for the infer request is not out of bounds #7083

Merged

Merge remote-tracking branch 'origin/main' into fpetrini-win-cpu-shm

2d206e6

nv-kmcgill53 previously approved these changes Apr 11, 2024

src/shared_memory_manager.h

src/shared_memory_manager.cc (outdated)

GuanLuo reviewed Apr 11, 2024

src/shared_memory_manager.cc (outdated)

src/shared_memory_manager.cc (outdated)

Open backing file to validate shared memory

06d1e6b

fpetrini15 dismissed nv-kmcgill53’s stale review via 126f7dc April 13, 2024 20:50

pvijayakrish force-pushed the fpetrini-win-cpu-shm branch from 126f7dc to 06d1e6b Compare January 15, 2025 17:13
