-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Windows] Support CPU shared memory (Client/Frontend) #7048
base: main
Are you sure you want to change the base?
Conversation
* Update README and versions for 2.36.0 / 23.07 * Update Dockerfile.win10.min * Fix formating issue * fix formating issue * Fix whitespaces * Fix whitespaces * Fix whitespaces
* Reduce instance count to 1 for python bls model loading test * Add comment when calling unload
* Fix queue test to expect exact number of failures * Increase the execution time to more accurately capture requests
…yment (fix #6047) (#6100) Signed-off-by: Xiaodong Ye <[email protected]>
…6063) * Adding tests for bls * Added fixme, cleaned previous commit * Removed unused imports * Fixing commit tree: Refactor code, so that OTel tracer provider is initialized only once Added resource cmd option, testig Added docs * Clean up * Update docs/user_guide/trace.md Co-authored-by: Ryan McCormick <[email protected]> * Revision * Update doc * Clean up * Added ostream exporter to OpenTelemetry for testing purposes; refactored trace tests * Added opentelemetry trace collector set up to tests; refactored otel exporter tests to use OTel collector instead of netcat * Revising according to comments * Added comment regarding 'parent_span_id' * Added permalink * Adjusted test --------- Co-authored-by: Ryan McCormick <[email protected]>
Add tests for python 3.8-3.11 for L0_python_backends
* Improve L0_backend_python debugging * Use utils function for artifacts collection
Update docs with NVAIE messaging
…#6140) * Remove test checking for --shape option * Remove the entire test
…same time (#6150) * Add test when unload/load requests for same model received the same time * Add test_same_model_overlapping_load_unload * Use a load/unload stress test instead * Pre-merge test name update * Address pre-commit error * Revert "Address pre-commit error" This reverts commit 781cab1. * Record number of occurrence of each exception
* added debugging guide * Run pre-commit --------- Co-authored-by: David Yastremsky <[email protected]>
* Add utility functions for outlier removal * Fix functions * Add newline to end of file
* Testing: add gc collect to make sure gpu tensor is deallocated * Address comment
* Initial commit * Cleanup using new standard formatting * QA test restructuring * Add newline to the end of test.sh * HTTP/GRCP protocol changed to pivot on ready status & error status. Log file name changed in qa test. * Fixing unhandled error memory leak * Handle index function memory leak fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This review is a bit ramble-y but it's very tricky as well. You've done a great job so far, I'm teasing out the nuances so you can provide a good template of how to program for multiple OSes in a sustainable way.
* Eanable autodocs for python client library * Fixing the format and spelling
* Deprecate dynamic log file * Update error message
15f94bb
to
a5b6b7e
Compare
* Add async execute decoupled test * Add decoupled bls async exec test * Enhance test with different durations for concurrent executes
Add trace_mode and trace_config to getTraceSettingsAPI --------- Co-authored-by: Ryan McCormick <[email protected]>
triton_client = httpclient.InferenceServerClient(_url, verbose=True) | ||
# Custom setup method to allow passing of parameters | ||
def _setUp(self, protocol, log_file_path): | ||
self._tritonserver_ipaddr = os.environ.get("TRITONSERVER_IPADDR", "localhost") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be configurable in practice? Do we expect to use shared memory for anything other than a co-located server on localhost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBD: Currently on the Windows testing side of things, it's passed in as a variable and is different from "localhost". Still trying to get a CI pipeline up to see the new behavior for this test in particular. Will remove if no issue.
self._build_model_repo() | ||
self._build_server_args() | ||
self._shared_memory_test_server_log = open(log_file_path, "w") | ||
self._server_process = util.run_server( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does util.run_server
interact with test.sh
also starting server? Is there conflict or issue there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe they should overlap. For this test my ultimate goal is to remove test.sh
entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this also getting run in the linux case that runs test.sh
? or is there changes on the gitlab-side to not run test.sh
at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see your point. There are changes on the gitlab side such that test.sh
will not run at all for Windows. I will attempt to change the Linux test case so that it also will not run test.sh
backend_dir = "C:\\opt\\tritonserver\\backends" | ||
model_dir = "C:\\opt\\tritonserver\\qa\\L0_shared_memory\\models" | ||
self._server_executable = "C:\\opt\\tritonserver\\bin\\tritonserver.exe" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably more of a random note or follow-up, but I was under the impression something like Pathlib.Path("/opt/tritonserver/backends")
would translate to "C:\\opt\\tritonserver\\backends"
for free when run on Windows. If so you could probably condense the cases to work for both.
Did you see otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I believe you are right. ATM they are set to my local path and were hardcoded for convenience. They need to be modified and will once I determine the CI environment.
# Constant members | ||
shared_memory_test_client_log = Path(os.getcwd()) / "client.log" | ||
model_dir_path = Path(os.getcwd()) / "models" | ||
model_source_path = Path(os.getcwd()).parents[0] / "python_models/add_sub/model.py" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future follow-up as we expand python utilities for CI testing, but might be nice to have some kind of utils.relative_path([path, to, thing])
.
ex maybe something like this:
model_dir_path = utils.relative_path("models")
model_source_path = utils.relative_path("..", "python_models", "add_sub", "model.py")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Great work on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, can be addressed in the future PR that adds clean up logic
126f7dc
to
06d1e6b
Compare
Goal: Support CPU shared memory between the server and client for Windows
Sub-goals: Modify L0_shared_memory to run on bare-metal Windows using only Python.
Client changes: triton-inference-server/client#551
Some things to note:
test.sh