fix: Fixing CI for TRTLLM HLAPI #94
Conversation
LGTM if pipeline passes. Please follow up afterwards on the other PR from my branch into main that this one is merging into.
LGTM. qq - does this also fix the GPU OOM issue or is it a separate problem?
My PR, which launches the HL API conversion as a subprocess, seems to definitively fix the OOM issue and guarantee memory cleanup. Results were muddied for a bit because it turned out the GitLab mirror got out of sync and was running old code each time.
@krishung5 That fix was made as part of Ryan's branch [Link]. IIUC, the solution was building the TRT-LLM engine in a separate child process. Here is the relevant code snippet:

```python
# Run TRT-LLM build in a separate process to make sure it definitely
# cleans up any GPU memory used when done.
p = multiprocessing.Process(
    target=self.__build_trtllm_engine, args=(huggingface_id, engines_path)
)
p.start()
p.join()
```
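As a follow-up on the design choice: because the build runs in its own process, all GPU memory it allocates is released when that process exits. A minimal sketch of how a caller might harden this pattern by checking the child's exit code is below; the helper name and error handling are illustrative, not taken from the PR:

```python
import multiprocessing


def run_build_in_child(build_fn, huggingface_id, engines_path):
    # Hypothetical wrapper (not from the PR): run the engine build in a
    # child process so any GPU memory it allocates is released on exit,
    # and surface build failures via the child's exit code.
    p = multiprocessing.Process(target=build_fn, args=(huggingface_id, engines_path))
    p.start()
    p.join()
    if p.exitcode != 0:
        raise RuntimeError(f"TRT-LLM engine build failed (exit code {p.exitcode})")
```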
This PR fixes the CI failures that have been observed when using the TRTLLM HL API. The fixes include:

- In `ScopedTritonServer`, rely on `subprocess` and its relevant functions instead of `psutil` and having to manually keep track of the process ID. The root cause was that after `process.terminate()` was called for the underlying tritonserver process, it was followed by a `process.communicate(timeout=...)` call, which triggers a file I/O operation. To resolve this, `process.communicate()` is replaced with `process.wait()`, giving the process time to clean up without making an I/O call (see the sketch below).
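A minimal sketch of the shutdown pattern described above, assuming the server is launched with `subprocess.Popen`; the class and method names here are illustrative and not the actual implementation in this PR:

```python
import subprocess


class ScopedTritonServerSketch:
    """Illustrative sketch only; assumes tritonserver is started via Popen."""

    def __init__(self, cmd):
        self.proc = subprocess.Popen(cmd)

    def stop(self, timeout=30):
        # Ask the tritonserver process to shut down gracefully.
        self.proc.terminate()
        try:
            # wait() only polls for process exit; unlike communicate(), it
            # does not read stdout/stderr, so no file I/O happens while the
            # server is cleaning up.
            self.proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Force-kill if the server did not exit within the timeout.
            self.proc.kill()
            self.proc.wait()
```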