fix: Fixing CI for TRTLLM HLAPI #94

Merged
merged 5 commits into rmccormick-trtllm-hlapi from kprashanth-trtllm-hlapi-fix on Dec 7, 2024

Conversation

@KrishnanPrash (Contributor) commented Dec 5, 2024

This PR fixes the CI failures that have been observed when using the TRTLLM HL API. The fixes include:

  • Mainly, the ScopedTritonServer now relies on subprocess and its relevant functions instead of psutil, so it no longer has to manually keep track of the process ID.
  • Additionally, resolving an issue observed when stopping the Triton instance in CI. Here is the error message being displayed:
Received signal=15, frame=<frame at 0x79d701106f80, file '/mnt/triton_cli/src/triton_cli/server/server_local.py', line 148, code logs>. Shutting down...
reentrant call inside <_io.BufferedReader name=3>
triton - ERROR - Failed to start server, unknown error.

The root cause was that after process.terminate() was called on the underlying tritonserver process, it was followed by a process.communicate(timeout=...) call, which triggers a file I/O operation. To resolve this, we replace process.communicate() with process.wait(), which gives the process time to clean up without making an I/O call.
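For illustration, here is a minimal sketch of that terminate-then-wait pattern, assuming a simplified ScopedTritonServer wrapper (the constructor arguments and method names are illustrative, not the actual triton_cli code):

import subprocess


class ScopedTritonServer:
    """Illustrative wrapper that launches tritonserver via subprocess."""

    def __init__(self, cmd):
        self.cmd = cmd  # e.g. ["tritonserver", "--model-repository", ...]
        self.proc = None

    def start(self):
        # subprocess keeps track of the child for us; no need to record
        # a PID manually the way the previous psutil-based approach did.
        self.proc = subprocess.Popen(self.cmd)

    def stop(self, timeout=30):
        if self.proc is None:
            return
        self.proc.terminate()  # send SIGTERM
        try:
            # wait() blocks without reading the child's pipes, avoiding the
            # reentrant BufferedReader I/O that communicate() triggered.
            self.proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self.proc.kill()  # escalate to SIGKILL if shutdown hangs
            self.proc.wait()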

@rmccorm4 (Collaborator) left a comment

LGTM if the pipeline passes. Please follow up afterwards on the other PR, from my branch into main, that this one is merging into.

@krishung5 (Contributor) left a comment

LGTM. qq - does this also fix the GPU OOM issue or is it a separate problem?

@rmccorm4 (Collaborator) commented Dec 7, 2024

> LGTM. qq - does this also fix the GPU OOM issue or is it a separate problem?

My PR, which launches the HL API conversion as a subprocess, seems to definitively fix the OOM issue and guarantee memory cleanup. Results were muddied for a bit because it turned out the GitLab mirror had gotten out of sync and was running old code each time.

@KrishnanPrash (Contributor, Author)

@krishung5 That fix was made as part of Ryan's branch [Link]. IIUC, the solution was to build the TRT-LLM engine in a separate child process. Here is the relevant code snippet:

# Run TRT-LLM build in a separate process to make sure it definitely
# cleans up any GPU memory used when done.
p = multiprocessing.Process(
    target=self.__build_trtllm_engine, args=(huggingface_id, engines_path)
)
p.start()
p.join()
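
The design here, as the comment in the snippet notes, is that any GPU memory allocated during the engine build is released when the child process exits, rather than relying on cleanup inside the long-lived parent process.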

@KrishnanPrash KrishnanPrash merged commit 06f46d6 into rmccormick-trtllm-hlapi Dec 7, 2024
4 checks passed
@KrishnanPrash KrishnanPrash deleted the kprashanth-trtllm-hlapi-fix branch December 7, 2024 01:16
Labels
bug Something isn't working
3 participants