
Fixing trtllm OOM Issue #91

Closed
wants to merge 15 commits into from

Conversation

KrishnanPrash
Contributor

Currently, building the engine with the TRTLLM HLAPI leads to OOM issues when starting the server.

The root cause appears to be allocations in tensorrt_llm that are not released after engine creation.

To resolve this, we move HLAPI engine building into a separate child process, so that when that process exits the operating system reclaims any dangling allocations that are currently not being cleaned up.
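
As a rough illustration of the child-process approach described above, here is a minimal sketch. It is not the code from this PR: `build_engine_in_child`, `CHILD_SCRIPT`, and the `engine.stub` file are hypothetical names, and the child does placeholder work instead of calling tensorrt_llm. The point is only that memory held by the build is returned to the OS when the child exits, before the server starts in the parent.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the real engine build. The actual tensorrt_llm
# calls are not shown in this PR description, so none are used here.
CHILD_SCRIPT = """
import sys
from pathlib import Path

engine_dir = Path(sys.argv[1])
engine_dir.mkdir(parents=True, exist_ok=True)
scratch = bytearray(16 * 1024 * 1024)  # memory that is never explicitly freed
(engine_dir / "engine.stub").write_text("built")
"""


def build_engine_in_child(engine_dir: Path) -> None:
    """Run the build in a short-lived child process and wait for it to exit.

    Anything the build leaves allocated is reclaimed by the operating system
    when the child terminates, so the parent starts the server with a clean
    memory footprint.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, str(engine_dir)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"engine build failed: {result.stderr}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "engine"
        build_engine_in_child(target)
        # The parent only consumes the on-disk artifact, not the builder's memory.
        print((target / "engine.stub").exists())
```

The parent and child communicate only through the engine directory on disk, which is what lets the build's memory be discarded wholesale. A `multiprocessing.Process` would serve the same purpose; `subprocess` is used here only to keep the sketch self-contained.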

@rmccorm4 rmccorm4 changed the base branch from rmccormick-trtllm-hlapi to main November 27, 2024 18:49
@KrishnanPrash KrishnanPrash deleted the kprashanth-trtllm-fix branch December 2, 2024 17:50