Clear and Restore In-Memory Footprint for Local LLM Platform #90
Comments
The issue was raised in https://github.com/orgs/StanfordSpezi/discussions/69; I formalized it here as a feature request in the SpeziLLM repo.
Tagging @LeonNissen, as he currently leads the development of local LLM execution via SpeziLLM! 🚀
I definitely feel the most crucial part of this is a simple panic button. Restoring would be nice, but if we cannot clean up resources at all, that would be very problematic.
Seems like
Would be a good start?
@bryan1anderson great idea! 🚀 I think we could start by removing the reference to the ModelContainer in https://github.com/StanfordSpezi/SpeziLLM/blob/main/Sources/SpeziLLMLocal/LLMLocalSchema.swift. I would try that one first.
@LeonNissen I guess you're referring to the LLMLocalSession instead (representing the mutable LLM state and its in-flight resources), correct?
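To illustrate the idea floated in the comments above, here is a minimal, hypothetical Swift sketch of a session that keeps only lightweight state (model identifier, chat history) resident while allowing the memory-heavy model container to be dropped and reloaded on demand. All names (`HypotheticalLocalSession`, `loadModel`, etc.) are placeholders for illustration and not part of the actual SpeziLLM API.

```swift
import Foundation

/// Hypothetical sketch of the offload/restore idea discussed above.
actor HypotheticalLocalSession {
    private let modelIdentifier: String
    private var chatHistory: [String] = []
    private var modelContainer: AnyObject?   // the memory-heavy, in-flight resource

    init(modelIdentifier: String) {
        self.modelIdentifier = modelIdentifier
    }

    /// "Panic button": drop the strong reference to the loaded model so its
    /// weights and caches can be freed, e.g. when the app is backgrounded.
    func offload() {
        modelContainer = nil
    }

    /// Restore the model from disk using the retained lightweight state.
    func restore() async throws {
        guard modelContainer == nil else { return }
        modelContainer = try await loadModel(named: modelIdentifier)
    }

    /// Placeholder for the actual model-loading routine of the inference backend.
    private func loadModel(named identifier: String) async throws -> AnyObject {
        NSObject()
    }
}
```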
Problem
SpeziLLM provides the `LocalPlatform` for on-device LLM execution. However, local LLM inference is highly resource-intensive, particularly in terms of main memory consumption. On iOS, this often results in the operating system conservatively terminating the resource-heavy application when it is moved to the background. Currently, SpeziLLM offers no mechanism to mitigate this issue, forcing users to restart the app entirely after it is terminated.
Solution
Introduce two simple functions (likely as part of the `LLMRunner` to ensure cross-platform applicability, but primarily affecting the local platform) to give developers more control over the memory footprint of an app using SpeziLLM. These functions would:
- Offload resources: pause LLM execution and remove in-flight state from main memory, e.g., when the app moves to the background.
- Reload resources: provide a complementary function to resume LLM execution, e.g., when the app is resumed and moves into the foreground.
This approach would enable resource-efficient handling of LLM execution during app backgrounding and prevent unnecessary terminations and restarts.
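A rough sketch of how such a pair of functions could surface to app developers, assuming the hypothetical names `offloadLocalResources()` and `restoreLocalResources()` (not existing SpeziLLM symbols), driven by SwiftUI's `scenePhase`:

```swift
import SwiftUI

/// Stand-in protocol for the runner type that would expose the two proposed functions.
protocol ResourceManagingRunner {
    func offloadLocalResources() async
    func restoreLocalResources() async throws
}

struct ExampleRootView: View {
    @Environment(\.scenePhase) private var scenePhase
    let runner: any ResourceManagingRunner

    var body: some View {
        Text("LLM Demo")
            .onChange(of: scenePhase) { _, newPhase in
                Task {
                    switch newPhase {
                    case .background:
                        // Free the in-flight LLM state before iOS evaluates memory pressure.
                        await runner.offloadLocalResources()
                    case .active:
                        // Reload the model so inference can continue where it left off.
                        try? await runner.restoreLocalResources()
                    default:
                        break
                    }
                }
            }
    }
}
```

Tying the calls to scene-phase transitions would keep the heavy model state out of memory exactly during the window in which iOS is most likely to evict the app.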
Additional context
No response