
Clear and Restore In-Memory Footprint for Local LLM Platform #90

Open
1 task done
philippzagar opened this issue Dec 30, 2024 · 6 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@philippzagar
Member

Problem

SpeziLLM provides the LocalPlatform for on-device LLM execution. However, local LLM inference is highly resource-intensive, particularly in terms of main memory consumption. On iOS, this often results in the operating system conservatively terminating the resource-heavy application when it is moved to the background. Currently, SpeziLLM offers no mechanism to mitigate this issue, forcing users to restart the app entirely after it is terminated.

Solution

Introduce two simple functions (likely as part of LLMRunner to ensure cross-platform applicability, but primarily affecting the local platform) to give developers more control over the memory footprint of an app using SpeziLLM.
These functions would:

  1. Offload resources: Pause LLM execution and remove in-flight state from main memory, e.g. when the app moves to the background.

    • This could be done by either:
      • Moving the execution state to disk, or
      • Completely clearing the state from memory.
  2. Load resources again: Provide a complementary function to resume LLM execution, e.g. when the app returns to the foreground.

    • This function would:
      • Restore the state from disk back into memory, or
      • Reinitialize the platform state as necessary (refer to MLX documentation for guidance).

This approach would enable resource-efficient handling of LLM execution while the app is backgrounded and prevent unnecessary terminations and restarts.
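A minimal sketch of what the two proposed functions could look like. The names `offloadResources()` and `restoreResources()` are placeholders, not existing SpeziLLM API, and the real runner would likely be an actor; a class keeps the sketch self-contained:

```swift
import Foundation

// Hypothetical sketch of the proposed API; nothing here exists in
// SpeziLLM today.
final class LocalLLMResourceManager {
    enum State: Equatable { case loaded, offloaded }
    private(set) var state: State = .loaded

    /// Pause LLM execution and drop in-flight state from main memory,
    /// e.g. when the app moves to the background.
    func offloadResources() {
        guard state == .loaded else { return }
        // A real implementation would cancel inference tasks, then either
        // persist the execution state to disk or discard it entirely,
        // releasing the model weights held by MLX.
        state = .offloaded
    }

    /// Resume LLM execution when the app returns to the foreground.
    func restoreResources() {
        guard state == .offloaded else { return }
        // A real implementation would restore the state from disk or
        // reinitialize the platform as outlined in the MLX documentation.
        state = .loaded
    }
}
```

An app could then trigger these from a SwiftUI `scenePhase` observer, offloading on `.background` and restoring on `.active`.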

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct and Contributing Guidelines
philippzagar added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels Dec 30, 2024
@philippzagar
Member Author

philippzagar commented Dec 30, 2024

The issue was raised in https://github.com/orgs/StanfordSpezi/discussions/69; I formalized it as a feature request in the SpeziLLM repo.

@philippzagar
Member Author

Tagging @LeonNissen as he currently leads the development of local LLM execution via SpeziLLM! 🚀

@bryan1anderson

I definitely feel the most crucial part of this is a simple panic button. Restoring would be nice, but if we cannot clean up resources at all, that is very problematic.

@bryan1anderson

Seems like

  • Cancel tasks
  • Clear MLX cache
  • Empty session arrays

Would be a good start?
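The three steps above could be sketched roughly as follows. `inferenceTask`, `sessions`, and the injected `clearModelCache` hook are hypothetical stand-ins for SpeziLLM internals and MLX's cache-clearing facility, not actual API:

```swift
import Foundation

// Sketch of a "panic button" combining the three cleanup steps;
// all names are placeholders, not SpeziLLM or MLX API.
final class LocalPlatformCleanup {
    var inferenceTask: Task<Void, Never>?
    var sessions: [String] = []        // placeholder for live session objects
    let clearModelCache: () -> Void    // e.g. an MLX buffer-cache clearing call

    init(clearModelCache: @escaping () -> Void = {}) {
        self.clearModelCache = clearModelCache
    }

    /// Release everything the local platform currently holds in memory.
    func panic() {
        inferenceTask?.cancel()   // 1. cancel in-flight inference tasks
        inferenceTask = nil
        clearModelCache()         // 2. clear the MLX cache
        sessions.removeAll()      // 3. empty the session arrays
    }
}
```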

@LeonNissen
Contributor

@bryan1anderson great idea! 🚀

I think we could start by removing the reference of the ModelContainer in https://github.com/StanfordSpezi/SpeziLLM/blob/main/Sources/SpeziLLMLocal/LLMLocalSchema.swift

I would try that one first.

@philippzagar
Member Author

philippzagar commented Jan 1, 2025

> @bryan1anderson great idea! 🚀
>
> I think we could start by removing the reference of the ModelContainer in https://github.com/StanfordSpezi/SpeziLLM/blob/main/Sources/SpeziLLMLocal/LLMLocalSchema.swift
>
> I would try that one first.

@LeonNissen I guess you're referring to the LLMLocalSession (representing the mutable LLM state and its in-flight resources) rather than the schema, correct?
We should check how MLX handles these cleanup scenarios; there should definitely be some code within MLX responsible for freeing up resources. Maybe we can then build on top of that existing functionality, also in order to make the cleanup a bit more flexible (instead of just removing the ModelContainer reference).
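A rough sketch of the reference-dropping idea discussed above. `ModelContainerStub`, `offload()`, and `ensureLoaded()` are all hypothetical; the real session would hold MLX's actual model container and should additionally invoke whatever cleanup MLX itself provides:

```swift
import Foundation

// Stand-in for the MLX model container that owns the loaded weights.
final class ModelContainerStub {}

// Hypothetical extension of the session idea: drop the strong reference
// on offload so the weights can be reclaimed, reload lazily on demand.
final class LLMLocalSessionSketch {
    private(set) var modelContainer: ModelContainerStub?

    init() { self.modelContainer = ModelContainerStub() }

    /// Release the container reference; a real implementation would also
    /// call into MLX's own resource cleanup here.
    func offload() {
        modelContainer = nil
    }

    /// Reload the container before the next inference call if needed.
    func ensureLoaded() {
        if modelContainer == nil {
            modelContainer = ModelContainerStub()
        }
    }
}
```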
