[Feature Request]: load/unload model on request #42

Open
peperunas opened this issue Dec 11, 2024 · 1 comment
Comments

@peperunas

As far as I can tell, the model currently stays loaded in memory at all times, occupying a sizable chunk of VRAM that could be freed while the model is not in use.

It would be great to have a mechanism that loads the model when audio is received and unloads it again afterwards.

@ecker00

ecker00 commented Jan 20, 2025

Agreed. For example, here is how Ollama handles it:

The keep_alive parameter can be set to:
a duration string (such as "10m" or "24h")
a number in seconds (such as 3600)
any negative number which will keep the model loaded in memory (e.g. -1 or "-1m")
'0' which will unload the model immediately after generating a response

It would be nice to have an environment variable or a similar setting to configure this.
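
To make the request concrete, here is a minimal Python sketch of what such a setting could look like. It is not taken from this project or from Ollama: `parse_keep_alive`, `LazyModel`, `from_env`, and the `MODEL_KEEP_ALIVE` variable name are all hypothetical, and only the `s`, `m`, and `h` suffixes are handled. It also assumes requests are served one at a time and that dropping the last reference is enough to release the model; a GPU framework may need an extra explicit cleanup step.

```python
import os
import re
import threading
from contextlib import contextmanager
from typing import Any, Callable, Optional, Union

_UNITS = {"s": 1.0, "m": 60.0, "h": 3600.0}


def parse_keep_alive(value: Union[str, int, float]) -> Optional[float]:
    """Return the idle time in seconds; None means "never unload"."""
    if isinstance(value, (int, float)):
        seconds = float(value)
    else:
        match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*([smh]?)\s*", value)
        if match is None:
            raise ValueError(f"invalid keep_alive value: {value!r}")
        # A bare number is treated as seconds.
        seconds = float(match.group(1)) * _UNITS[match.group(2) or "s"]
    return None if seconds < 0 else seconds


class LazyModel:
    """Loads a model on first use and drops it after an idle period."""

    def __init__(self, loader: Callable[[], Any],
                 keep_alive: Union[str, int, float]) -> None:
        self._loader = loader  # hypothetical callable that builds the model
        self._keep_alive = parse_keep_alive(keep_alive)
        self._lock = threading.Lock()
        self._model: Any = None
        self._generation = 0  # lets a stale timer detect that it is outdated

    @property
    def loaded(self) -> bool:
        return self._model is not None

    @contextmanager
    def use(self):
        # The lock is held for the whole request, so requests are serialized.
        with self._lock:
            self._generation += 1  # invalidates any pending unload timer
            if self._model is None:
                self._model = self._loader()
            try:
                yield self._model
            finally:
                self._schedule_unload()

    def _schedule_unload(self) -> None:
        # Called with the lock held.
        if self._keep_alive is None:
            return  # negative keep_alive: keep the model loaded
        if self._keep_alive == 0:
            self._model = None  # unload immediately after the request
            return
        timer = threading.Timer(self._keep_alive, self._unload,
                                args=(self._generation,))
        timer.daemon = True
        timer.start()

    def _unload(self, generation: int) -> None:
        with self._lock:
            # Only unload if no request arrived since the timer was started.
            if generation == self._generation:
                self._model = None


def from_env(loader: Callable[[], Any]) -> LazyModel:
    # MODEL_KEEP_ALIVE is a made-up variable name used for illustration.
    return LazyModel(loader, os.environ.get("MODEL_KEEP_ALIVE", "5m"))
```

With something like this, the audio handler would wrap each transcription in `with model.use() as m:`, so the model is loaded when audio arrives and released once the configured idle time has passed.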

2 participants