[Feature Request]: load/unload model on request #42

peperunas · 2024-12-11T15:02:49Z

As far as I can tell, the model currently stays loaded in memory at all times and occupies a sizable chunk of VRAM, which could be freed when the model is not needed.

A mechanism that loads the model when audio is received and unloads it afterwards would be great.
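To make the request concrete, here is a minimal sketch of one way such a mechanism could look: load on first use, unload after an idle period. It is not taken from this project. `LazyModel`, `load_fn`, and `idle_seconds` are hypothetical names, and dropping the Python reference is only a stand-in for whatever cleanup the real backend needs to actually release VRAM.

```python
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Loads a model on first use and drops it after an idle period (sketch)."""

    def __init__(self, load_fn: Callable[[], Any], idle_seconds: float) -> None:
        self._load_fn = load_fn            # hypothetical loader supplied by the caller
        self._idle_seconds = idle_seconds  # how long to keep an unused model around
        self._model: Optional[Any] = None
        self._timer: Optional[threading.Timer] = None
        self._generation = 0               # invalidates timers that fire too late
        self._lock = threading.Lock()

    def acquire(self) -> Any:
        """Return the model, loading it if needed, and restart the idle timer."""
        with self._lock:
            self._generation += 1
            if self._timer is not None:
                self._timer.cancel()
            if self._model is None:
                self._model = self._load_fn()
            self._timer = threading.Timer(
                self._idle_seconds, self._expire, args=(self._generation,)
            )
            self._timer.daemon = True      # do not keep the process alive
            self._timer.start()
            return self._model

    def unload(self) -> None:
        """Drop the model immediately and cancel any pending idle timer."""
        with self._lock:
            self._generation += 1
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
            # Assumption: releasing the reference is enough for this sketch;
            # a real backend may need an explicit cleanup call here.
            self._model = None

    def _expire(self, generation: int) -> None:
        # Only unload if no acquire() or unload() happened since this timer started.
        with self._lock:
            if generation == self._generation:
                self._model = None
                self._timer = None

    @property
    def loaded(self) -> bool:
        with self._lock:
            return self._model is not None
```

A request handler would call `acquire()` each time audio arrives. The generation counter guards against a timer that fires just as a new request comes in and would otherwise unload a model that is in use again.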

ecker00 · 2025-01-20T11:41:02Z

Agreed. For example, here is how Ollama handles it:

The `keep_alive` parameter can be set to:
- a duration string (such as "10m" or "24h")
- a number in seconds (such as 3600)
- any negative number, which will keep the model loaded in memory (e.g. -1 or "-1m")
- '0', which will unload the model immediately after generating a response

Would be nice to have an environment variable or something to configure this.
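As a rough illustration of that idea, the sketch below parses a value with the semantics listed above from an environment variable. The variable name `MODEL_KEEP_ALIVE`, the default of "5m", and both function names are assumptions made up for this example, not existing options of this project. It also only handles a single `s`, `m`, or `h` suffix, which is a simplification.

```python
import os
import re
from typing import Optional

# Seconds per supported unit suffix; a bare number is treated as seconds.
_UNITS = {"s": 1.0, "m": 60.0, "h": 3600.0}
_PATTERN = re.compile(r"^(-?\d+(?:\.\d+)?)([smh]?)$")


def parse_keep_alive(value: str) -> Optional[float]:
    """Convert a keep_alive string to seconds.

    Returns None for negative values (keep the model loaded indefinitely),
    0.0 for "0" (unload right after the response), else the duration in seconds.
    """
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised keep_alive value: {value!r}")
    number, unit = match.groups()
    seconds = float(number) * _UNITS.get(unit, 1.0)
    return None if seconds < 0 else seconds


def keep_alive_from_env(name: str = "MODEL_KEEP_ALIVE", default: str = "5m") -> Optional[float]:
    """Read the (hypothetical) environment variable, falling back to a default."""
    return parse_keep_alive(os.environ.get(name, default))
```

For instance, `parse_keep_alive("10m")` gives `600.0`, `parse_keep_alive("3600")` gives `3600.0`, and `parse_keep_alive("-1m")` gives `None`. Returning `None` for "never unload" keeps that case distinct from a zero timeout.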
