The keep_alive parameter can be set to:
a duration string (such as "10m" or "24h")
a number in seconds (such as 3600)
any negative number which will keep the model loaded in memory (e.g. -1 or "-1m")
'0' which will unload the model immediately after generating a response
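As a rough illustration of how these values might be passed along, here is a minimal sketch. It assumes the parameter is sent in the JSON body of an Ollama-style `/api/generate` request. The environment variable name `APP_KEEP_ALIVE`, the `"5m"` fallback, and the helper names are hypothetical and not part of this project.

```python
import json
import os
import urllib.request

# Hypothetical variable name, used only for illustration.
ENV_VAR = "APP_KEEP_ALIVE"


def resolve_keep_alive(raw, default="5m"):
    """Turn a raw setting into a keep_alive value.

    Integers (e.g. "3600", "0", "-1") become numbers of seconds;
    anything else (e.g. "10m", "-1m") is passed through as a duration string.
    The "5m" fallback is an assumption for this sketch.
    """
    if raw is None or raw.strip() == "":
        return default
    raw = raw.strip()
    try:
        return int(raw)
    except ValueError:
        return raw


def build_payload(model, prompt, keep_alive):
    """Build the JSON body for a non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
    }


def generate(model, prompt, host="http://localhost:11434"):
    """Send the request; requires a server listening on `host`."""
    keep_alive = resolve_keep_alive(os.environ.get(ENV_VAR))
    body = json.dumps(build_payload(model, prompt, keep_alive)).encode("utf-8")
    req = urllib.request.Request(
        host + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With something like this, setting `APP_KEEP_ALIVE=0` would unload the model right after each response, while `APP_KEEP_ALIVE=-1` would keep it resident.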
It would be nice to have an environment variable or a similar setting to configure this.
As far as I can tell, the model currently stays loaded in memory and occupies a sizeable chunk of VRAM that could be freed when it is not needed.
Implementing a load/unload mechanism when audio is received would be great.
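One possible shape for such a mechanism is sketched below. It assumes an Ollama-style server, where a generate request without a prompt loads the model and a request with `keep_alive` set to 0 unloads it. The hook names `on_audio_received` and `on_idle` are hypothetical and would need to be wired into wherever this project handles audio.

```python
import json
import urllib.request


def load_payload(model, keep_alive="10m"):
    """Body that asks the server to load the model without generating text."""
    return {"model": model, "keep_alive": keep_alive}


def unload_payload(model):
    """Body that asks the server to unload the model immediately."""
    return {"model": model, "keep_alive": 0}


def post(payload, host="http://localhost:11434"):
    """POST a JSON body to /api/generate; requires a running server."""
    req = urllib.request.Request(
        host + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def on_audio_received(model):
    """Hypothetical hook: load the model as soon as audio arrives."""
    post(load_payload(model))


def on_idle(model):
    """Hypothetical hook: free VRAM once no more audio is expected."""
    post(unload_payload(model))
```

The trade-off is latency: loading on demand frees VRAM between uses, but the first response after each load will be slower.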