Replies: 1 comment
-
There is an experimental memory summarization feature inside Lite that only works for instruct models. Click on the Memory tab and you will see a button for it there. You can press it to add content from your context into memory.
-
Idea for a feature. Context lengths are usually 2048 or 4096 (for now). How about a feature that splits longer inputs into 2048-token chunks? Each chunk is then summarized by the LLM down to, let's say, 256 tokens and fed back into the chat input. So if you have something like a 10000-token chat history (not counting the user input and a user-determined recent range of up to, say, 1024 tokens), the older tokens can be summarized and included in the prompt in around 1000 tokens. It's not perfect, but it could be the start of an active "long term memory" or "extended short term memory".
I've seen that the new Kobold UI only uses the most recent chat tokens and never goes back to the early chat at all. Once the chat grows past the max context length, that's it, Kobold ignores it. This rudimentary memory idea might be a solution for that (a rough code sketch follows the breakdown below).
(Breakdown of how the input would look when fed to the LLM; the token ranges depend on the model's context length and the VRAM available, in line with current context lengths.)
[Summarization of previous chat: 128 - 1024 tokens]
[Current chat: 1024 - 2048 tokens]
[User input: 256 - 1024 tokens]
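Rough sketch of how this could work (Python). `count_tokens`, `llm_generate`, and the token budgets below are placeholders/assumptions, not actual KoboldAI functions; a real version would call the existing tokenizer and generation endpoint instead.

```python
# Rough sketch of the proposed rolling-summarization memory.
# NOTE: count_tokens() and llm_generate() are placeholders, not real KoboldAI
# APIs, and the token budgets below are only illustrative defaults.

CHUNK_TOKENS = 2048    # size of each chunk of old chat to be summarized
SUMMARY_TOKENS = 256   # target length of each chunk summary
RECENT_TOKENS = 2048   # most recent chat kept verbatim
SUMMARY_BUDGET = 1024  # max tokens of summaries allowed into the prompt


def count_tokens(text: str) -> int:
    """Placeholder tokenizer: assumes roughly 4 characters per token."""
    return max(1, len(text) // 4)


def llm_generate(prompt: str, max_tokens: int) -> str:
    """Stand-in for a real generation call (e.g. an HTTP request to the
    backend). Here it just echoes a truncated tail so the sketch runs."""
    return prompt[-max_tokens * 4:]


def split_into_chunks(text: str, chunk_tokens: int) -> list[str]:
    """Greedily pack whole lines into chunks of roughly chunk_tokens tokens."""
    chunks, current, current_len = [], [], 0
    for line in text.splitlines(keepends=True):
        n = count_tokens(line)
        if current and current_len + n > chunk_tokens:
            chunks.append("".join(current))
            current, current_len = [], 0
        current.append(line)
        current_len += n
    if current:
        chunks.append("".join(current))
    return chunks


def build_prompt(full_chat: str, user_input: str) -> str:
    """Assemble [summary of old chat] + [recent chat] + [user input]."""
    # Keep the most recent RECENT_TOKENS of chat verbatim.
    lines = full_chat.splitlines(keepends=True)
    recent, used = [], 0
    while lines and used + count_tokens(lines[-1]) <= RECENT_TOKENS:
        used += count_tokens(lines[-1])
        recent.insert(0, lines.pop())
    old_chat = "".join(lines)

    # Summarize everything older, one 2048-token chunk at a time.
    summaries = []
    for chunk in split_into_chunks(old_chat, CHUNK_TOKENS):
        summaries.append(llm_generate(
            f"Summarize this chat excerpt in under {SUMMARY_TOKENS} tokens:\n"
            f"{chunk}\n\nSummary:",
            max_tokens=SUMMARY_TOKENS,
        ))

    # Drop the oldest summaries until they fit the summary budget.
    while summaries and sum(count_tokens(s) for s in summaries) > SUMMARY_BUDGET:
        summaries.pop(0)

    summary_block = ""
    if summaries:
        summary_block = "[Summary of previous chat]\n" + "\n".join(summaries) + "\n"
    return summary_block + "[Recent chat]\n" + "".join(recent) + user_input


if __name__ == "__main__":
    chat = "\n".join(f"User: message {i}\nBot: reply {i}" for i in range(3000))
    prompt = build_prompt(chat, "User: what did we talk about earlier?\n")
    print(count_tokens(prompt), "tokens in the assembled prompt")
```

When the summaries themselves exceed the budget, the oldest ones are dropped first, so the assembled prompt always stays within the model's context length.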