I believe SuperHOT is trained with linear RoPE scaling, so yes, those settings should be correct. If you don't use a custom RoPE config, it will instead try NTK-aware scaling, which may or may not work.
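To illustrate the difference, here is a rough sketch of the two scaling schemes. It assumes a LLaMA-1 base model trained at 2048 tokens, a RoPE base of 10000, and a head dimension of 128 (typical for a 13B LLaMA). None of these values come from this thread. Linear scaling (position interpolation) multiplies every position by `trained_ctx / target_ctx`, so 8K positions are squeezed back into the range the model saw in training. NTK-aware scaling leaves positions alone and enlarges the RoPE base instead. The function names here are hypothetical and only for illustration.

```python
import math


def linear_rope_scale(trained_ctx: int, target_ctx: int) -> float:
    """Linear (position-interpolation) factor: every position is multiplied by it."""
    return trained_ctx / target_ctx


def ntk_aware_base(base: float, alpha: float, head_dim: int) -> float:
    """NTK-aware scaling: keep positions unchanged, enlarge the RoPE base instead."""
    return base * alpha ** (head_dim / (head_dim - 2))


def rope_angle(pos: int, pair: int, head_dim: int,
               base: float = 10000.0, freq_scale: float = 1.0) -> float:
    """Rotation angle applied to dimension pair `pair` at token position `pos`."""
    inv_freq = base ** (-2 * pair / head_dim)
    return pos * freq_scale * inv_freq


# Assumed setup: 2048-token base model extended to 8192 tokens (SuperHOT 8K).
scale = linear_rope_scale(2048, 8192)
print(scale)  # 0.25

# With linear scaling, position 8191 gets the angle that position 2047.75 would get unscaled,
# i.e. it stays inside the range the base model was trained on.
print(rope_angle(8191, 0, 128, freq_scale=scale))  # 2047.75

# For any dimension pair, a scaled position behaves like the unscaled position * scale.
print(math.isclose(rope_angle(8000, 10, 128, freq_scale=scale),
                   rope_angle(2000, 10, 128)))  # True

# NTK-aware alternative for a 4x extension: a larger base, positions untouched.
print(ntk_aware_base(10000.0, 4.0, 128))
```

If the launcher is KoboldCpp, its `--ropeconfig` option takes a frequency scale followed by a frequency base. The linear setting for the assumed 2048 to 8192 extension would then be `--ropeconfig 0.25 10000`. Check the launcher's own help output to confirm before relying on this.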
I'm a bit confused about the proper set of options needed to use SuperHOT 8K models. I've read around a bit and see that the documentation mentions you need a "specific" RoPE config setup, while some Reddit posts indicate that you don't need to configure it at all. I haven't been able to get fully satisfactory results yet.
Am I correct in assuming that the right way to use the full context with these models (particularly chronos-hermes-13b-superhot-8k) is to set these options in the launcher? (And then, obviously, tell Kobold or SillyTavern to use 8192 as well.)
Or is the custom rope config not necessary?