Hmm, I'd have to take a look. The version of ggml they are using is ancient and will not be directly compatible with the existing library. On the surface, NeoX looks very similar to GPT-J, just with a different RoPE function, but I'm not too familiar with the specifics yet. Ideally ggerganov will do a sample implementation compatible with the latest GGML and I can just use theirs. We'll see how that develops before I try to roll my own version.
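For anyone curious what "a different RoPE function" means in practice, here is a minimal sketch of the commonly cited difference: GPT-J rotates adjacent (interleaved) dimension pairs, while GPT-NeoX rotates each dimension of the first half against its counterpart in the second half. Function names here are illustrative, not taken from ggml or either model's codebase.

```python
import math

def rope_angles(pos, dim, base=10000.0):
    # One rotation angle per dimension pair, standard RoPE frequency schedule.
    return [pos * base ** (-2.0 * i / dim) for i in range(dim // 2)]

def rope_gptj(x, pos, base=10000.0):
    # GPT-J style: rotate adjacent interleaved pairs (x[2i], x[2i+1]).
    d = len(x)
    out = list(x)
    for i, theta in enumerate(rope_angles(pos, d, base)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def rope_neox(x, pos, base=10000.0):
    # NeoX style: rotate first-half/second-half pairs (x[i], x[i + d/2]).
    d = len(x)
    h = d // 2
    out = list(x)
    for i, theta in enumerate(rope_angles(pos, d, base)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + h]
        out[i] = a * c - b * s
        out[i + h] = a * s + b * c
    return out
```

Both variants apply the same per-pair rotations; only the pairing of dimensions differs, which is why a GPT-J RoPE kernel can't be reused for NeoX weights without permuting the head dimensions first.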
@LostRuins The ggml implementation was merged in ggerganov/ggml#96, so it might be in a more workable state to implement here.
A bit of a cross-post from llama.cpp, but it feels appropriate here since this repo targets a number of ggml formats instead of just LLaMA. ggerganov#1063 It might not be too out of the question: someone has already done work on converting and quantizing GPT-NeoX models in ggml. Strangely, none of this was PR'd to ggml; it has just been sitting out there:
https://github.com/NolanoOrg/cformers
https://github.com/NolanoOrg/cformers/blob/master/cformers/cpp/converters/convert_gptneox_to_ggml.py
https://github.com/NolanoOrg/cformers/blob/master/cformers/cpp/quantize_gptneox.cpp
Edit: ggerganov/ggml#96