-
From memory, there's a note in the newer [...]; I suspect that [...]
EDIT: Just to confirm, you are aware that you have:
and
so you are guaranteed different results with your two calls to [...].
-
Edit:
After some investigation I've identified the problem.
When sampling, the top_k value is not being evaluated before being passed into the function:
https://github.com/abetlen/llama-cpp-python/blob/1a13d76c487df1c8560132d10bda62d6e2f4fa93/llama_cpp/llama.py#LL367C1-L367C1
The value is passed as-is and is not changed to n_vocab if top_k=0.
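For reference, a minimal sketch of the kind of guard that would do that mapping (a hypothetical helper, not the actual llama.py code; n_vocab simply stands for the model's vocabulary size):

```python
# Hypothetical sketch of the missing guard (not the actual llama.py code):
# treat a non-positive top_k as "top-k disabled" by widening it to the full vocabulary.
def effective_top_k(top_k: int, n_vocab: int) -> int:
    return n_vocab if top_k <= 0 else top_k

# e.g. effective_top_k(0, 32000) == 32000, effective_top_k(40, 32000) == 40
```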
Why is that a problem?
In the source code of llama.cpp we can see that when k=0 and min_keep=1, the sampler will always default to keeping at most a single candidate, ensuring we only receive the candidate with the highest logit. This is not the expected behaviour, because a value of k=0 is meant to mark that top_k sampling is disabled, according to the llama.cpp source code.
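As a rough Python paraphrase of the clamping behaviour just described (illustrative only, not a literal translation of the C++ source):

```python
# Approximate paraphrase of the top-k clamping described above (illustrative only).
def top_k_filter(logits: list[float], k: int, min_keep: int = 1) -> list[float]:
    k = max(k, min_keep)      # with k=0 and min_keep=1, k becomes 1
    k = min(k, len(logits))   # never keep more candidates than exist
    return sorted(logits, reverse=True)[:k]

# With k=0 only the single highest logit survives:
print(top_k_filter([1.2, 3.4, 0.5, 2.2], k=0))  # -> [3.4]
```

With only one candidate left, the subsequent top_p and temperature steps cannot change the outcome, which matches the deterministic output described below.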
Hello.
I've noticed a strange occurrence when trying to generate output. For a given context, the bindings API will always return the same output. Additionally, it seems that the top_p and temp values are being completely ignored. This is not the case when running llama.cpp itself.
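For context, a hypothetical call of roughly this shape (placeholder model path, prompt and parameters; it is not the example script shown further below) illustrates what generating through the bindings API looks like:

```python
# Hypothetical minimal call (assumed model path, prompt and top_k value);
# not the example script referenced below.
from llama_cpp import Llama

llm = Llama(model_path="./models/ggml-model.bin")  # placeholder path

# If top_k=0 is passed through unmapped (see the edit above), the candidate list
# collapses to one token, so temperature and top_p cannot influence the output.
out = llm("Q: Name a planet in our solar system. A:",
          max_tokens=32, temperature=0.8, top_p=0.95, top_k=0)
print(out["choices"][0]["text"])
```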
I am using the latest version (v0.1.50) of llama-cpp-python. I've installed it with cuBLAS support over pip and have also tried compiling it myself; both builds produce the same results.
My example script:
Output example (always the same, regardless of top_p and temp):
Now, using llama.cpp I always get a different result:
Sorry if this is an incorrect place to post something like this; it's my first time posting.