Hi, I'm not familiar with llama-cpp-python (and not really with C++ either), but I have to use a GGUF model for my project.
I want to generate an answer from pre-computed embedding vectors (a torch.Tensor of shape (1, n_tokens, 4096)) rather than from query text. By embedding vectors I mean text embeddings produced by torch.nn.Embedding()
(just like the inputs_embeds argument of the generate() function in transformers).
What I want to do is skip steps 1 and 2 of the usual pipeline:
1. tokenize the input string
2. build text embeddings from the tokens
3. run model inference
4. get the output tokens
5. detokenize
Is this feature already implemented? If not, could anyone point me to where I should begin?
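As far as I know, llama.cpp already supports this at the C level: llama_batch has an embd field, and when that buffer is set, llama_decode uses the supplied embeddings and ignores the token ids. The high-level Llama class in llama-cpp-python doesn't expose it directly, but the low-level ctypes bindings do. Below is a minimal, untested sketch of that approach; MODEL_PATH, N_EMBD, and the random placeholder embeddings are assumptions for illustration, and exact binding names/signatures can differ between llama-cpp-python versions.

```python
# Sketch: feed pre-computed embeddings to llama.cpp through llama-cpp-python's
# low-level bindings. Passing embd=N_EMBD to llama_batch_init allocates the
# batch.embd buffer, and llama_decode then ignores batch.token.
import ctypes
import numpy as np
import llama_cpp

MODEL_PATH = b"./model.gguf"   # assumption: path to your GGUF file
N_EMBD = 4096                  # must match the model's embedding size

llama_cpp.llama_backend_init()
model = llama_cpp.llama_load_model_from_file(
    MODEL_PATH, llama_cpp.llama_model_default_params())
ctx = llama_cpp.llama_new_context_with_model(
    model, llama_cpp.llama_context_default_params())

# Pre-computed embeddings, shape (n_tokens, n_embd); e.g. the output of your
# torch.nn.Embedding() converted with .squeeze(0).float().cpu().numpy().
embeds = np.random.rand(8, N_EMBD).astype(np.float32)  # placeholder data
n_tokens = embeds.shape[0]

batch = llama_cpp.llama_batch_init(n_tokens, N_EMBD, 1)
batch.n_tokens = n_tokens
ctypes.memmove(batch.embd, embeds.ctypes.data, embeds.nbytes)
for i in range(n_tokens):
    batch.pos[i] = i                           # positions 0..n_tokens-1
    batch.n_seq_id[i] = 1
    batch.seq_id[i][0] = 0
    batch.logits[i] = int(i == n_tokens - 1)   # logits only for the last slot

if llama_cpp.llama_decode(ctx, batch) != 0:
    raise RuntimeError("llama_decode failed")

# Greedy pick of the next token from the last position's logits; from here you
# can keep decoding with ordinary token batches and detokenize as usual.
n_vocab = llama_cpp.llama_n_vocab(model)
logits = np.ctypeslib.as_array(
    llama_cpp.llama_get_logits_ith(ctx, n_tokens - 1), shape=(n_vocab,))
next_token = int(np.argmax(logits))
print("first generated token id:", next_token)

llama_cpp.llama_batch_free(batch)
llama_cpp.llama_free(ctx)
llama_cpp.llama_free_model(model)
llama_cpp.llama_backend_free()
```

After the prefill step above, the generated tokens already have ids, so the rest of the loop can use normal token batches (or the regular sampling helpers); only the initial prompt needs to go in as raw embeddings.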