It just seems verbose and redundant, since the client already has the prompt it submitted. It is especially odd with a RAG approach, where prompts are so long.
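Until such a flag exists, a client can strip the echoed prompt itself. A minimal sketch (the `strip_prompt` helper name is hypothetical; it assumes the backend returns the prompt and completion concatenated, as described below):

```python
def strip_prompt(prompt: str, text_output: str) -> str:
    """Drop the leading prompt that the vLLM backend echoes back.

    Hypothetical client-side workaround: the backend currently returns
    prompt + completion concatenated, so remove the prompt prefix if
    present and keep only the newly generated text.
    """
    if text_output.startswith(prompt):
        return text_output[len(prompt):]
    return text_output
```

This is wasteful with long RAG prompts, though, since the full prompt still travels back over the wire before being discarded.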
mkhludnev changed the title from "vllm-backend prepends output with prompt" to "vllm-backend exclude_input_in_output flag whether prepend output with prompt" on Feb 14, 2024
Description
vllm-backend concatenates the prompt and the output before responding.
Triton Information
Are you using the Triton container or did you build it yourself?
nvcr.io/nvidia/tritonserver:23.11-vllm-python-py3
To Reproduce
Steps to reproduce the behavior:
Send a request with curl POST :8000/v2/models/vllm_model/generate (or via gRPC on :8001) and watch the verbose logs.
Expected behavior
There is no prompt in the vLLM response, which makes #6864 less painful.
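The requested behavior could be sketched as a flag checked when the response is assembled (a hypothetical helper, not the backend's actual model.py; the flag name follows the issue title):

```python
def assemble_text_output(prompt: str, generated: str,
                         exclude_input_in_output: bool = False) -> str:
    """Sketch of the proposed flag's semantics.

    Today the vLLM backend effectively returns prompt + generated text;
    with exclude_input_in_output set, only the newly generated text
    would be returned.
    """
    if exclude_input_in_output:
        return generated
    return prompt + generated
```

Defaulting the flag to False would keep the current concatenating behavior for existing clients.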