gRPC response is cropped, where REST /generate fully sent #6864

mkhludnev · 2024-02-05T09:47:57Z

Hello,
I use Triton with a vLLM model. For some reason the vLLM backend prepends the prompt to the response. I don't know why; I asked here.
Because of the RAG approach I get really long responses. The problem is that REST /generate returns the long response in full, while gRPC crops it at a certain length. I'm using the Python gRPC client, and I can see the cropping in the verbose log.
More details, with excerpts from configs and logs, are here.

oandreeva-nv · 2024-02-15T18:19:35Z

Hi @mkhludnev, could you please give us a rough estimate of how long your prompt is, or provide an example PROMPT for a reproducer? We'd highly appreciate it.

cc @itskevinwang

mkhludnev · 2024-02-15T19:44:34Z

Sure.
curl got 200 words: rest-triton.txt
gRPC returns only 20: grpc-triton.txt
I use the langchain module as a client.
Here's the model for vllm_backend:
model.json
config.pbtxt.txt

I'm trying to find a brief reproducer, but it's tough for a noob like myself.

oandreeva-nv · 2024-02-15T19:46:12Z

Thank you! I think this should be enough for now.

mkhludnev · 2024-02-15T20:44:46Z

You know what... I feel terribly sorry.
It might be a question of "parameters":{"max_tokens":4000}: the REST request carried it, and the gRPC request did not.
I'm trying to figure out how to send it with https://github.com/triton-inference-server/vllm_backend/blob/main/samples/client.py

context triton-inference-server/server#6864

mkhludnev · 2024-02-15T20:59:51Z

Pardon triton-inference-server/vllm_backend#34
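
For readers landing here, a minimal sketch of sending `max_tokens` on both paths. It assumes the input tensor names used by the linked sample client (`text_input`, `stream`, `sampling_parameters`) and that the backend accepts sampling parameters as one JSON string; the helper names are hypothetical, so check them against your own config.pbtxt.

```python
import json


def encode_sampling_parameters(max_tokens, **extra):
    """Serialize sampling parameters to JSON bytes (one BYTES element)."""
    params = {"max_tokens": max_tokens, **extra}
    return json.dumps(params).encode("utf-8")


def build_generate_body(prompt, max_tokens):
    """Body for REST POST /v2/models/<model>/generate, in the shape quoted above.

    The "text_input" key is an assumption taken from the sample model.
    """
    return {"text_input": prompt, "parameters": {"max_tokens": max_tokens}}


def build_grpc_inputs(prompt, max_tokens, stream=False):
    """Build gRPC inputs that carry max_tokens alongside the prompt.

    Tensor names are assumptions based on the vllm_backend sample client.
    """
    # Imported lazily so the helpers above work without tritonclient installed.
    import numpy as np
    import tritonclient.grpc as grpcclient

    text = grpcclient.InferInput("text_input", [1], "BYTES")
    text.set_data_from_numpy(np.array([prompt.encode("utf-8")], dtype=np.object_))

    stream_flag = grpcclient.InferInput("stream", [1], "BOOL")
    stream_flag.set_data_from_numpy(np.array([stream], dtype=bool))

    # Without this input the request falls back to the backend's default
    # output length, which would look like a cropped response.
    sampling = grpcclient.InferInput("sampling_parameters", [1], "BYTES")
    sampling.set_data_from_numpy(
        np.array([encode_sampling_parameters(max_tokens)], dtype=np.object_)
    )
    return [text, stream_flag, sampling]
```

The list returned by `build_grpc_inputs("...", 4000)` would be passed as the `inputs` argument of the gRPC client's inference call, the same way the sample client does it.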

oandreeva-nv · 2024-02-15T22:51:38Z

Thanks for providing a solution!

mkhludnev mentioned this issue Feb 6, 2024

vllm-backend exclude_input_in_output flag whether prepend output with prompt #6866

Closed

mkhludnev added a commit to mkhludnev/vllm_backend that referenced this issue Feb 15, 2024

Demonstrate passing "max_tokens" param

257a2a9

context triton-inference-server/server#6864

mkhludnev mentioned this issue Feb 15, 2024

Demonstrate passing "max_tokens" param triton-inference-server/vllm_backend#34

Merged

mkhludnev closed this as completed Feb 15, 2024
