gRPC response is cropped, where REST /generate fully sent #6864

Closed
mkhludnev opened this issue Feb 5, 2024 · 6 comments

Comments

@mkhludnev

Hello,
I use Triton with a vLLM model. For some reason the vLLM backend prepends the prompt to the response. I don't know why; I asked here.
Because of the RAG approach, I get really long responses. The problem is that REST /generate returns the long response in full, while gRPC crops it at a certain length. I'm using the Python gRPC client, but the cropping is also visible in the verbose log.
More details, with excerpts from configs and logs, are here.

@oandreeva-nv
Contributor

Hi @mkhludnev, could you please give us a rough estimate of how long your prompt is, or provide an example PROMPT for a reproducer? We'd highly appreciate it.

cc @itskevinwang

@mkhludnev
Author

Sure.
curl gets 200 words: rest-triton.txt
gRPC returns only 20: grpc-triton.txt
I use the langchain module as a client.
Here's the model for vllm_backend:
model.json
config.pbtxt.txt

I'm trying to put together a brief reproducer, but it's tough for a noob like myself.

@oandreeva-nv
Contributor

Thank you! I think this should be enough for now.

@mkhludnev
Author

You know what... I feel terribly sorry.
It might just be a question of "parameters":{"max_tokens":4000}.
I'm trying to figure out how to send it with https://github.com/triton-inference-server/vllm_backend/blob/main/samples/client.py
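
For anyone landing here with the same symptom, below is a minimal sketch of the two payload shapes involved. It is not taken from the linked client. The helper names `build_generate_body` and `encode_sampling_parameters` are hypothetical, and the `text_input` field name is an assumption about the model's input naming. It also assumes the gRPC path expects sampling parameters as a JSON string in a BYTES input (commonly named `sampling_parameters`), so please check that against your `config.pbtxt` and the sample client before relying on it.

```python
import json


def build_generate_body(prompt, max_tokens, stream=False):
    """Build a JSON-serializable body for the REST generate endpoint.

    The "parameters" object mirrors the {"max_tokens": 4000} shape quoted
    above; "text_input" is an assumed input name.
    """
    return {
        "text_input": prompt,
        "parameters": {"stream": stream, "max_tokens": max_tokens},
    }


def encode_sampling_parameters(max_tokens, **extra):
    """Serialize sampling parameters as UTF-8 JSON bytes.

    Assumption: a gRPC client would place these bytes in a BYTES input
    tensor; the tensor name and shape depend on the model configuration.
    """
    params = {"max_tokens": max_tokens, **extra}
    return json.dumps(params).encode("utf-8")


if __name__ == "__main__":
    # Print both payloads so they can be compared side by side.
    print(json.dumps(build_generate_body("What is Triton?", 4000)))
    print(encode_sampling_parameters(4000))
```

If the gRPC request carries no `max_tokens` at all, the backend presumably falls back to a default output limit, which would explain a short gRPC response next to a full REST one.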

@mkhludnev
Author

@oandreeva-nv
Contributor

Thanks for providing the solution!
