gRPC response is cropped, where REST /generate fully sent #6864
Comments
Hi @mkhludnev, could you please give us a rough estimate of how long your prompt is, or provide an example PROMPT for a reproducer? We'd highly appreciate it.
Sure. I'm trying to find a brief reproducer, but it's tough for a noob like myself.
Thank you! I think this should be enough for now.
You know what... I feel terribly sorry.
Thanks for providing a solution!
Hello,
I use Triton with a vLLM model. For some reason, the vLLM backend prepends the prompt to the response. I don't know why; I asked here.
Because of the RAG approach, I get really long responses. The problem is that REST /generate returns the long response in full, while gRPC crops it at a certain length. I'm using the Python gRPC client, but I also see the cropping in the verbose log.
More details, including excerpts from configs and logs, are here.
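To pin down where the gRPC output stops, it may help to compare the two responses directly once both are saved as strings. The sketch below is only an illustration: `compare_responses` is a hypothetical helper, not part of Triton or its client libraries, and it assumes the REST and gRPC outputs have already been decoded to text.

```python
def compare_responses(rest_text: str, grpc_text: str) -> dict:
    """Compare a full REST response with a possibly cropped gRPC response.

    Hypothetical helper for illustration; not part of any Triton API.
    """
    # Count how many leading characters the two responses share.
    common = 0
    for a, b in zip(rest_text, grpc_text):
        if a != b:
            break
        common += 1

    return {
        "rest_chars": len(rest_text),
        "grpc_chars": len(grpc_text),
        # Byte lengths matter if a limit is applied to the encoded payload.
        "rest_bytes": len(rest_text.encode("utf-8")),
        "grpc_bytes": len(grpc_text.encode("utf-8")),
        "common_prefix_chars": common,
        # True only when gRPC returned a strictly shorter prefix of the REST text.
        "grpc_is_prefix_of_rest": (
            common == len(grpc_text) and len(grpc_text) < len(rest_text)
        ),
    }


if __name__ == "__main__":
    # Placeholder strings; substitute the real responses here.
    report = compare_responses("abc" * 10, "abc" * 4)
    for key, value in report.items():
        print(f"{key}: {value}")
```

If `grpc_is_prefix_of_rest` is true, the character and byte counts at the cut-off point are worth including in the report, since a round number could hint at a size limit somewhere in the path.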