Optimize multiple concurrent LLM calls #1073
Unanswered
NicolaZomer
asked this question in Q&A
Hi everyone!
I would like to know if there is an efficient way to optimize multiple LLM calls. Let's say I need to make 10 independent requests to the same LLM, instantiated with llama-cpp-python. Is there a more efficient way than doing them sequentially? Can we manage the workload, parallelize it, or do you know of any other strategies that might help?
Thank you!
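For concreteness, this is the naive direction I had in mind: serve the model once and fan the independent requests out from a thread pool. A rough sketch only; the server command, base URL, model name, and prompts are placeholders, and I am not sure this is the right approach:

```python
# Rough sketch: serve the model once, then fan out the 10 independent requests
# from a thread pool. Assumes an OpenAI-compatible server is already running,
# for example:
#   python -m llama_cpp.server --model ./model.gguf --port 8000
# The base_url, api_key, model name, and prompts below are placeholders.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

prompts = [f"Summarize topic {i} in one sentence." for i in range(10)]

def ask(prompt: str) -> str:
    # Each request is independent, so they can be issued concurrently.
    resp = client.chat.completions.create(
        model="local-model",  # placeholder; the local server maps or ignores this
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    return resp.choices[0].message.content

with ThreadPoolExecutor(max_workers=10) as pool:
    answers = list(pool.map(ask, prompts))

for answer in answers:
    print(answer)
```

What I do not know is whether the server actually decodes these requests in parallel or simply queues them one after another, which is essentially my question.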
-
I guess you might be thinking of batching. As far as I know, this PR will bring it to the project:
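Until that lands, one workaround is to run the upstream llama.cpp server with several parallel slots and fire the requests concurrently, so decoding is batched on the server side. A rough asyncio sketch under that assumption; the server command, flag names, base URL, and model name are placeholders and may differ between llama.cpp versions:

```python
# Rough sketch: concurrent requests against a llama.cpp server started with
# multiple parallel slots, e.g. (flag names vary by version, treat as assumptions):
#   ./llama-server -m ./model.gguf --port 8080 --parallel 4
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="local-model",  # placeholder
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    return resp.choices[0].message.content

async def main() -> None:
    prompts = [f"Request {i}: ..." for i in range(10)]
    # gather() issues all requests at once; the server interleaves their decoding
    # across its parallel slots instead of handling them strictly one by one.
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    for answer in answers:
        print(answer)

asyncio.run(main())
```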