Optimize multiple concurrent LLM calls #1073
Unanswered
NicolaZomer
asked this question in Q&A
Hi everyone!
I would like to know if there is an efficient way to optimize multiple LLM calls. Let's say I need to make 10 independent requests to the same LLM, instantiated with llama-cpp-python. Is there a more efficient way than doing them sequentially? Can we manage the workload, parallelize it, or do you know of any other strategies that might help?
Thank you!
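For concreteness, this is the naive direction I had in mind: serve the model once and fan the independent requests out from a thread pool. A rough sketch only; the server command, base URL, model name, and prompts are placeholders, and I am not sure this is the right approach:

```python
# Rough sketch: serve the model once, then fan out the 10 independent requests
# from a thread pool. Assumes an OpenAI-compatible server is already running,
# for example:
#   python -m llama_cpp.server --model ./model.gguf --port 8000
# The base_url, api_key, model name, and prompts below are placeholders.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

prompts = [f"Summarize topic {i} in one sentence." for i in range(10)]

def ask(prompt: str) -> str:
    # Each request is independent, so they can be issued concurrently.
    resp = client.chat.completions.create(
        model="local-model",  # placeholder; the local server maps or ignores this
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    return resp.choices[0].message.content

with ThreadPoolExecutor(max_workers=10) as pool:
    answers = list(pool.map(ask, prompts))

for answer in answers:
    print(answer)
```

What I do not know is whether the server actually decodes these requests in parallel or simply queues them one after another, which is essentially my question.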
-
I guess you might be thinking of batching. As far as I know, this PR will bring it to the project:
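Until that lands, one workaround is to run the upstream llama.cpp server with several parallel slots and fire the requests concurrently, so decoding is batched on the server side. A rough asyncio sketch under that assumption; the server command, flag names, base URL, and model name are placeholders and may differ between llama.cpp versions:

```python
# Rough sketch: concurrent requests against a llama.cpp server started with
# multiple parallel slots, e.g. (flag names vary by version, treat as assumptions):
#   ./llama-server -m ./model.gguf --port 8080 --parallel 4
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="local-model",  # placeholder
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    return resp.choices[0].message.content

async def main() -> None:
    prompts = [f"Request {i}: ..." for i in range(10)]
    # gather() issues all requests at once; the server interleaves their decoding
    # across its parallel slots instead of handling them strictly one by one.
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    for answer in answers:
        print(answer)

asyncio.run(main())
```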