[Enhancement] Support user-configurable throttling for embeddings on Cohere #3943

sealad886 · 2025-02-03T17:29:45Z

Validations

I believe this is a way to improve. I'll try to join the Continue Discord for questions
I'm not able to find an open issue that requests the same enhancement

Problem

You can use Cohere's models for free by signing up and configuring your API key.

However, when trying to get embeddings for some documents, I receive the following error in VSCode.Console:

[Extension Host] HTTP 429 Too Many Requests from https://api.cohere.ai/v1/embed

{"message":"trial token rate limit exceeded, limit is 100000 tokens per minute"}

Code: undefined
Error number: undefined
Syscall: undefined
Type: undefined

Error: HTTP 429 Too Many Requests from https://api.cohere.ai/v1/embed

{"message":"trial token rate limit exceeded, limit is 100000 tokens per minute"}
    at customFetch2 (/Users/andrew/.vscode/extensions/continue.continue-0.9.258-darwin-arm64/out/extension.js:109297:21)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at withExponentialBackoff (/Users/andrew/.vscode/extensions/continue.continue-0.9.258-darwin-arm64/out/extension.js:105220:26)
    at _Cohere._embed (/Users/andrew/.vscode/extensions/continue.continue-0.9.258-darwin-arm64/out/extension.js:157208:22)
    at /Users/andrew/.vscode/extensions/continue.continue-0.9.258-darwin-arm64/out/extension.js:109659:24
    at withExponentialBackoff (/Users/andrew/.vscode/extensions/continue.continue-0.9.258-darwin-arm64/out/extension.js:105220:26)
    at /Users/andrew/.vscode/extensions/continue.continue-0.9.258-darwin-arm64/out/extension.js:109650:32
    at async Promise.all (index 7)

It would appear that, generous as Cohere is in letting devs use its API completely for free, it also reasonably puts limits on that usage.

This also means that certain very large files (e.g. @Docs entries) can't be indexed at all, at least not through a server that limits tokens like this.

Solution

Implement a global throttling mechanism. I'd recommend making it configurable by:

Restrict the maximum number of tokens sent per time interval
Restrict the maximum number of tokens sent + received per time interval (this one will be difficult and perhaps not necessary)
Restrict the maximum number of requests per time interval
Restrict the maximum number of concurrent open connections (I could see this coming into play if, say, someone created a context provider that farms out certain jobs, like applying code updates, and might overwhelm a free service)
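
To make the first and third options concrete, here is a rough sketch of what a per-interval budget could look like. Everything in it is hypothetical: `ThrottleOptions`, `FixedWindowThrottle`, and `throttled` are names I made up for illustration, not existing Continue settings or APIs, and the token count for a batch is assumed to be estimated by the caller. It uses a simple fixed window, which may not match how Cohere actually measures its per-minute limit.

```typescript
// Hypothetical options; none of these names exist in Continue's config today.
interface ThrottleOptions {
  maxTokensPerInterval: number; // e.g. 100_000 for the trial limit in the error above
  maxRequestsPerInterval: number;
  intervalMs: number; // e.g. 60_000 for "per minute"
}

class FixedWindowThrottle {
  private readonly opts: ThrottleOptions;
  private readonly now: () => number;
  private windowStart: number;
  private tokensUsed = 0;
  private requestsUsed = 0;

  // `now` is injectable so the class can be tested without real timers.
  constructor(opts: ThrottleOptions, now: () => number = Date.now) {
    this.opts = opts;
    this.now = now;
    this.windowStart = now();
  }

  /**
   * Tries to reserve budget for one request carrying `tokens` tokens.
   * Returns 0 if the request may be sent now, otherwise the number of
   * milliseconds until the current window ends.
   */
  reserve(tokens: number): number {
    if (tokens > this.opts.maxTokensPerInterval) {
      // This request can never fit; the caller has to split the batch.
      throw new RangeError("request exceeds the per-interval token budget");
    }
    const t = this.now();
    if (t - this.windowStart >= this.opts.intervalMs) {
      // A new window has started: reset both counters.
      this.windowStart = t;
      this.tokensUsed = 0;
      this.requestsUsed = 0;
    }
    const fitsTokens =
      this.tokensUsed + tokens <= this.opts.maxTokensPerInterval;
    const fitsRequests =
      this.requestsUsed + 1 <= this.opts.maxRequestsPerInterval;
    if (fitsTokens && fitsRequests) {
      this.tokensUsed += tokens;
      this.requestsUsed += 1;
      return 0;
    }
    return this.windowStart + this.opts.intervalMs - t;
  }
}

// Waits until the throttle grants budget, then runs `send`.
async function throttled<T>(
  throttle: FixedWindowThrottle,
  tokens: number,
  send: () => Promise<T>,
): Promise<T> {
  for (;;) {
    const waitMs = throttle.reserve(tokens);
    if (waitMs === 0) {
      return send();
    }
    await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
  }
}
```

An embeddings call would then be wrapped as something like `throttled(throttle, estimatedTokens, () => embedBatch(chunks))`, where `embedBatch` stands in for whatever function actually sends the request. A fixed window can still let through a burst across a window boundary, so a rolling window or token bucket might be a better fit, and the concurrent-connection limit would need a separate mechanism (e.g. a semaphore).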


sestinj self-assigned this Feb 3, 2025

github-actions bot added the "needs-triage" label Feb 3, 2025

dosubot bot added the "area:configuration" (Relates to configuration options) and "kind:enhancement" (Indicates a new feature request, improvement, or extension) labels Feb 3, 2025
