[Enhancement] Support user-configurable throttling for embeddings on Cohere #3943

Open
2 tasks done
sealad886 opened this issue Feb 3, 2025 · 0 comments
Labels: area:configuration (Relates to configuration options), kind:enhancement (Indicates a new feature request, improvement, or extension), needs-triage


sealad886 commented Feb 3, 2025

Validations

  • I believe this is a way to improve. I'll try to join the Continue Discord for questions
  • I'm not able to find an open issue that requests the same enhancement

Problem

You can use Cohere's models for free by signing up and configuring your API key.

However, when trying to get embeddings for some documents, I receive the following error in the VS Code console:

[Extension Host] HTTP 429 Too Many Requests from https://api.cohere.ai/v1/embed

{"message":"trial token rate limit exceeded, limit is 100000 tokens per minute"}

Code: undefined
Error number: undefined
Syscall: undefined
Type: undefined

Error: HTTP 429 Too Many Requests from https://api.cohere.ai/v1/embed

{"message":"trial token rate limit exceeded, limit is 100000 tokens per minute"}
    at customFetch2 (/Users/andrew/.vscode/extensions/continue.continue-0.9.258-darwin-arm64/out/extension.js:109297:21)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at withExponentialBackoff (/Users/andrew/.vscode/extensions/continue.continue-0.9.258-darwin-arm64/out/extension.js:105220:26)
    at _Cohere._embed (/Users/andrew/.vscode/extensions/continue.continue-0.9.258-darwin-arm64/out/extension.js:157208:22)
    at /Users/andrew/.vscode/extensions/continue.continue-0.9.258-darwin-arm64/out/extension.js:109659:24
    at withExponentialBackoff (/Users/andrew/.vscode/extensions/continue.continue-0.9.258-darwin-arm64/out/extension.js:105220:26)
    at /Users/andrew/.vscode/extensions/continue.continue-0.9.258-darwin-arm64/out/extension.js:109650:32
    at async Promise.all (index 7)

It would appear that, generous as Cohere is in letting devs use their API completely for free, they are also reasonable people who put limits on it: trial keys are capped at 100,000 tokens per minute.

This also means that certain very large files (e.g. @Docs entries) can't be indexed at all--at least not through a server that limits tokens like this.

Solution

Implement a global throttling mechanism. I recommend making it configurable by:

  • Restrict the max number of tokens sent per time interval
  • Restrict the max number of tokens sent + received per time interval (this one will be difficult and perhaps not necessary)
  • Restrict the max number of requests per time interval
  • Restrict the max number of concurrent open connections (I could see this coming into play if, say, someone created a context provider that farms out certain jobs--like applying code updates--and might overwhelm a free service)
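
To make the first and third options concrete, here is a minimal sketch of a sliding-window throttle that caps both tokens and requests per interval. Everything in it is an assumption for illustration: `ThrottleOptions`, `SlidingWindowThrottle`, and `throttled` are hypothetical names, not existing Continue APIs, and the per-request token count is assumed to be estimated by the caller.

```typescript
// Hypothetical config shape; these option names do not exist in Continue today.
interface ThrottleOptions {
  maxTokensPerInterval: number; // e.g. 100_000 for a Cohere trial key
  maxRequestsPerInterval: number;
  intervalMs: number; // e.g. 60_000 for "per minute"
}

interface UsageRecord {
  at: number; // timestamp (ms) when the request was sent
  tokens: number; // tokens sent in that request
}

class SlidingWindowThrottle {
  private opts: ThrottleOptions;
  private history: UsageRecord[] = [];

  constructor(opts: ThrottleOptions) {
    this.opts = opts;
  }

  // Returns how many ms to wait before sending `tokens`; 0 means "send now".
  delayFor(tokens: number, now: number): number {
    const { maxTokensPerInterval, maxRequestsPerInterval, intervalMs } = this.opts;
    if (tokens > maxTokensPerInterval) {
      // No amount of waiting helps; the caller has to split the batch.
      throw new Error("request exceeds the per-interval token budget");
    }
    // Forget requests that have already left the window.
    this.history = this.history.filter((r) => now - r.at < intervalMs);
    let usedTokens = this.history.reduce((sum, r) => sum + r.tokens, 0);
    let usedRequests = this.history.length;
    let wait = 0;
    // Walk oldest-first, "expiring" records until the new request fits.
    for (const r of this.history) {
      const fits =
        usedTokens + tokens <= maxTokensPerInterval &&
        usedRequests + 1 <= maxRequestsPerInterval;
      if (fits) break;
      usedTokens -= r.tokens;
      usedRequests -= 1;
      wait = r.at + intervalMs - now;
    }
    return wait;
  }

  // Call right before the request is actually sent.
  record(tokens: number, now: number): void {
    this.history.push({ at: now, tokens });
  }
}

// Hypothetical wrapper: waits until the budget allows, then runs `send`.
async function throttled<T>(
  throttle: SlidingWindowThrottle,
  tokens: number,
  send: () => Promise<T>,
): Promise<T> {
  for (;;) {
    const wait = throttle.delayFor(tokens, Date.now());
    if (wait === 0) break;
    await new Promise<void>((resolve) => setTimeout(resolve, wait));
  }
  throttle.record(tokens, Date.now());
  return send();
}
```

An embeddings provider could then wrap each batch as `throttled(throttle, estimatedTokens, () => embedBatch(batch))`, where `embedBatch` stands in for whatever function performs the request. The concurrent-connections option would need a separate semaphore on top of this, and the existing exponential backoff would still be useful as a fallback when the token estimate is off.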

sestinj self-assigned this Feb 3, 2025
dosubot (bot) added the area:configuration and kind:enhancement labels Feb 3, 2025