Anthropic Citations support #588
çög: Anthropic has recently introduced a citations feature that cites documents included in the request context. Would it be possible to add this feature? Perhaps #581 (de0dedb), which added Perplexity citation support, may be useful as a starting point.

Comments
karthink: It looks like this is only relevant to requests that include documents, so this is not the same feature as Perplexity's citations. My understanding is that almost no gptel users are doing this, because it's expensive to send documents with each request, even with prompt caching (which gptel uses when sending binary data). I can add it if there's enough interest, but it's a niche feature, so it's a low priority otherwise.
çög: Thanks for considering the request. The feature is designed so you send the document once and then prompt with questions about the document. Claude parses the document and then provides citations to the document when responding. The cited text does not count toward output tokens.
karthink: That's not how the Anthropic API used by gptel works. The document is resent each time you interact with the LLM, i.e. with each subsequent question. This means you will end up sending your 5 MB PDF file over the network (say) 30 times in a conversation. It will be parsed in full the first time, and some intermediate inference state will be cached by Anthropic. Assuming you only append to the conversation, Anthropic will use the cache on subsequent requests. But you still pay a (reduced) token cost every time, and the document still needs to be sent over the network with each request.

I don't know if there is a stateful Anthropic API that works the way you describe -- OpenAI's "assistants" API works in this stateful way. If such an API exists, gptel does not use it.
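To illustrate the request shape being described, here is a minimal sketch using Anthropic's Python SDK (gptel itself is Emacs Lisp and constructs the equivalent JSON payloads directly). The file name, model alias, and questions are placeholders, and it assumes the chosen model supports PDF input.

```python
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("report.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

document_block = {
    "type": "document",
    "source": {
        "type": "base64",
        "media_type": "application/pdf",
        "data": pdf_data,
    },
    # Ask Anthropic to cache its parsed state for this document.
    "cache_control": {"type": "ephemeral"},
}

# Turn 1: the document goes up with the first question.
messages = [{
    "role": "user",
    "content": [document_block,
                {"type": "text", "text": "What is the main finding?"}],
}]
reply = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    messages=messages,
)

# Turn 2: the Messages API is stateless, so the *entire* history --
# including the base64-encoded PDF -- is re-sent over the network.
# The cache_control marker only reduces the token cost of having the
# document re-processed server-side; it does not avoid the upload.
messages.append({"role": "assistant", "content": reply.content})
messages.append({"role": "user", "content": "What data was used?"})
reply = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    messages=messages,
)
```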
çög: You're right, I didn't understand how it works. I asked Anthropic about recommendations and they said:

> If you are trying to reduce costs it might make sense to write
> just the document to cache first and then send your multiple
> citation queries to hit the cached document (cache hit API calls
> are significantly less expensive than normal API calls).

I believe this is what you were saying? Does gptel cache the documents?
karthink: Yes and yes.
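For reference, here is a sketch of what the requested feature, combined with the cache-first recommendation quoted above, might look like at the API level, based on Anthropic's published citations and prompt-caching documentation. The file name, model alias, prompts, and the exact shape of the returned citation fields are illustrative, not a description of how gptel would implement it.

```python
import base64
import anthropic

client = anthropic.Anthropic()

with open("report.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

document_block = {
    "type": "document",
    "source": {"type": "base64", "media_type": "application/pdf",
               "data": pdf_data},
    "citations": {"enabled": True},          # the feature requested here
    "cache_control": {"type": "ephemeral"},  # cache the parsed document
}

# Cache-priming call: pays the full token cost for the document once.
messages = [{"role": "user",
             "content": [document_block,
                         {"type": "text", "text": "Read the document."}]}]
reply = client.messages.create(model="claude-3-5-sonnet-latest",
                               max_tokens=64, messages=messages)

# Subsequent questions should hit the cache at a reduced token cost.
messages.append({"role": "assistant", "content": reply.content})
messages.append({"role": "user", "content": "What is the main finding?"})
reply = client.messages.create(model="claude-3-5-sonnet-latest",
                               max_tokens=1024, messages=messages)

# With citations enabled, text blocks in the reply carry citation
# metadata (cited text plus location info) pointing back into the PDF.
for block in reply.content:
    if block.type == "text" and block.citations:
        for cite in block.citations:
            print(cite.cited_text)
```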