Relation to OpenTelemetry #33

Open
tmc opened this issue Oct 8, 2023 · 6 comments

@tmc

tmc commented Oct 8, 2023

Heya! I’m curious how you are thinking about how this effort relates to or interacts with the OpenTelemetry project.

@mikeldking
Contributor

Heya @tmc - yes the traces are basically OpenTelemetry with a few different design constraints:

  • Reserved span kinds specific to generative applications
  • Semantic attributes for data required for evals, re-training, or fine-tuning
  • Ability to latently evaluate the data (e.g. the traces are not there just for debugging)
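
As a rough illustration of the three points above, here is a minimal stdlib-only sketch. It does not use the OpenTelemetry or OpenInference APIs; the `SpanKind` members, the `Span` class, the `retrieved_documents` helper, and every attribute key are hypothetical names chosen for this example. It shows how reserved span kinds plus semantic attributes might let an evaluation step pull retrieval data out of a trace after the fact.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class SpanKind(Enum):
    # Hypothetical reserved kinds for a generative application.
    CHAIN = "CHAIN"
    RETRIEVER = "RETRIEVER"
    LLM = "LLM"


@dataclass
class Span:
    name: str
    kind: SpanKind
    # Attribute keys below are illustrative, not a published convention.
    attributes: Dict[str, Any] = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)


def retrieved_documents(span: Span) -> List[str]:
    """Collect document texts from every RETRIEVER span in the tree."""
    docs: List[str] = []
    if span.kind is SpanKind.RETRIEVER:
        docs.extend(span.attributes.get("retrieval.documents", []))
    for child in span.children:
        docs.extend(retrieved_documents(child))
    return docs


# A recorded trace for one question, kept around for later evaluation.
trace = Span(
    name="query",
    kind=SpanKind.CHAIN,
    attributes={"input.value": "What is OTLP?"},
    children=[
        Span("retrieve", SpanKind.RETRIEVER,
             {"retrieval.documents": ["OTLP is a telemetry protocol."]}),
        Span("generate", SpanKind.LLM,
             {"llm.model_name": "example-model", "output.value": "A protocol."}),
    ],
)

docs_for_eval = retrieved_documents(trace)
```

Because the span kind is a reserved value rather than free text, the evaluation code can select retriever spans without knowing which framework produced them.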

We believe OTEL will become increasingly important, but fine-grained traces like OpenInference Tracing are also important for building first-class generative apps. The interaction with OpenTelemetry is not yet fleshed out, but I can see a future where these traces are consumed as part of OTEL's distributed tracing.

@janaka

janaka commented Nov 5, 2023

Can you elaborate more on why you can't build on top / extend Otel right now? Why is it in the future?

@mikeldking
Contributor

Hey @janaka - good question - it's something we ponder a lot. We mainly want to be deliberate in our use of OTEL: if we build out with OTEL, we conflate APM with some of the "possible" tracing needs of LLM application introspection. OTEL is designed around distributed systems and its context management is really designed around those boundaries, whereas in what we've started to inspect, the context is more application specific (like conversational applications, retrieval, etc.). In many ways you need a lot more information than traditional APM because you are dealing with unstructured data like documents and text. To answer it more simply - we are mainly focused on the application-specific topology, so we started there. But as you mention, building out on top of OTEL could be a good move since a lot of instrumentation already exists.

I know that doesn't fully answer your question, but we are focusing on capturing the right set of attributes and plan on supporting OTEL as a follow-up. Hope that helps a bit. Would love to hear your thoughts on the matter.

@janaka

janaka commented Nov 9, 2023

Hi @mikeldking, thanks for your response.

Yes, for sure there are ML specifics needed, no getting away from that. It makes sense that you are focusing first on figuring out what the model for this domain should look like. That's the value prop after all. It's also great that you've based it on OTel; that was definitely the right move in my view, rather than going bespoke.

From a usage point of view, having one system for wiring up the tracing (and metrics and logs) and pushing to different backends that are task/domain specific makes a big difference mentally. Docq is doing RAG so there's the index+search side as well.

Over the last few days I spent some time instrumenting Docq with OTel tracing. The auto instrumentation gets you started OOTB fast, but of course you need to add application-specific events/spans to make it useful. That's not complicated, but I wish Copilot would just add the first round of function decorators for me. Very quickly I hit the limits of not getting much visibility into LlamaIndex. I created a LlamaIndex callback handler to give me a little more visibility, but I don't think it's sufficient. I had a stab at creating an OpenTelemetry instrumentor but had to park that. I think Traceloop are planning to release one for LlamaIndex. Going to see what that looks like.
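
For readers unfamiliar with the "function decorators" approach mentioned above, here is a minimal stdlib-only sketch of the idea. It is not the OpenTelemetry API and not Docq's code; `traced`, `FINISHED`, `search_index`, and `answer_question` are hypothetical names. It assumes a single-process app and records one span per decorated call, with the parent taken from a context variable.

```python
import functools
import time
from contextvars import ContextVar

# Holds the span that is currently open in this execution context.
_current_span = ContextVar("current_span", default=None)
# Finished spans, in completion order (a stand-in for an exporter).
FINISHED = []


def traced(name=None):
    """Decorator that records a span around each call of the function."""
    def decorator(func):
        span_name = name or func.__qualname__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            parent = _current_span.get()
            span = {
                "name": span_name,
                "parent": parent["name"] if parent else None,
                "error": None,
            }
            token = _current_span.set(span)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                span["error"] = repr(exc)
                raise
            finally:
                span["duration_s"] = time.perf_counter() - start
                _current_span.reset(token)
                FINISHED.append(span)
        return wrapper
    return decorator


@traced()
def search_index(query):
    # Placeholder for the index + search side of a RAG pipeline.
    return [f"doc about {query}"]


@traced("answer_question")
def answer_question(query):
    docs = search_index(query)
    return f"{len(docs)} document(s) found"


result = answer_question("otel")
```

The inner call finishes first, so `FINISHED` holds the `search_index` span (with `answer_question` as its parent) followed by the `answer_question` span. A real tracing library handles the same parent/child bookkeeping and also exports the spans to a backend.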

Right now I think I need end-to-end tracing within the app, so more traditional APM, especially given we are intentionally a single-process monolith. Then I want to get more visibility into the RAG pipeline: both the indexing and the search/prompt/generation on the usage side (Chat/Q&A/Agents). Evaluation is part of this, I feel. No doubt there are differences between development and production use cases. Right now we are more focused on the development-time needs.

@mikeldking
Contributor

@janaka this is super insightful. Thank you. I think investing in OTEL makes a ton of sense because it gives you maximum visibility across various boundaries - definitely worth the investment. I would never deploy a distributed system without it now. The biggest hurdle, as you say, is the instrumentation and the need for auto instrumentation for LLM orchestration and LLM providers. I think that's what we are starting to converge on - we have OpenAI instrumentation as well as AWS Bedrock instrumentation coming very soon, and will tackle the context management to stitch together the spans. At that point I think we will figure out how this gets exported to OTEL as well as to other collectors like arize-phoenix. Will keep you up to date as we make progress towards end-to-end tracing.

@mikeldking
Contributor

Update: we've started moving our instrumentation over to a monorepo that will house OTEL instrumentation (https://github.com/Arize-ai/openinference). Phoenix now supports OTEL via OTLP, so you can send traces to Phoenix using OTEL!
