Standardize LLM Observability on OpenTelemetry

Added Sep 28, 2025
Article: Positive · Community: Positive/Mixed

Chatwoot’s production issues with an AI agent highlight the need for deep LLM observability. While OpenInference provides AI-centric span types, its partial adherence to OpenTelemetry and limited language support create practical integration problems. The author argues teams should standardize on OpenTelemetry, enrich spans with AI attributes, and help advance OTel’s GenAI semantics—an approach SigNoz is actively supporting.

Key Points

  • Production LLMs need step-level visibility (RAG documents, tool calls, inputs/outputs, decisions) to debug real issues.
  • OpenTelemetry is the most mature, widely supported standard, but its span kinds are too generic for AI workflows.
  • OpenInference offers AI-native span types and a better LLM-focused UX, yet it lacks full OTel semantic compatibility and broad language SDKs (e.g., Ruby).
  • Mixing telemetry standards fragments observability and breaks out-of-the-box, OTel-based features across the stack.
  • Best practice: use OpenTelemetry as the backbone, enrich spans with AI attributes, and contribute to OTel’s GenAI semantic conventions.

Sentiment

The community is broadly supportive of standardizing on OpenTelemetry for LLM observability but pushes back on specific claims in the article. The most substantive critique is that the article mischaracterizes the relationship between OpenInference and OTel, conflating vendor UI behavior with protocol compatibility. There is healthy debate about whether OTel alone provides enough semantic richness or needs domain-specific extensions, and no consensus on which specific tooling is best. The discussion is constructive and practitioner-driven, with real-world experiences shared from multiple teams.

In Agreement

  • OTel provides whole-system observability beyond just LLM calls, making it easier to trace entire agent workflows with minimal integration code
  • OTel data can be stored in ClickHouse and augmented with custom schema, giving the best of both worlds between general-purpose and AI-specific observability
  • Observability and evals together are the cornerstone of successful agent systems — without semantic evaluations, you cannot explain or improve agent behavior
  • Following the official OTel GenAI semantic conventions spec is the right path forward for standardization

Opposed

  • The article incorrectly claims OpenInference is not OTel-compatible; semantic conventions are just attribute naming within OTel spans, not a separate protocol — Phoenix's UI specificity is different from protocol incompatibility
  • "Just add attributes" oversimplifies debugging multi-agent systems with dynamic tool calls; hybrid or bridging standards may be inevitable
  • We have not found the right LLM observability metrics yet — tracking tool calls and prompts is like monitoring every syscall in a C++ app rather than looking at meaningful application-level logs
  • Conversation traces feel more natural for LLM systems than traditional distributed tracing, since conversations already capture execution flow and user feedback
  • Using LLMs to evaluate LLM performance is a circular chicken-and-egg problem
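The "semantic conventions are just attribute naming" critique can be made concrete: an OpenInference-instrumented span and an OTel GenAI span are both plain OTel spans, differing only in attribute keys, so a simple key mapping bridges them. The key pairs below are illustrative examples drawn from the two conventions; verify them against the current specs before relying on them.

```python
# Sketch: semantic conventions are attribute naming within OTel spans,
# not a separate wire protocol. A lookup table renames one convention's
# keys to the other's. Key pairs are illustrative; check current specs.

OPENINFERENCE_TO_OTEL = {
    "llm.model_name": "gen_ai.request.model",
    "llm.token_count.prompt": "gen_ai.usage.input_tokens",
    "llm.token_count.completion": "gen_ai.usage.output_tokens",
}

def to_otel_genai(attrs: dict) -> dict:
    """Rename OpenInference-style keys to OTel GenAI keys, passing through
    anything without a known mapping (e.g. the AI-native span kind)."""
    return {OPENINFERENCE_TO_OTEL.get(k, k): v for k, v in attrs.items()}

bridged = to_otel_genai({
    "openinference.span.kind": "LLM",  # AI-native span type, kept as-is
    "llm.model_name": "gpt-4o",
    "llm.token_count.prompt": 812,
})
```

Because both key sets ride on the same span structure, whether a backend renders them richly is a UI question, not a protocol-compatibility one, which is the distinction the critique draws.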