[{"data":1,"prerenderedAt":249},["ShallowReactive",2],{"blog-post-blog_en-llm-observability-mit-opentelemetry":3},{"id":4,"title":5,"body":6,"cover":233,"date":234,"description":235,"draft":236,"extension":237,"meta":238,"navigation":239,"path":240,"seo":241,"stem":242,"tags":243,"__hash__":248},"blog_en\u002Fen\u002Fblog\u002Fllm-observability-mit-opentelemetry.md","LLM Observability With OpenTelemetry: Operating AI Features Measurably",{"type":7,"value":8,"toc":228},"minimark",[9,13,18,26,29,32,66,69,73,76,79,170,173,205,208,212,215,224],[10,11,12],"p",{},"LLM observability with OpenTelemetry becomes relevant once AI features are no longer demos, but part of real product workflows. At that point, logs of individual prompts are not enough: leadership and engineering need to understand cost, latency, failures, model changes and tool calls.",[14,15,17],"h2",{"id":16},"what-llm-observability-with-opentelemetry-means","What LLM Observability With OpenTelemetry Means",[10,19,20,21,25],{},"LLM observability is not only about whether a model returned an answer. It is about ",[22,23,24],"strong",{},"placing AI behaviour inside normal software operations",": traces, metrics, logs, cost information and business quality signals need to be evaluated together.",[10,27,28],{},"OpenTelemetry is useful here because it describes telemetry in a vendor-neutral way. 
Instead of inventing a separate monitoring schema for every LLM provider, framework and backend, teams can work with shared attributes and metrics.",[10,30,31],{},"For production AI features, these signals matter most:",[33,34,35,42,48,54,60],"ul",{},[36,37,38,41],"li",{},[22,39,40],{},"Latency:"," How long do model calls, retrieval, tool calls and response streaming actually take?",[36,43,44,47],{},[22,45,46],{},"Cost:"," Which features, customers or workflows cause high token and infrastructure cost?",[36,49,50,53],{},[22,51,52],{},"Error types:"," Do failures come from model errors, timeouts, rate limits, empty retrieval results or faulty tools?",[36,55,56,59],{},[22,57,58],{},"Quality:"," Where do answers fail domain checks, need manual correction or create support cases?",[36,61,62,65],{},[22,63,64],{},"Data access:"," Which systems and data sources were used to produce an answer?",[10,67,68],{},"OpenTelemetry's GenAI semantic conventions are still in development, so attribute names and structures can change between versions. Teams should therefore treat them not as a finished standard but as a foundation for their own stable operating model.",[14,70,72],{"id":71},"where-teams-should-start-instrumenting","Where Teams Should Start Instrumenting",
[10,74,75],{},"The most common mistake is adding LLM observability only after the first major incident. By then, historical baselines are missing, cost is hard to attribute and nobody knows whether a model change actually helped.",[10,77,78],{},"A pragmatic starting point is a small telemetry schema for each AI workflow:",[80,81,86],"pre",{"className":82,"code":83,"language":84,"meta":85,"style":85},"language-yaml shiki shiki-themes github-light github-dark","ai_workflow: contract_review\nowner: product-platform\nsignals: [\"latency\", \"token_usage\", \"tool_calls\", \"error_type\"]\nbusiness_metric: \"review_completed_without_manual_retry\"\nprivacy_rule: \"no_prompt_payloads_in_logs\"\n","yaml","",[87,88,89,106,117,148,159],"code",{"__ignoreMap":85},[90,91,94,98,102],"span",{"class":92,"line":93},"line",1,[90,95,97],{"class":96},"s9eBZ","ai_workflow",[90,99,101],{"class":100},"sVt8B",": ",[90,103,105],{"class":104},"sZZnC","contract_review\n",[90,107,109,112,114],{"class":92,"line":108},2,[90,110,111],{"class":96},"owner",[90,113,101],{"class":100},[90,115,116],{"class":104},"product-platform\n",[90,118,120,123,126,129,132,135,137,140,142,145],{"class":92,"line":119},3,[90,121,122],{"class":96},"signals",[90,124,125],{"class":100},": [",[90,127,128],{"class":104},"\"latency\"",[90,130,131],{"class":100},", ",[90,133,134],{"class":104},"\"token_usage\"",[90,136,131],{"class":100},[90,138,139],{"class":104},"\"tool_calls\"",[90,141,131],{"class":100},[90,143,144],{"class":104},"\"error_type\"",[90,146,147],{"class":100},"]\n",[90,149,151,154,156],{"class":92,"line":150},4,[90,152,153],{"class":96},"business_metric",[90,155,101],{"class":100},[90,157,158],{"class":104},"\"review_completed_without_manual_retry\"\n",[90,160,162,165,167],{"class":92,"line":161},5,[90,163,164],{"class":96},"privacy_rule",[90,166,101],{"class":100},[90,168,169],{"class":104},"\"no_prompt_payloads_in_logs\"\n",
[10,171,172],{},"Even a schema this small forces concrete architecture decisions:",[33,174,175,181,187,193,199],{},[36,176,177,180],{},[22,178,179],{},"No prompt payloads in standard logs:"," Prompts and responses may contain personal data, trade secrets or customer documents.",[36,182,183,186],{},[22,184,185],{},"Correlation instead of dashboard silos:"," LLM spans need to be correlated with the request ID, user role, tenant and backend operations.",[36,188,189,192],{},[22,190,191],{},"Measure cost per feature:"," Average cost per request hides individual workflows that destroy margin.",[36,194,195,198],{},[22,196,197],{},"Make model changes testable:"," Teams need baselines for response time, failure rate, token usage and domain accuracy.",[36,200,201,204],{},[22,202,203],{},"Define ownership:"," Product, engineering and support need to know who evaluates alerts and sets priorities.",[10,206,207],{},"OpenTelemetry does not make these decisions for you. It does, however, keep AI observability from turning into a special-purpose solution beside the rest of operations.",[14,209,211],{"id":210},"why-this-matters","Why This Matters",[10,213,214],{},"AI features rarely fail just because a model is \"bad\". More often, the business problems stem from invisible cost, unstable latency, unclear failure modes and missing accountability between product, backend and operations.",[10,216,217,218,223],{},"LLM observability with OpenTelemetry makes these problems visible, and therefore manageable, early on. Teams can see which workflows are profitable, which architecture paths are becoming too expensive and where quality is not reproducible. For growing companies, this is not just a monitoring question but a leadership responsibility: teams that want to run AI in production need measurability before scale amplifies their mistakes. 
An ",[219,220,222],"a",{"href":221},"\u002Fen\u002F#packages","Architecture & AI Review"," can assess whether AI features are technically observable and economically controllable.",[225,226,227],"style",{},"html pre.shiki code .s9eBZ, html code.shiki .s9eBZ{--shiki-default:#22863A;--shiki-dark:#85E89D}html pre.shiki code .sVt8B, html code.shiki .sVt8B{--shiki-default:#24292E;--shiki-dark:#E1E4E8}html pre.shiki code .sZZnC, html code.shiki .sZZnC{--shiki-default:#032F62;--shiki-dark:#9ECBFF}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}",{"title":85,"searchDepth":108,"depth":108,"links":229},[230,231,232],{"id":16,"depth":108,"text":17},{"id":71,"depth":108,"text":72},{"id":210,"depth":108,"text":211},null,"2026-05-09","LLM observability with OpenTelemetry makes cost, latency and failures in AI features visible before they hurt product quality and support.",false,"md",{},true,"\u002Fen\u002Fblog\u002Fllm-observability-mit-opentelemetry",{"title":5,"description":235},"en\u002Fblog\u002Fllm-observability-mit-opentelemetry",[244,245,246,247],"AI","Software Architecture","Backend Development","Software 
Quality","_lSU6hNjaMJVSLnO66oWIrA724NRkX5GWQE6LqZA66U",1780122462449]