Category Guide
Continuous observation of LLM-powered features in production — latency, error rates, token usage, and the metric most teams miss: cost per request.
LLM monitoring is the practice of continuously observing the behavior, performance, and cost of large language model API calls in production. Every time your application sends a request to OpenAI, Anthropic, Google, or any other LLM provider, that request has measurable properties — how long it took, whether it succeeded, how many tokens it consumed, and what it cost. LLM monitoring captures these signals in real time and surfaces them as dashboards, trends, and alerts.
Traditional application monitoring tracks request latency, HTTP status codes, and throughput. LLM monitoring adds a layer specific to AI workloads: token-based cost calculation, model-specific performance tracking, and per-customer cost attribution. A standard API call that returns 200 OK tells your APM tool everything is fine. LLM monitoring tells you that same call consumed 4,000 output tokens on a frontier model and cost $0.12 — information that is invisible to traditional monitoring tools.
The goal is operational awareness. You want to know, at any given moment, whether your LLM-powered features are working, how fast they are responding, how much they are costing, and whether anything has changed. Without monitoring, the first sign of a problem is usually the monthly bill — or a customer complaint about slow responses.
Latency (p50 / p95 / p99) measures how long each LLM call takes to return a response. The median (p50) tells you the typical experience. The p95 and p99 tell you about the tail — the slowest requests that affect your most patient users. LLM latency is highly variable: a short classification prompt might return in 200ms, while a long document analysis with streaming disabled might take 15 seconds. Monitoring percentiles separately prevents slow outliers from hiding behind a healthy average.
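As a concrete illustration, percentiles can be computed from recorded per-request latencies with the nearest-rank method. This is a minimal sketch — production systems typically use a metrics library or histogram rather than sorting raw samples — and the sample values are hypothetical:

```python
def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile over a list of latency samples (ms)."""
    ordered = sorted(samples)
    # Nearest-rank: ceil(pct/100 * n), via ceiling division, clamped to >= 1.
    rank = max(1, -(-len(ordered) * pct // 100))
    return ordered[int(rank) - 1]

# Mostly fast calls, plus two slow outliers (e.g. long generations).
latencies_ms = [210, 180, 250, 4200, 190, 230, 15000, 205, 220, 240]
p50 = percentile(latencies_ms, 50)  # typical experience: 220 ms
p95 = percentile(latencies_ms, 95)  # tail: 15000 ms
p99 = percentile(latencies_ms, 99)
```

Note how the median stays near 220 ms while the mean of the same sample is over 2,000 ms — exactly the "slow outliers hiding behind a healthy average" problem that separate percentile tracking avoids.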
Error rates include HTTP failures, provider timeouts, rate limit hits (429s), and content filter rejections. LLM APIs are less reliable than most SaaS APIs — provider outages, rate limits, and model deprecations all cause errors that your application needs to handle. Monitoring error rates by model and by provider lets you spot degradation before it affects users.
Token usage per request drives cost directly. Input tokens and output tokens are priced differently — output tokens are typically 3x to 5x more expensive. Monitoring token usage by feature and by customer reveals where your spend is concentrated. A feature that generates long responses (code generation, document drafting) has a fundamentally different cost profile than one that generates short responses (classification, extraction), even if both make the same number of API calls.
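The cost asymmetry is easiest to see in code. This sketch prices requests per million tokens; the model names and rates are illustrative placeholders, not any provider's actual pricing:

```python
# Illustrative per-million-token prices; real rates vary by provider and model.
PRICING = {
    "frontier-model": {"input": 3.00, "output": 15.00},  # output 5x input
    "small-model":    {"input": 0.15, "output": 0.60},   # output 4x input
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one request, priced per million tokens."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Same number of API calls, very different cost profiles:
long_response = request_cost("frontier-model", 500, 4000)  # code generation
short_response = request_cost("frontier-model", 500, 50)   # classification
```

With these assumed rates, the long-output request costs roughly 27x the short one despite identical input size — which is why monitoring token usage by feature matters.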
Cost per request is the metric most teams overlook. A request can succeed with low latency and still cost 40x more than necessary because the wrong model was selected. Monitoring cost per request — broken down by model, feature, and customer — is what separates LLM monitoring from general application monitoring. It is the one metric that directly affects your margin.
Model drift refers to changes in model behavior over time. Providers update models, deprecate versions, and adjust rate limits without notice. Monitoring for sudden changes in latency, token usage, or error rates after a provider-side update helps you detect drift before it impacts quality or cost. If your average output token count jumps 30% overnight on the same prompts, something changed on the provider side.
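The "30% overnight jump" check above can be sketched as a comparison between a baseline window and a recent window of output token counts. The sample values and the 30% threshold are illustrative:

```python
def drift_alert(baseline: list[int], recent: list[int],
                threshold: float = 0.30) -> bool:
    """Flag a provider-side change when the average output token count
    rises more than `threshold` relative to the baseline window."""
    base_avg = sum(baseline) / len(baseline)
    recent_avg = sum(recent) / len(recent)
    return (recent_avg - base_avg) / base_avg > threshold

baseline_tokens = [400, 420, 410, 390, 405]  # last week, same prompts
recent_tokens = [560, 545, 570, 555]         # after a silent model update
drifted = drift_alert(baseline_tokens, recent_tokens)  # ~37% jump -> True
```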
| Capability | APM Tools (Datadog, New Relic) | LLM Observability (Helicone, Langfuse) | Cost Monitoring (MarginDash) |
|---|---|---|---|
| Latency tracking | Yes | Yes | Yes |
| Error rate monitoring | Yes | Yes | Yes |
| Token-based cost calculation | No | Yes | Yes |
| Per-customer cost attribution | No | No | Yes |
| Revenue / margin tracking | No | No | Yes |
| Prompt tracing / debugging | No | Yes | No |
| Cost simulator | No | No | Yes |
| Budget alerts | Custom | No | Yes |
| Pricing database | No | No | Yes |
These approaches are complementary, not mutually exclusive. APM tools handle infrastructure-level monitoring. LLM observability tools handle prompt-level debugging. Cost monitoring tools handle unit economics. Most production deployments benefit from at least two.
Most monitoring setups track latency and errors but ignore cost entirely. With traditional APIs, the cost per request is effectively zero. With LLM APIs, every request has a variable cost determined by the model, the token count, and the provider's pricing. Cost can spike without any errors, without any latency increase, and without any alerts firing.
The most common cost spikes happen when a feature sends longer prompts than expected, when retry logic re-sends expensive requests on transient failures, or when a customer triggers high-token responses through normal usage. None of these produce errors. The API returns 200 OK. Your existing monitoring says everything is healthy. Your bill says otherwise.
For teams that resell AI features to customers, cost monitoring becomes a unit economics problem. You need to know cost per customer — not just aggregate cost. Without per-customer cost monitoring, you cannot identify which customers are profitable and which are underwater. MarginDash's cost simulator reprices your actual token usage against every model in the pricing database and ranks alternatives by intelligence-per-dollar.
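Per-customer unit economics reduces to a simple margin calculation once cost is attributed correctly. This is a sketch with hypothetical customer IDs, a hypothetical $49 flat plan, and an arbitrary 50% margin floor — not MarginDash's actual API:

```python
def customer_margin(monthly_revenue: float, llm_cost: float) -> float:
    """Gross margin fraction for one customer after LLM API spend."""
    return (monthly_revenue - llm_cost) / monthly_revenue

# Flat $49/month plan; per-customer API spend comes from monitoring data.
monthly_llm_cost = {"acme": 4.10, "globex": 40.25}  # hypothetical customers
margins = {cid: customer_margin(49.0, cost)
           for cid, cost in monthly_llm_cost.items()}

# Customers below a 50% gross margin need attention (repricing, model swap).
underwater = [cid for cid, m in margins.items() if m < 0.5]
```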
Monitoring without alerts is just a dashboard you check when something already broke. The value of LLM monitoring is catching problems before they become expensive. Budget alerts let you define spending thresholds and get notified before they are exceeded — not after.
Per-customer budgets protect you from individual customers consuming disproportionate resources. If you charge a flat $49/month and one customer is burning $40 in API calls, you want to know immediately — not at the end of the billing cycle. Set a threshold at, say, 60% of their subscription price and get alerted when cost approaches the break-even point.
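The threshold logic above is a small classification step. A minimal sketch, assuming a flat subscription price and the 60% alert fraction from the example:

```python
def budget_status(cost_to_date: float, subscription_price: float,
                  alert_fraction: float = 0.60) -> str:
    """Classify a customer's month-to-date LLM spend against their plan price."""
    ratio = cost_to_date / subscription_price
    if ratio >= 1.0:
        return "over_breakeven"  # API spend now exceeds what they pay you
    if ratio >= alert_fraction:
        return "alert"           # approaching break-even; notify now
    return "ok"

# $31 of API spend against a $49/month plan is past the 60% threshold.
status = budget_status(cost_to_date=31.0, subscription_price=49.0)
```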
Per-feature budgets help you track which AI-powered features are the most expensive. If your chat feature costs 5x more than your summarization feature per request, that information affects product decisions — pricing tiers, usage limits, model selection. Feature-level budgets surface these patterns automatically.
Organization-wide budgets are the safety net. Set a monthly spending cap across all customers and features. If total LLM spend exceeds a threshold — or is trending toward exceeding it — you get notified with time to react. This is especially important early in a product launch when usage patterns are unpredictable.
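"Trending toward exceeding" can be approximated with a linear run-rate projection. This sketch assumes uniform daily usage, which launch-period traffic rarely is, so treat it as an early-warning heuristic rather than a forecast:

```python
def projected_month_end(spend_to_date: float, day_of_month: int,
                        days_in_month: int) -> float:
    """Naive linear projection of month-end spend from the run rate so far."""
    return spend_to_date / day_of_month * days_in_month

def trending_over(spend_to_date: float, day_of_month: int,
                  days_in_month: int, cap: float) -> bool:
    return projected_month_end(spend_to_date, day_of_month, days_in_month) > cap

# $1,400 spent by day 10 of a 30-day month projects to $4,200
# against a hypothetical $3,000 org-wide cap.
will_exceed = trending_over(1400.0, 10, 30, 3000.0)
```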
Monitoring answers: is it working? It tracks quantitative metrics — latency, error rates, cost, throughput — and alerts you when something deviates from normal. Monitoring is about detection. You define what "healthy" looks like and get told when the system drifts from that baseline.
Observability answers: why isn't it working? It provides the tools to investigate — prompt traces, input/output logging, evaluation scoring, chain visualization. Observability is about diagnosis. When monitoring tells you that latency spiked at 3pm, observability tools let you drill into the specific requests that caused it and understand why.
In practice, most teams need both. A monitoring tool tells you that cost per request increased 40% this week. An observability tool tells you that a prompt template change caused longer outputs. A monitoring tool tells you that a specific customer is unprofitable. An analytics tool tells you which features they use most and what model swap would fix the margin.
The overlap between these categories is growing. Some observability tools now include basic cost tracking. Some monitoring tools are adding trace-level detail. But the core distinction remains: monitoring is ongoing and automated (dashboards, alerts, trends), while observability is ad-hoc and investigative (traces, logs, evaluations). Choose your tools based on which problem you face most often.
MarginDash tracks cost per request, cost per customer, and margin across 400+ models from OpenAI, Anthropic, Google, and more. Set budget alerts per customer, per feature, or org-wide. Use the cost simulator to find cheaper models without sacrificing quality.
Start Monitoring LLM Costs →
No credit card required.
- Prompt tracing, evaluation, and debugging for LLM applications. Understand why your AI features behave the way they do.
- Track costs, usage, and performance across models and customers. Turn raw usage data into actionable business metrics.
- Safety filters, content moderation, and output validation for production LLM deployments.
- Strategies for controlling and optimizing AI API spend. Cost simulators, model swaps, and budget controls.
Create an account, install the SDK, and see your first margin data in minutes.
See My Margin Data →
No credit card required.