A four-pillar approach to managing AI costs in production — from instrumenting API calls to setting budget alerts that fire before spending gets out of control.
AI costs are variable, opaque, and distributed across providers. Two customers using the same feature can generate wildly different bills. A request that costs $0.01 on one model might cost $0.40 on another. Without a framework, these differences are invisible until the monthly invoice arrives.
The framework breaks down into four pillars, each building on the one before it:
1. Tracking: instrument every API call to capture model name, token counts, and timestamps. You cannot manage what you do not measure.
2. Attribution: connect costs to customers, features, and teams. A per-customer P&L shows who is profitable and who is underwater.
3. Optimization: use cost simulation and benchmark-aware model selection to reduce spend without sacrificing quality.
4. Budgeting: set thresholds per customer, per feature, or org-wide, and get alerted before spending exceeds limits.
Tracking is the foundation. Every AI API call your application makes should be instrumented to capture the model name, input token count, output token count, and a timestamp. Without this data, everything else — attribution, optimization, budgeting — is guesswork.
The implementation is straightforward. After each API call, log the usage data from the response. OpenAI, Anthropic, and Google all return token counts in their API responses. The challenge is not capturing the data — it is doing it consistently across every call path in your application, including retries, fallbacks, and streaming responses where token counts arrive at the end of the stream.
Consistency matters more than granularity in the early stages. Missing a few edge-case API calls is less harmful than tracking your main call paths inconsistently. Start by instrumenting the primary features that make the most API calls, then expand to cover retries, background jobs, and internal tools. The goal is full coverage, but accurate partial coverage beats full coverage you cannot trust.
Cost calculation requires a pricing database. Token counts alone do not tell you the cost. You need to know that GPT-4o charges one rate per million input tokens and a different rate per million output tokens, and that those rates differ from Claude or Gemini. Maintaining this pricing data manually is tedious — providers change pricing without warning, new models launch frequently, and every vendor structures pricing differently. A maintained pricing database that updates daily eliminates this maintenance burden.
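Mechanically, the calculation reduces to a lookup plus two multiplications. A sketch with invented placeholder rates (real per-million-token prices change without notice, which is exactly why a maintained database matters):

```python
# Illustrative per-million-token rates only -- NOT real provider prices.
PRICING = {
    "model-a": {"input": 2.50, "output": 10.00},
    "model-b": {"input": 0.25, "output": 1.25},
}

def event_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Price one call: input and output tokens bill at different rates."""
    rates = PRICING[model]
    return (input_tokens / 1_000_000) * rates["input"] \
         + (output_tokens / 1_000_000) * rates["output"]
```

The same function works for any provider once its models are in the table; the hard part is keeping the table current.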
Multi-provider complexity compounds the problem. Most production applications use models from more than one provider — OpenAI for chat, Anthropic for analysis, Google for embeddings. Each provider has its own pricing structure, its own usage dashboard, and its own billing cycle. Without a unified tracking layer, you are checking three or four dashboards and reconciling the numbers in a spreadsheet. A single tracking layer that normalizes data across providers gives you one source of truth for all AI costs.
With MarginDash, tracking takes a few lines of SDK code. The SDK captures model name and token counts, batches events automatically, and transmits them asynchronously. Cost calculation happens server-side against a pricing database covering 400+ models. Your application never needs to know what any model costs. For more on LLM analytics and the metrics that matter, see our dedicated guide.
Tracking tells you what you spent in total. Attribution tells you who and what drove that spend. This is the step most teams skip — and it is the step that transforms cost data from an accounting exercise into a business tool. Without attribution, you know your AI costs were $5,000 last month. With attribution, you know which ten customers drove $3,000 of that and which feature accounted for 60% of the total.
Per-customer attribution is the most valuable dimension. By tagging each API call with a customer identifier, you can build a per-customer P&L that shows revenue alongside cost. This reveals which customers are profitable, which are breaking even, and which are losing you money. For SaaS products that charge flat-rate subscriptions while incurring variable AI costs, this data is essential for pricing decisions.
Per-feature attribution tells you where your AI budget is going. Some features are token-light — classification, extraction, short completions. Others are token-heavy — chat, document analysis, code generation. Knowing that your document analysis feature consumes 60% of your AI budget while serving 10% of requests changes how you prioritize optimization work.
Per-team attribution matters for larger organizations. When multiple teams ship AI features independently, costs can grow without any single team noticing. Attribution by team lets engineering leadership see which teams are driving AI costs and whether those costs are justified by the value the features deliver. It also creates accountability — teams that see their own costs tend to optimize more aggressively than teams that share an opaque aggregate budget.
Revenue attribution closes the loop. Cost data alone tells you who is expensive. Revenue data tells you who is expensive relative to what they pay. A customer costing $50/month in API calls looks bad until you see they are on your $200/month enterprise plan. Another customer costing $15/month looks fine until you realize they are on a $10/month starter plan. Connecting cost to revenue produces per-customer margin — the metric that actually drives pricing and retention decisions.
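Assuming the tracking layer emits `(customer_id, cost)` pairs and a billing system supplies monthly revenue per customer (both shapes are hypothetical here), the per-customer P&L join can be sketched as:

```python
from collections import defaultdict

def per_customer_margin(cost_events, revenue):
    """Join attributed costs with revenue to produce per-customer margin.
    cost_events: iterable of (customer_id, cost) pairs.
    revenue: dict of customer_id -> monthly revenue."""
    costs = defaultdict(float)
    for customer_id, cost in cost_events:
        costs[customer_id] += cost
    report = {}
    for customer_id, rev in revenue.items():
        cost = costs.get(customer_id, 0.0)
        report[customer_id] = {
            "cost": cost,
            "revenue": rev,
            "margin": rev - cost,
            # Margin as a percentage of revenue; undefined for $0 plans.
            "margin_pct": round((rev - cost) / rev * 100, 1) if rev else None,
        }
    return report
```

Running this on the two examples above, the $50-cost customer on a $200 plan comes out at 75% margin, while the $15-cost customer on a $10 plan comes out at -50%.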
MarginDash handles attribution by accepting a customer ID and an optional event type with each tracked API call. Connect your Stripe account and revenue data flows in automatically, giving you a complete per-customer P&L with cost, revenue, and margin percentage. No spreadsheet reconciliation required.
The alternative to automated attribution is manual reconciliation — exporting token logs, matching them to customer records, looking up pricing for each model, and calculating costs in a spreadsheet. This works once. It does not work as a recurring practice. The manual approach takes hours, drifts out of date immediately, and introduces errors that compound over time. Automated attribution scales with your application — as you add customers, features, and models, the attribution layer absorbs the complexity without additional effort.
Once you can see where costs are concentrated, you can start reducing them. The highest-leverage optimization in AI cost management is model selection — choosing the right model for each task rather than defaulting to the most expensive one.
Models with comparable benchmark scores can vary in price by 40x. A feature that runs on a frontier model because that is what the developer used during prototyping might work equally well on a balanced-tier model at a fraction of the cost. The difficulty is knowing which swaps are safe — which models score similarly on the benchmarks that matter for your use case (MMLU-Pro for reasoning, GPQA for scientific tasks, AIME for mathematical problem-solving) without degrading the user experience.
The key insight is that not every API call needs the same model. A classification task that returns "yes" or "no" does not need a frontier model. A customer-facing chat feature that handles nuanced questions probably does. Task-appropriate model selection — matching model capability to task complexity — is the single highest-leverage cost reduction available to most teams. But you cannot do it without attribution data that shows which features use which models and how much each one costs.
A cost simulator makes this practical. It takes your actual token usage data and reprices every event against every model in the pricing database, ranked by intelligence-per-dollar. The simulator filters out models that drop more than 10% on benchmarks or cannot handle your context window requirements. You see projected savings before making any changes — not estimates based on averages, but calculations based on your real workload.
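The repricing logic itself can be sketched in a few lines. Model names, rates, and benchmark scores below are all made up for illustration, and the single-benchmark filter is a simplification of what a real simulator would do:

```python
def simulate_swaps(events, pricing, benchmark, current_model, max_drop_pct=10.0):
    """Reprice a real workload against every candidate model.
    events: list of (input_tokens, output_tokens) pairs from tracking.
    pricing: model -> per-million-token rates; benchmark: model -> score."""
    def workload_cost(model):
        r = pricing[model]
        return sum(i / 1e6 * r["input"] + o / 1e6 * r["output"]
                   for i, o in events)

    baseline = benchmark[current_model]
    current_cost = workload_cost(current_model)
    candidates = []
    for model, score in benchmark.items():
        if model == current_model:
            continue
        if (baseline - score) / baseline * 100 > max_drop_pct:
            continue  # quality drop too large for a safe swap
        candidates.append({
            "model": model,
            "projected_savings": current_cost - workload_cost(model),
        })
    return sorted(candidates, key=lambda c: c["projected_savings"], reverse=True)
```

Because the input is your actual token log rather than an average request, the projected savings reflect your real input/output mix.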
Beyond model selection, other optimization levers include prompt engineering (shorter prompts that achieve the same result with fewer tokens), caching (serving repeated requests from cache instead of making new API calls), and routing (sending simple requests to cheap models and complex requests to expensive ones). Each of these requires the tracking and attribution data from the first two pillars to identify where to apply them.
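A toy sketch of the routing and caching levers together, with placeholder model names and a hypothetical task taxonomy (your own tiers and task types would differ):

```python
import hashlib

# Placeholder taxonomy: token-light tasks that a cheaper model handles well.
CHEAP_TASKS = {"classification", "extraction", "short_completion"}

def route(task_type: str) -> str:
    """Send simple tasks to a cheaper model, complex ones to a stronger one."""
    return "balanced-model" if task_type in CHEAP_TASKS else "frontier-model"

_cache: dict = {}

def cached_call(task_type: str, prompt: str, call_fn):
    """Serve repeated (model, prompt) pairs from cache instead of re-calling.
    call_fn is whatever function actually hits the provider API."""
    model = route(task_type)
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_fn(model, prompt)
    return _cache[key]
```

In production, exact-match caching like this only helps for genuinely repeated prompts; templated prompts with variable content need semantic caching or no caching at all.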
The optimization cycle is not a one-time event. New models are released regularly, and each release changes the cost-quality landscape. A model that was the best value six months ago may now be outperformed by a cheaper alternative. Continuous cost management means the pricing database updates, the simulator reprices your workload against new options, and you can see immediately whether a new model would save you money without compromising quality. For a deeper look at optimization strategies, see our guide on LLM cost optimization.
Budgeting is the control layer. Tracking, attribution, and optimization are retrospective: they analyze what already happened. Budgeting is prospective: it catches problems before they occur. Without it, cost management is reactive, and you learn about overruns only after the invoice arrives.
Budget alerts are the most immediate tool. Set a spending threshold per customer, per feature, or across the entire organization, and get notified by email before spending exceeds it. This catches runaway costs early — a single customer whose usage spikes, a feature whose prompts got longer after a code change, or a batch job that processes more documents than expected.
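The threshold check itself is simple. A sketch, assuming spend and budgets are keyed by an arbitrary scope string (a customer ID, a feature name, or "org"); the 80% alert point is an assumed default, not a fixed rule:

```python
def check_budgets(spend, budgets, alert_at=0.8):
    """Return every scope whose spend has crossed alert_at * its budget --
    the point at which a notification should fire, before the limit is hit.
    spend: scope -> current spend; budgets: scope -> spending limit."""
    alerts = []
    for scope, limit in budgets.items():
        current = spend.get(scope, 0.0)
        if current >= alert_at * limit:
            alerts.append({"scope": scope, "spend": current, "limit": limit})
    return alerts
```

Run it on a schedule (or on every ingested batch of events) so a usage spike surfaces within minutes rather than at month end.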
Trend-based forecasting extends budgeting into the future. If your AI costs are growing 15% month-over-month, you need to know that before it becomes a problem — not after. Forecasting uses historical cost data to project future spending, giving you time to optimize models, adjust pricing, or set new usage limits before costs exceed what the business can support.
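A deliberately naive sketch of trend-based projection, using the average compound month-over-month growth rate; real forecasting would smooth outliers and account for seasonality:

```python
def forecast(monthly_costs, months_ahead=3):
    """Project future monthly spend from historical totals.
    monthly_costs: list of past monthly totals, oldest first."""
    rates = [later / earlier for earlier, later
             in zip(monthly_costs, monthly_costs[1:])]
    avg_rate = sum(rates) / len(rates)
    projection, current = [], monthly_costs[-1]
    for _ in range(months_ahead):
        current *= avg_rate  # compound the average growth rate forward
        projection.append(round(current, 2))
    return projection
```

Even this crude version answers the key question: at the current growth rate, what will the bill be in a quarter?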
Usage limits are the enforcement mechanism. When a customer or feature approaches its budget, you need options: alert the team, throttle requests to a cheaper model, or cap usage until the next billing cycle. The right response depends on the customer and the feature — which is why budget controls need to work at the per-customer and per-feature level, not just at the organization level.
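Those three responses can be sketched as a small policy function. The thresholds, plan names, and action strings below are placeholders for illustration, not MarginDash behavior:

```python
def enforcement_action(spend: float, budget: float, plan: str) -> str:
    """Pick a response as a scope approaches its budget.
    plan: the customer's pricing tier, used to pick a graceful degradation."""
    ratio = spend / budget
    if ratio < 0.8:
        return "allow"
    if ratio < 1.0:
        return "alert"  # notify the team before the limit is hit
    # Over budget: degrade gracefully for paying plans, hard-cap free tier.
    return "route_to_cheaper_model" if plan == "enterprise" \
        else "cap_until_next_cycle"
```

The point of the structure is that the decision takes the customer's tier as an input, which is only possible with per-customer attribution in place.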
MarginDash supports budget alerts at the customer, feature, and organization level. Set a threshold, and you receive an email when spending approaches the limit. Combined with per-customer attribution, this means you can set different budgets for different customers — tighter limits for free-tier users, higher limits for enterprise accounts. For more on monitoring approaches, see our guide on LLM monitoring.
The budget pillar also informs pricing strategy. If your per-customer cost data shows that heavy users cost $40/month in AI calls and your plan charges $49/month, your margin is thin and one pricing change from a provider could make those customers unprofitable. Budgeting is not just about preventing overruns — it is about understanding the relationship between your cost structure and your pricing tiers so you can adjust before problems emerge.
Common mistakes to avoid: Only looking at aggregate bills. The total monthly bill tells you what you spent. It does not tell you who drove that spend. Aggregate data hides the customers who cost 10x more than the average and the features that consume 80% of your budget. Per-customer attribution is what turns a number into a decision.
Using one model for everything. Teams that default to a frontier model for all features are overpaying for simple tasks. The cost difference between a frontier model and a balanced-tier model that scores within 5% on benchmarks can be 10x to 40x. Matching model capability to task complexity is the single highest-leverage cost reduction.
Ignoring output token costs. Output tokens are typically 3x to 5x more expensive than input tokens. A feature that generates long responses has a very different cost profile than one that generates short responses. Understanding the input-to-output ratio per feature is essential for accurate cost projections.
Maintaining pricing data manually. Hardcoding model prices in a spreadsheet works until a provider changes pricing — which happens frequently and without notice. Manual pricing data drifts out of date silently, making your cost calculations wrong without any indication.
MarginDash gives you the full AI cost management framework — track costs across 400+ models, attribute them to customers and features, simulate savings with the cost simulator, and set budget alerts before spending gets out of control. Free tier available. Set up in 5 minutes.
Start Managing AI Costs →
No credit card required
Create an account, install the SDK, and see your first margin data in minutes.
See My Margin Data →
No credit card required