Model Comparison

Claude 3.5 Sonnet vs GPT-4o mini

Anthropic vs OpenAI

OpenAI's GPT-4o mini costs less per intelligence point, even though Anthropic's Claude 3.5 Sonnet scores higher.

Data last updated March 4, 2026

GPT-4o mini delivers more intelligence per dollar, while Claude 3.5 Sonnet leads on raw benchmark scores. Claude 3.5 Sonnet costs $0.03 per request vs $0.0014 for GPT-4o mini (at 5K input / 1K output tokens). The question is whether Claude 3.5 Sonnet's higher scores justify the 22x price premium.
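The per-request figures above follow directly from the list prices. A minimal sketch (the helper function name is ours, not from either vendor's API):

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in dollars for one request at the given per-1M-token list prices."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000

# Typical request: 5K input + 1K output tokens
claude = request_cost(5_000, 1_000, 3.00, 15.00)  # Claude 3.5 Sonnet list prices
mini = request_cost(5_000, 1_000, 0.15, 0.60)     # GPT-4o mini list prices

print(f"Claude 3.5 Sonnet: ${claude:.4f}")  # $0.0300
print(f"GPT-4o mini:       ${mini:.4f}")    # $0.0014 (exactly $0.00135)
print(f"Premium: {claude / mini:.0f}x")     # 22x
```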

Benchmarks & Performance

| Metric             | Claude 3.5 Sonnet | GPT-4o mini    |
|--------------------|-------------------|----------------|
| Intelligence Index | 15.9              | 12.6           |
| MMLU-Pro           | 0.77              | 0.65           |
| GPQA               | 0.60              | 0.43           |
| AIME               | 0.16              | 0.12           |
| Context window     | 200,000 tokens    | 128,000 tokens |

Pricing per 1M Tokens

List prices as published by the provider. Not adjusted for token efficiency.

| Metric                      | Claude 3.5 Sonnet | GPT-4o mini |
|-----------------------------|-------------------|-------------|
| Input price / 1M tokens     | $3.00             | $0.15       |
| Output price / 1M tokens    | $15.00            | $0.60       |
| Cache hit price / 1M tokens | $0.30             | $0.08       |

Intelligence vs Price

[Chart: Intelligence Index vs. typical request cost (5K input + 1K output, log price scale). Claude 3.5 Sonnet and GPT-4o mini are highlighted against other models, including Gemini 2.5 Pro, DeepSeek R1 0528, GPT-4.1, GPT-4.1 mini, Claude 4 Sonnet, Claude 4.5 Sonnet, Gemini 2.5 Flash, and Grok 3 mini Reasoning.]

Value Analysis

Cost per IQ point based on a typical request of 5,000 input and 1,000 output tokens.

Cheaper (list price): GPT-4o mini
Higher benchmarks: Claude 3.5 Sonnet
Better value ($ / IQ point): GPT-4o mini

Claude 3.5 Sonnet: $0.0019 / IQ point
GPT-4o mini: $0.0001 / IQ point
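The $/IQ-point figures are just the typical request cost divided by the Intelligence Index. A sketch (the function name is ours):

```python
def cost_per_iq_point(input_price_per_m, output_price_per_m, intelligence_index,
                      input_tokens=5_000, output_tokens=1_000):
    """Dollars of typical-request cost per point of Intelligence Index."""
    cost = (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1e6
    return cost / intelligence_index

print(f"Claude 3.5 Sonnet: ${cost_per_iq_point(3.00, 15.00, 15.9):.4f}/pt")  # $0.0019
print(f"GPT-4o mini:       ${cost_per_iq_point(0.15, 0.60, 12.6):.4f}/pt")   # $0.0001
```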

Frequently Asked Questions

How much cheaper is GPT-4o mini than Claude 3.5 Sonnet?

GPT-4o mini is dramatically cheaper: about 22x less per request than Claude 3.5 Sonnet. It is cheaper on both input ($0.15/M vs $3.00/M) and output ($0.60/M vs $15.00/M), which compounds into significant savings in production workloads. This comparison assumes a typical request of 5,000 input and 1,000 output tokens (a 5:1 ratio). Actual ratios vary by workload: chat and completion tasks typically run around 2:1, code review around 3:1, document analysis and summarization 10:1 to 50:1, and embedding workloads are pure input with no output tokens.
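Since the input:output ratio drives the totals, it is worth checking how the price gap moves across the workload mixes mentioned above. A sketch using this page's list prices (the token counts per workload are illustrative):

```python
PRICES = {  # $ per 1M tokens (input, output), from the pricing table
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "GPT-4o mini": (0.15, 0.60),
}

def cost(model, input_tokens, output_tokens):
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1e6

# Illustrative workload mixes: (input tokens, output tokens)
workloads = {
    "chat (2:1)": (2_000, 1_000),
    "code review (3:1)": (3_000, 1_000),
    "summarization (10:1)": (10_000, 1_000),
    "embeddings (input only)": (5_000, 0),
}

for name, (i, o) in workloads.items():
    ratio = cost("Claude 3.5 Sonnet", i, o) / cost("GPT-4o mini", i, o)
    print(f"{name}: Claude 3.5 Sonnet costs {ratio:.0f}x more")
```

Because both prices scale linearly with tokens, the premium stays in roughly the 20x to 23x range regardless of the mix.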

How much does Claude 3.5 Sonnet outperform GPT-4o mini on benchmarks?

Claude 3.5 Sonnet scores higher overall (15.9 vs 12.6). It leads on MMLU-Pro (0.77 vs 0.65), GPQA (0.60 vs 0.43), and AIME (0.16 vs 0.12). Claude 3.5 Sonnet's GPQA score of 0.60 makes it the stronger choice for technical and scientific tasks.

Which has a larger context window, Claude 3.5 Sonnet or GPT-4o mini?

Claude 3.5 Sonnet has a 56% larger context window at 200,000 tokens vs GPT-4o mini at 128,000 tokens. That's roughly 266 vs 170 pages of text. The extra context capacity in Claude 3.5 Sonnet matters for document analysis and long conversations.
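The page estimates above assume roughly 750 tokens per page, a common rule of thumb for English prose rather than a figure from either vendor:

```python
TOKENS_PER_PAGE = 750  # assumption: rough rule of thumb for English prose

print(200_000 // TOKENS_PER_PAGE)  # 266 pages (Claude 3.5 Sonnet)
print(128_000 // TOKENS_PER_PAGE)  # 170 pages (GPT-4o mini)
```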

Is GPT-4o mini worth choosing over Claude 3.5 Sonnet on value alone?

GPT-4o mini offers dramatically better value — $0.0001 per intelligence point vs Claude 3.5 Sonnet at $0.0019. GPT-4o mini is cheaper, which offsets Claude 3.5 Sonnet's higher benchmark scores to deliver more value per dollar. If raw benchmark scores matter less than cost for your use case, GPT-4o mini is the efficient choice.

Which model benefits more from prompt caching, Claude 3.5 Sonnet or GPT-4o mini?

With prompt caching, GPT-4o mini is still dramatically cheaper: about 17x less per request than Claude 3.5 Sonnet. Assuming the full input is served from cache, caching cuts the typical request cost by 45% for Claude 3.5 Sonnet and 26% for GPT-4o mini relative to list prices. Claude 3.5 Sonnet benefits more from caching. If your workload has repetitive prompts, Claude 3.5 Sonnet's deeper cache discount gives it a bigger cost advantage than list prices suggest.
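The caching figures can be checked the same way, assuming the entire 5K-token input is a cache hit (cache-hit and output prices from the pricing table; the helper is illustrative):

```python
def cached_request_cost(cache_hit_price_per_m, output_price_per_m,
                        input_tokens=5_000, output_tokens=1_000):
    """Request cost when the whole input is billed at the cache-hit price."""
    return (input_tokens * cache_hit_price_per_m + output_tokens * output_price_per_m) / 1e6

claude_cached = cached_request_cost(0.30, 15.00)  # Claude 3.5 Sonnet
mini_cached = cached_request_cost(0.08, 0.60)     # GPT-4o mini

print(f"Claude cached:  ${claude_cached:.4f}")               # $0.0165
print(f"Mini cached:    ${mini_cached:.4f}")                 # $0.0010
print(f"Cached premium: {claude_cached / mini_cached:.1f}x")  # 16.5x (~17x)
print(f"Claude savings: {1 - claude_cached / 0.03:.0%}")      # 45% vs list
print(f"Mini savings:   {1 - mini_cached / 0.00135:.0%}")     # 26% vs list
```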

Pricing verified against official vendor documentation. Updated daily. See our methodology.


Stop guessing. Start measuring.

Create an account, install the SDK, and see your first margin data in minutes.

See My Margin Data

No credit card required