Model Comparison

Claude 3.5 Sonnet vs GPT-4.1

Anthropic vs OpenAI

OpenAI's GPT-4.1 beats Anthropic's Claude 3.5 Sonnet on both price and benchmarks — here's the full breakdown.

Data last updated March 4, 2026

GPT-4.1 is the clear winner: it is both cheaper and higher-scoring than Claude 3.5 Sonnet. At a typical request of 5K input and 1K output tokens, Claude 3.5 Sonnet costs $0.030 vs $0.018 for GPT-4.1. GPT-4.1's scores lean toward mathematical reasoning (AIME: 0.44), while Claude 3.5 Sonnet's lean toward general knowledge (MMLU-Pro: 0.77). Claude 3.5 Sonnet's one pricing advantage is a lower cache-hit rate ($0.30 vs $0.50 per 1M tokens), which is not enough to close the per-request gap. Beyond that, any edge would come from vendor-specific features or its API ecosystem.

Benchmarks & Performance

Benchmark scores are shown rounded to one decimal place.

Metric | Claude 3.5 Sonnet | GPT-4.1
Intelligence Index | 15.9 | 26.3
MMLU-Pro | 0.8 | 0.8
GPQA | 0.6 | 0.7
AIME | 0.2 | 0.4
Context window (tokens) | 200,000 | 1,047,576

Pricing per 1M Tokens

List prices as published by each provider, not adjusted for token efficiency.

Metric | Claude 3.5 Sonnet | GPT-4.1
Input price / 1M tokens | $3.00 | $2.00
Output price / 1M tokens | $15.00 | $8.00
Cache hit price / 1M tokens | $0.30 | $0.50
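The per-request figures used throughout this comparison follow directly from these list prices. A minimal sketch, assuming a hypothetical `request_cost` helper, no caching, and the 5,000-input / 1,000-output request the page uses as its typical case:

```python
# List prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    "Claude 3.5 Sonnet": {"input": 3.00, "output": 15.00},
    "GPT-4.1": {"input": 2.00, "output": 8.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: USD cost of one request at list prices, without caching."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Typical request: 5K input + 1K output tokens.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 5_000, 1_000):.3f}")
```

For Claude 3.5 Sonnet the input and output halves each contribute $0.015, so its output price weighs as heavily as five times as many input tokens.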

Intelligence vs Price

[Chart: Intelligence Index (y-axis) vs typical request cost for 5K input + 1K output tokens (x-axis, $0.002 to $0.05), showing Claude 3.5 Sonnet and GPT-4.1 alongside other models including Gemini 2.5 Pro, DeepSeek R1 0528 and GPT-4.1 mini.]

Value Analysis

Cost per Intelligence Index point ("IQ point"), based on a typical request of 5,000 input and 1,000 output tokens at list prices.

Cheaper (list price): GPT-4.1
Higher benchmarks: GPT-4.1
Better value ($/IQ point): GPT-4.1

Claude 3.5 Sonnet: $0.0019 / IQ point
GPT-4.1: $0.0007 / IQ point

Frequently Asked Questions

What's the price difference between Claude 3.5 Sonnet and GPT-4.1?

GPT-4.1 is 40% cheaper per request than Claude 3.5 Sonnet ($0.018 vs $0.030); put the other way, Claude 3.5 Sonnet costs 67% more. GPT-4.1 is cheaper on both input ($2.00/M vs $3.00/M) and output ($8.00/M vs $15.00/M). The gap matters at scale but is less significant for low-volume use cases. This comparison assumes a typical request of 5,000 input and 1,000 output tokens (a 5:1 ratio). Actual ratios vary by workload: chat and completion tasks typically run about 2:1, code review around 3:1, document analysis and summarization 10:1 to 50:1, and embedding workloads are pure input with no output tokens.
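To see how the gap shifts with workload shape, this sketch re-prices a request at each of the input:output ratios listed above. It assumes a hypothetical fixed size of 6,000 total tokens per request (the same size as the 5K + 1K example) and list prices without caching; the names `TOTAL_TOKENS`, `split` and `cost` are illustrative only.

```python
# USD per 1M tokens as (input, output), from the pricing table.
PRICES = {
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "GPT-4.1": (2.00, 8.00),
}
TOTAL_TOKENS = 6_000  # hypothetical request size, same as 5K input + 1K output


def cost(model, input_tokens, output_tokens):
    """List-price cost in USD for one request, ignoring caching."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000


def split(ratio, total=TOTAL_TOKENS):
    """Split `total` tokens into (input, output) for an input:output ratio of ratio:1."""
    output_tokens = total / (ratio + 1)
    return total - output_tokens, output_tokens


workloads = [("chat 2:1", 2), ("code review 3:1", 3), ("typical 5:1", 5),
             ("documents 10:1", 10), ("documents 50:1", 50)]
for label, ratio in workloads:
    i, o = split(ratio)
    claude, gpt = cost("Claude 3.5 Sonnet", i, o), cost("GPT-4.1", i, o)
    print(f"{label:>16}: GPT-4.1 is {1 - gpt / claude:.0%} cheaper")
```

Under these assumptions the saving narrows from about 43% at 2:1 to about 35% at 50:1. The percentage depends only on the ratio, not the request size, and it shrinks for input-heavy work because GPT-4.1's output discount (about 47%) is larger than its input discount (about 33%).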

How much does GPT-4.1 outperform Claude 3.5 Sonnet on benchmarks?

GPT-4.1 scores higher on the Intelligence Index (26.3 vs 15.9). It leads on GPQA (0.67 vs 0.60) and AIME (0.44 vs 0.16), and the two models are within 5% of each other on MMLU-Pro. Relative to its own MMLU-Pro score, GPT-4.1 does proportionally better on AIME (mathematical reasoning), while Claude 3.5 Sonnet's results are weighted more toward general knowledge. If mathematical reasoning matters, GPT-4.1's AIME score of 0.44, about 2.75 times Claude 3.5 Sonnet's 0.16, gives it a clear edge.

How much more context can GPT-4.1 handle than Claude 3.5 Sonnet?

GPT-4.1 has a much larger context window: 1,047,576 tokens vs 200,000 for Claude 3.5 Sonnet, or roughly 1,400 vs 270 pages of text at about 750 tokens per page. GPT-4.1's window can hold entire codebases or book-length documents in a single request. Claude 3.5 Sonnet's window still fits a few hundred pages, but larger inputs have to be split across requests.
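The page estimates correspond to roughly 750 tokens per page, an assumed average rather than a fixed rule. This sketch converts each window to pages under that assumption and counts how many window-sized pieces a hypothetical 500-page document would need; it ignores prompt overhead and output tokens, so real limits are somewhat tighter.

```python
import math

TOKENS_PER_PAGE = 750  # assumed average; real density varies with formatting and language
CONTEXT_WINDOWS = {"Claude 3.5 Sonnet": 200_000, "GPT-4.1": 1_047_576}  # tokens


def pages(tokens: int) -> float:
    """Approximate page count for a token budget at TOKENS_PER_PAGE."""
    return tokens / TOKENS_PER_PAGE


def chunks_needed(document_tokens: int, model: str) -> int:
    """Minimum number of window-sized pieces, ignoring prompt overhead and output."""
    return math.ceil(document_tokens / CONTEXT_WINDOWS[model])


doc_tokens = 500 * TOKENS_PER_PAGE  # hypothetical 500-page document (375,000 tokens)
for name, window in CONTEXT_WINDOWS.items():
    print(f"{name}: ~{pages(window):,.0f} pages; "
          f"500-page document needs {chunks_needed(doc_tokens, name)} request(s)")
```

Here the 500-page document fits in one GPT-4.1 request but has to be split into two pieces for Claude 3.5 Sonnet.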

Which model is better value for money, Claude 3.5 Sonnet or GPT-4.1?

GPT-4.1 offers better value at $0.0007 per intelligence point, compared with $0.0019 for Claude 3.5 Sonnet; Claude 3.5 Sonnet costs about 2.8 times as much (176% more) per point. Because GPT-4.1 is both cheaper and higher-scoring, it is the clear value pick: you don't sacrifice benchmark quality to save money.
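The value figures can be reproduced by dividing each model's typical-request cost (5K input + 1K output at list prices) by its Intelligence Index score. A minimal sketch; `cost_per_point` is a hypothetical helper name:

```python
# Typical-request cost in USD (5K input + 1K output, list prices) and Intelligence Index.
REQUEST_COST = {"Claude 3.5 Sonnet": 0.030, "GPT-4.1": 0.018}
INTELLIGENCE_INDEX = {"Claude 3.5 Sonnet": 15.9, "GPT-4.1": 26.3}


def cost_per_point(model: str) -> float:
    """USD per Intelligence Index point for one typical request."""
    return REQUEST_COST[model] / INTELLIGENCE_INDEX[model]


claude = cost_per_point("Claude 3.5 Sonnet")
gpt = cost_per_point("GPT-4.1")
print(f"Claude 3.5 Sonnet: ${claude:.4f}/point, GPT-4.1: ${gpt:.4f}/point")
print(f"Claude 3.5 Sonnet pays {claude / gpt - 1:.0%} more per point")
```

The metric rewards a model twice for being cheaper and higher-scoring, which is why the ratio (about 2.76) is larger than either the price gap or the score gap alone.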

How does prompt caching affect Claude 3.5 Sonnet and GPT-4.1 pricing?

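With prompt caching, GPT-4.1 is about 36% cheaper per request than Claude 3.5 Sonnet ($0.0105 vs $0.0165), assuming all 5,000 input tokens are billed at the cache-hit rate. Caching cuts the typical request cost by 45% on Claude 3.5 Sonnet and 42% on GPT-4.1. Claude 3.5 Sonnet's cache-hit discount is deeper (90% off its input price vs 75% for GPT-4.1), but output tokens, which caching does not discount, make up most of its bill. Both models benefit at similar rates, so the uncached price comparison holds.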
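The caching figures can be reproduced by assuming every input token is a cache hit while output stays at list price. That is a best case: real hit rates are usually lower, and this sketch ignores any separate charge for writing to the cache. The `cached_fraction` parameter is a hypothetical knob for modelling partial hit rates.

```python
# USD per 1M tokens as (input, cache hit, output), from the pricing table.
PRICES = {
    "Claude 3.5 Sonnet": (3.00, 0.30, 15.00),
    "GPT-4.1": (2.00, 0.50, 8.00),
}


def cost(model, input_tokens, output_tokens, cached_fraction=0.0):
    """Request cost when `cached_fraction` of input tokens are cache hits (cache writes ignored)."""
    inp, hit, out = PRICES[model]
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    return (uncached * inp + cached * hit + output_tokens * out) / 1_000_000


for name in PRICES:
    full = cost(name, 5_000, 1_000)
    warm = cost(name, 5_000, 1_000, cached_fraction=1.0)
    print(f"{name}: ${full:.4f} -> ${warm:.4f} ({1 - warm / full:.0%} saved)")
```

Setting `cached_fraction` to something like 0.5 shows how quickly the savings fall off when only part of the prompt is reused.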

Pricing verified against official vendor documentation. Updated daily. See our methodology.
