Model Comparison

Claude 4.5 Sonnet (Non-reasoning) vs GPT-4.1

Anthropic vs OpenAI

OpenAI's GPT-4.1 costs less per intelligence point, even though Anthropic's Claude 4.5 Sonnet (Non-reasoning) scores higher.

Data last updated March 4, 2026

GPT-4.1 delivers more intelligence per dollar, while Claude 4.5 Sonnet (Non-reasoning) leads on raw benchmark scores. Claude 4.5 Sonnet (Non-reasoning) costs $0.03 per request vs $0.018 for GPT-4.1 (at 5K input / 1K output tokens). GPT-4.1 scores proportionally higher on mathematical reasoning (AIME: 0.44), while Claude 4.5 Sonnet (Non-reasoning)'s scores skew toward general knowledge (MMLU-Pro: 0.86). The question is whether Claude 4.5 Sonnet (Non-reasoning)'s higher scores justify paying 67% more.

Benchmarks & Performance

Metric                        Claude 4.5 Sonnet (Non-reasoning)   GPT-4.1
Intelligence Index            37.1                                26.3
MMLU-Pro                      0.86                                0.81
GPQA                          0.73                                0.67
Output tokens/sec             45.3                                77.2
Time to first token           1.20s                               0.51s
Context window                1,000,000                           1,047,576

Pricing per 1M Tokens

List prices as published by the provider. Not adjusted for token efficiency.

Metric                        Claude 4.5 Sonnet (Non-reasoning)   GPT-4.1
Input price / 1M tokens       $3.00                               $2.00
Output price / 1M tokens      $15.00                              $8.00
Cache hit price / 1M tokens   $0.30                               $0.50
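The per-request costs quoted elsewhere on this page follow directly from these list prices. A minimal sketch, assuming the page's standard request shape of 5,000 input and 1,000 output tokens:

```python
# List prices in USD per 1M tokens: (input, output).
PRICES = {
    "Claude 4.5 Sonnet (Non-reasoning)": (3.00, 15.00),
    "GPT-4.1": (2.00, 8.00),
}

def request_cost(input_price, output_price,
                 input_tokens=5_000, output_tokens=1_000):
    """Cost in USD of one request at the given per-1M-token list prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

for model, (inp, out) in PRICES.items():
    print(f"{model}: ${request_cost(inp, out):.3f} per request")
# Claude: 5,000 x $3.00/M + 1,000 x $15.00/M = $0.030
# GPT-4.1: 5,000 x $2.00/M + 1,000 x $8.00/M = $0.018
```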

Intelligence vs Price

[Scatter plot: Intelligence Index vs. typical request cost (5K input + 1K output), log price axis from $0.002 to $0.05. Claude 4.5 Sonnet (Non-reasoning) and GPT-4.1 are highlighted; other models plotted include Gemini 2.5 Pro, DeepSeek R1 0528, GPT-4.1 mini, Claude 4 Sonnet, Gemini 2.5 Flash, and Grok 3 mini Reasoning.]

Value Analysis

Cost per IQ point based on a typical request of 5,000 input and 1,000 output tokens.

Cheaper (list price): GPT-4.1
Higher benchmarks: Claude 4.5 Sonnet (Non-reasoning)
Better value ($/IQ point): GPT-4.1

Cost per IQ point: Claude 4.5 Sonnet (Non-reasoning) $0.0008, GPT-4.1 $0.0007
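The $/IQ-point figures are just the per-request cost divided by the Intelligence Index. A minimal sketch, again assuming the 5,000-input / 1,000-output request shape:

```python
def cost_per_iq_point(input_price, output_price, intelligence_index,
                      input_tokens=5_000, output_tokens=1_000):
    """USD per Intelligence Index point for one typical request."""
    cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return cost / intelligence_index

claude = cost_per_iq_point(3.00, 15.00, 37.1)  # $0.030 / 37.1 -> ~$0.0008
gpt41 = cost_per_iq_point(2.00, 8.00, 26.3)    # $0.018 / 26.3 -> ~$0.0007
```

Note the ratio is sensitive to both inputs: a cheaper model with a modest index can still beat a stronger, pricier one on this metric, which is exactly what happens here.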

Frequently Asked Questions

What's the price difference between Claude 4.5 Sonnet (Non-reasoning) and GPT-4.1?

Claude 4.5 Sonnet (Non-reasoning) costs 67% more per request than GPT-4.1 ($0.030 vs $0.018); equivalently, GPT-4.1 is 40% cheaper. GPT-4.1 is cheaper on both input ($2.00/M vs $3.00/M) and output ($8.00/M vs $15.00/M). The price gap matters at scale but is less significant for low-volume use cases. This comparison assumes a typical request of 5,000 input and 1,000 output tokens (5:1 ratio). Actual ratios vary by workload — chat and completion tasks typically run 2:1, code review around 3:1, document analysis and summarization 10:1 to 50:1, and embedding workloads are pure input with no output tokens.
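Because the two models differ more on output price ($15.00/M vs $8.00/M) than on input price, the input:output ratio shifts the size of the gap. A sketch using illustrative workload shapes, not measured data:

```python
# Illustrative (input_tokens, output_tokens) shapes for common workloads.
WORKLOADS = {
    "chat (2:1)": (2_000, 1_000),
    "code review (3:1)": (3_000, 1_000),
    "summarization (10:1)": (10_000, 1_000),
}

def cost(tokens, input_price, output_price):
    inp, out = tokens
    return (inp * input_price + out * output_price) / 1_000_000

for name, tokens in WORKLOADS.items():
    claude = cost(tokens, 3.00, 15.00)
    gpt41 = cost(tokens, 2.00, 8.00)
    print(f"{name}: Claude ${claude:.4f} vs GPT-4.1 ${gpt41:.4f} "
          f"({claude / gpt41 - 1:+.0%} premium)")
```

Output-heavy workloads widen the premium (output tokens carry the larger price difference), while input-heavy workloads narrow it toward the 50% input-price gap.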

How much does Claude 4.5 Sonnet (Non-reasoning) outperform GPT-4.1 on benchmarks?

Claude 4.5 Sonnet (Non-reasoning) scores higher overall (37.1 vs 26.3). Claude 4.5 Sonnet (Non-reasoning) leads on MMLU-Pro (0.86 vs 0.81), GPQA (0.73 vs 0.67). GPT-4.1 scores proportionally higher on AIME (mathematical reasoning) relative to its MMLU-Pro, while Claude 4.5 Sonnet (Non-reasoning)'s scores are more weighted toward general knowledge. Claude 4.5 Sonnet (Non-reasoning)'s GPQA score of 0.73 makes it stronger for technical and scientific tasks.

Which generates output faster, Claude 4.5 Sonnet (Non-reasoning) or GPT-4.1?

GPT-4.1 is 70% faster at 77.2 tokens per second compared to Claude 4.5 Sonnet (Non-reasoning) at 45.3 tokens per second. GPT-4.1 also starts generating sooner at 0.51s vs 1.20s time to first token. The speed difference matters for chatbots but is less relevant in batch processing.
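For an interactive user, what matters is total wait time, which combines time to first token with decoding speed. A rough sketch from the figures above, assuming a steady decoding rate:

```python
def response_latency(ttft_s, tokens_per_s, output_tokens=1_000):
    """Approximate end-to-end seconds to stream a full response."""
    return ttft_s + output_tokens / tokens_per_s

claude = response_latency(1.20, 45.3)  # ~23.3 s for a 1,000-token response
gpt41 = response_latency(0.51, 77.2)   # ~13.5 s for a 1,000-token response
```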

Which has a larger context window, Claude 4.5 Sonnet (Non-reasoning) or GPT-4.1?

GPT-4.1 has a 5% larger context window at 1,047,576 tokens vs Claude 4.5 Sonnet (Non-reasoning) at 1,000,000 tokens. That's roughly 1,396 vs 1,333 pages of text. The extra context capacity in GPT-4.1 matters for document analysis and long conversations.
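The page-count figures appear to assume roughly 750 tokens per printed page, an informal rule of thumb rather than an official conversion:

```python
TOKENS_PER_PAGE = 750  # assumption: rough tokens-per-page rule of thumb

print(1_047_576 // TOKENS_PER_PAGE)  # GPT-4.1: ~1,396 pages
print(1_000_000 // TOKENS_PER_PAGE)  # Claude 4.5 Sonnet (Non-reasoning): ~1,333 pages
```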

Which model is better value for money, Claude 4.5 Sonnet (Non-reasoning) or GPT-4.1?

GPT-4.1 offers 18% better value at $0.0007 per intelligence point compared to Claude 4.5 Sonnet (Non-reasoning) at $0.0008. GPT-4.1 is cheaper, which offsets Claude 4.5 Sonnet (Non-reasoning)'s higher benchmark scores to deliver more value per dollar. If raw benchmark scores matter less than cost for your use case, GPT-4.1 is the efficient choice.

How does prompt caching affect Claude 4.5 Sonnet (Non-reasoning) and GPT-4.1 pricing?

With the full input cached, Claude 4.5 Sonnet (Non-reasoning) costs 57% more per request than GPT-4.1 ($0.0165 vs $0.0105); equivalently, GPT-4.1 is 36% cheaper. Caching the input saves 45% on a Claude 4.5 Sonnet (Non-reasoning) request and 42% on a GPT-4.1 request compared to uncached list prices. Both models benefit from caching at similar rates, so the uncached price comparison largely holds.
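The caching figures can be reproduced from the cache-hit prices in the pricing table, assuming the entire 5,000-token input is a cache hit and output tokens are never cached:

```python
def cached_request_cost(cache_hit_price, output_price,
                        input_tokens=5_000, output_tokens=1_000):
    """Request cost when the full input is served from the prompt cache."""
    return (input_tokens * cache_hit_price + output_tokens * output_price) / 1_000_000

claude_cached = cached_request_cost(0.30, 15.00)  # $0.0165 vs $0.030 uncached -> 45% saved
gpt41_cached = cached_request_cost(0.50, 8.00)    # $0.0105 vs $0.018 uncached -> 42% saved
premium = claude_cached / gpt41_cached - 1        # Claude costs ~57% more when cached
```

Real workloads cache only part of the input (system prompts, shared context), so actual savings fall between the uncached and fully-cached figures.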

Pricing verified against official vendor documentation. Updated daily. See our methodology.
