Model Comparison
OpenAI's GPT-4.1 beats Anthropic's Claude 4 Opus (Non-reasoning) on both price and benchmarks — here's the full breakdown.
Data last updated March 4, 2026
GPT-4.1 is the clear winner on this data: it is cheaper and scores higher on the composite Intelligence Index than Claude 4 Opus (Non-reasoning). A typical request (5K input / 1K output tokens) costs $0.018 with GPT-4.1 vs $0.15 with Claude 4 Opus (Non-reasoning), and GPT-4.1 generates 77 tokens/sec vs 36. There's little reason to choose Claude 4 Opus (Non-reasoning) unless you need its specific API features, its ecosystem, or its stronger math (AIME) scores.
| Metric | Claude 4 Opus (Non-reasoning) | GPT-4.1 |
|---|---|---|
| Intelligence Index (composite of MMLU-Pro, GPQA, and AIME; higher is better) | 22.2 | 26.3 |
| MMLU-Pro (general knowledge and reasoning; higher is better) | 0.86 | 0.81 |
| GPQA (graduate-level science questions; higher is better) | 0.7 | 0.7 |
| AIME (mathematical problem solving; higher is better) | 0.56 | 0.44 |
| Output tokens/sec (tokens generated per second; higher means faster responses) | 36.2 | 77.2 |
| Time to first token (seconds until the first token; lower is better) | 1.34s | 0.51s |
| Context window (max tokens per request; larger handles more text) | 1,000,000 | 1,047,576 |
List prices as published by the providers. Not adjusted for token efficiency.
| Metric | Claude 4 Opus (Non-reasoning) | GPT-4.1 |
|---|---|---|
| Input price / 1M tokens | $15.00 | $2.00 |
| Output price / 1M tokens | $75.00 | $8.00 |
| Cache hit price / 1M tokens | $1.50 | $0.50 |
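The per-request figures quoted in this article can be reproduced from the table's list prices. A minimal Python sketch (model names and the 5K input / 1K output request shape come from this comparison; the helper function is illustrative):

```python
# List prices in USD per 1M tokens, taken from the pricing table above.
PRICES = {
    "Claude 4 Opus (Non-reasoning)": {"input": 15.00, "output": 75.00},
    "GPT-4.1": {"input": 2.00, "output": 8.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD for a single request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

claude = request_cost("Claude 4 Opus (Non-reasoning)", 5_000, 1_000)
gpt = request_cost("GPT-4.1", 5_000, 1_000)
print(f"Claude: ${claude:.3f}, GPT-4.1: ${gpt:.3f}")  # Claude: $0.150, GPT-4.1: $0.018
```

Swap in your own token counts to estimate costs for your workload.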
Cost per IQ point based on a typical request of 5,000 input and 1,000 output tokens.
- Cheaper (list price): GPT-4.1
- Higher benchmarks: GPT-4.1
- Better value ($/IQ point): GPT-4.1 at $0.0007 per IQ point vs Claude 4 Opus (Non-reasoning) at $0.0068
GPT-4.1 is dramatically cheaper, roughly 8x less per request than Claude 4 Opus (Non-reasoning). It costs less on both input ($2.00/M vs $15.00/M) and output ($8.00/M vs $75.00/M). At a fraction of the cost, GPT-4.1 saves significantly in production workloads. This comparison assumes a typical request of 5,000 input and 1,000 output tokens (a 5:1 ratio). Actual ratios vary by workload: chat and completion tasks typically run around 2:1, code review around 3:1, document analysis and summarization 10:1 to 50:1, and embedding workloads are pure input with no output tokens.
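To see how sensitive the cost gap is to the input:output ratio, you can sweep the ratios mentioned above (a sketch using the list prices from the pricing table; the 1K-output baseline is an assumption for illustration):

```python
# Cost gap between the two models at different input:output token ratios.
def cost(in_price: float, out_price: float, input_tokens: int, output_tokens: int) -> float:
    """Per-request cost in USD given prices per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

for ratio in (2, 3, 5, 10, 50):      # input:output ratios cited in the article
    inp, out = ratio * 1_000, 1_000  # hold output at 1K tokens for comparison
    claude = cost(15.00, 75.00, inp, out)
    gpt = cost(2.00, 8.00, inp, out)
    print(f"{ratio}:1  Claude ${claude:.3f}  GPT-4.1 ${gpt:.3f}  gap {claude / gpt:.2f}x")
```

The gap stays close to 8x across ratios because GPT-4.1 is cheaper on both sides of the request.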
GPT-4.1 scores higher on the composite Intelligence Index (26.3 vs 22.2), but Claude 4 Opus (Non-reasoning) leads on MMLU-Pro (0.86 vs 0.81) and AIME (0.56 vs 0.44), with the two within 5% of each other on GPQA. If mathematical reasoning matters, Claude 4 Opus (Non-reasoning)'s AIME score of 0.56 gives it an edge.
GPT-4.1 is 113% faster at 77.2 tokens per second compared to Claude 4 Opus (Non-reasoning) at 36.2 tokens per second. GPT-4.1 also starts generating sooner at 0.51s vs 1.34s time to first token. The speed difference matters for chatbots but is less relevant in batch processing.
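A rough end-to-end latency estimate follows from the two numbers above: time to first token plus steady decoding at the measured rate. This is a simplification (real serving speed varies with load and prompt length), and the 1,000-token response length is an assumption:

```python
# Approximate wall-clock time to receive a full response:
# time-to-first-token + output_tokens / decode speed.
def response_time(ttft_s: float, tokens_per_sec: float, output_tokens: int) -> float:
    """Estimated seconds to stream a complete response."""
    return ttft_s + output_tokens / tokens_per_sec

claude = response_time(1.34, 36.2, 1_000)
gpt = response_time(0.51, 77.2, 1_000)
print(f"Claude {claude:.1f}s, GPT-4.1 {gpt:.1f}s")  # Claude 29.0s, GPT-4.1 13.5s
```

For a 1,000-token answer, the throughput gap compounds with the faster first token, roughly halving the wait.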
GPT-4.1 has a 5% larger context window at 1,047,576 tokens vs Claude 4 Opus (Non-reasoning) at 1,000,000 tokens. That's roughly 1,396 vs 1,333 pages of text (at about 750 tokens per page). The extra context capacity in GPT-4.1 matters for document analysis and long conversations.
GPT-4.1 offers dramatically better value at $0.0007 per intelligence point vs $0.0068 for Claude 4 Opus (Non-reasoning). GPT-4.1 is both cheaper and higher-scoring on the composite index, making it the clear value pick: you don't sacrifice quality to save money.
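The $/IQ-point metric is just the per-request cost divided by the Intelligence Index. A quick check of the figures above (per-request costs and index scores come from this article):

```python
# Cost per intelligence point = per-request cost / Intelligence Index score.
def cost_per_point(request_cost_usd: float, intelligence_index: float) -> float:
    """USD per Intelligence Index point for a typical 5K/1K request."""
    return request_cost_usd / intelligence_index

claude = cost_per_point(0.15, 22.2)   # ≈ $0.0068 per point
gpt = cost_per_point(0.018, 26.3)     # ≈ $0.0007 per point
```

A lower number means you pay less for each point of benchmark performance.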
With prompt caching, GPT-4.1 remains dramatically cheaper, still roughly 8x less per request than Claude 4 Opus (Non-reasoning). Assuming the full 5K-token input is a cache hit, caching saves 45% on Claude 4 Opus (Non-reasoning) and 42% on GPT-4.1 compared to fully uncached request costs. Both models benefit from caching at similar rates, so the uncached price comparison holds.
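The caching percentages above can be derived from the pricing table, under the simplifying assumption that the entire input is served from cache while output pricing is unchanged:

```python
# Fractional savings from prompt caching on a 5K input / 1K output request,
# assuming a 100% cache hit rate on the input (a best-case simplification).
def cached_savings(in_price: float, cache_price: float, out_price: float,
                   inp: int = 5_000, out: int = 1_000) -> float:
    """Return savings as a fraction of the uncached request cost."""
    full = (inp * in_price + out * out_price) / 1_000_000
    cached = (inp * cache_price + out * out_price) / 1_000_000
    return (full - cached) / full

claude_savings = cached_savings(15.00, 1.50, 75.00)  # ≈ 0.45 (45%)
gpt_savings = cached_savings(2.00, 0.50, 8.00)       # ≈ 0.42 (42%)
```

Real savings depend on your cache hit rate; partial hits scale these numbers down proportionally.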
Pricing verified against official vendor documentation. Updated daily. See our methodology.
Create an account, install the SDK, and see your first margin data in minutes.
See My Margin Data (no credit card required)