Model Comparison
OpenAI's GPT-4.1 nano costs less per intelligence point, even though Anthropic's Claude 3.5 Sonnet scores higher.
Data last updated March 4, 2026
GPT-4.1 nano delivers more intelligence per dollar, while Claude 3.5 Sonnet leads on raw benchmark scores. Claude 3.5 Sonnet costs $0.03 per request vs $0.0009 for GPT-4.1 nano (at 5K input / 1K output tokens). The question is whether Claude 3.5 Sonnet's higher scores justify the 33x price premium.
| Metric | Claude 3.5 Sonnet | GPT-4.1 nano |
|---|---|---|
| Intelligence Index (composite of MMLU-Pro, GPQA, and AIME; higher is better) | 15.9 | 14.9 |
| MMLU-Pro (general knowledge and reasoning; higher is better) | 0.77 | 0.66 |
| GPQA (graduate-level science questions; higher is better) | 0.6 | 0.51 |
| AIME (mathematical problem solving; higher is better) | 0.16 | 0.24 |
| Context window (max tokens per request; larger handles more text) | 200,000 | 1,047,576 |
List prices as published by the provider. Not adjusted for token efficiency.
| Metric | Claude 3.5 Sonnet | GPT-4.1 nano |
|---|---|---|
| Input price / 1M tokens | $3.00 | $0.10 |
| Output price / 1M tokens | $15.00 | $0.40 |
| Cache hit price / 1M tokens | $0.30 | $0.025 |
Cost per IQ point based on a typical request of 5,000 input and 1,000 output tokens.
| Category | Model |
|---|---|
| Cheaper (list price) | GPT-4.1 nano |
| Higher benchmarks | Claude 3.5 Sonnet |
| Better value ($/IQ point) | GPT-4.1 nano |

| Model | Cost per IQ point |
|---|---|
| Claude 3.5 Sonnet | $0.0019 |
| GPT-4.1 nano | $0.00006 |
GPT-4.1 nano is about 33x cheaper per request than Claude 3.5 Sonnet ($0.0009 vs $0.03). It is cheaper on both input ($0.10/M vs $3.00/M, a 30x gap) and output ($0.40/M vs $15.00/M, a 37.5x gap), so the savings hold regardless of workload shape and compound at production volume. This comparison assumes a typical request of 5,000 input and 1,000 output tokens (5:1 ratio). Actual ratios vary by workload: chat and completion tasks typically run 2:1, code review around 3:1, document analysis and summarization 10:1 to 50:1, and embedding workloads are pure input with no output tokens.
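To see how the per-request gap moves with workload shape, the arithmetic can be sketched in Python. This is a minimal illustration, not the calculator behind this page: the `PRICES` dictionary, the function names, and the fixed 1,000 output tokens per request are assumptions made for the example, while the prices themselves are the list prices quoted in this comparison.

```python
# List prices in USD per 1M tokens, as quoted in this comparison.
PRICES = {
    "Claude 3.5 Sonnet": {"input": 3.00, "output": 15.00},
    "GPT-4.1 nano": {"input": 0.10, "output": 0.40},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD of a single request."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def price_ratio(input_tokens: int, output_tokens: int) -> float:
    """How many times more Claude 3.5 Sonnet costs than GPT-4.1 nano."""
    sonnet = request_cost("Claude 3.5 Sonnet", input_tokens, output_tokens)
    nano = request_cost("GPT-4.1 nano", input_tokens, output_tokens)
    return sonnet / nano


# Illustrative input:output ratios, holding output at 1,000 tokens (assumption).
OUTPUT_TOKENS = 1_000
WORKLOADS = [("chat", 2), ("code review", 3), ("baseline", 5), ("summarization", 50)]
for label, ratio in WORKLOADS:
    multiple = price_ratio(ratio * OUTPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{label:>14} ({ratio}:1): Claude 3.5 Sonnet costs {multiple:.1f}x more")
```

Under these assumptions the multiple stays between roughly 30x and 35x: input-heavy workloads drift toward the 30x input-price gap, and output-heavy ones toward the 37.5x output-price gap.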
Claude 3.5 Sonnet scores higher overall (15.9 vs 14.9). Claude 3.5 Sonnet leads on MMLU-Pro (0.77 vs 0.66) and GPQA (0.6 vs 0.51), while GPT-4.1 nano leads on AIME (0.24 vs 0.16). If mathematical reasoning matters, GPT-4.1 nano's AIME score of 0.24 gives it an edge.
GPT-4.1 nano has a much larger context window: 1,047,576 tokens vs 200,000 for Claude 3.5 Sonnet. That's roughly 1,396 vs 266 pages of text, which works out to about 750 tokens per page. GPT-4.1 nano's window can handle entire codebases or book-length documents; Claude 3.5 Sonnet is limited to inputs of roughly a fifth that size.
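A rough fit check follows from these numbers. The sketch below is an illustration only: `TOKENS_PER_PAGE = 750` is an assumption inferred from the page counts above (1,047,576 tokens over roughly 1,396 pages), and the function names and the 1,000-token output reserve are hypothetical.

```python
# Context windows in tokens, as listed in this comparison.
CONTEXT_WINDOW = {
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4.1 nano": 1_047_576,
}

# Assumption: about 750 tokens per page, inferred from the page counts above.
TOKENS_PER_PAGE = 750


def pages(tokens: int) -> int:
    """Approximate whole pages of text that fit in a token budget."""
    return tokens // TOKENS_PER_PAGE


def fits(model: str, input_tokens: int, reserved_output: int = 1_000) -> bool:
    """True if the input plus a reserved output budget fits the window."""
    return input_tokens + reserved_output <= CONTEXT_WINDOW[model]


for model, window in CONTEXT_WINDOW.items():
    print(f"{model}: ~{pages(window):,} pages")

# A hypothetical 500-page document (about 375,000 tokens).
doc_tokens = 500 * TOKENS_PER_PAGE
print({model: fits(model, doc_tokens) for model in CONTEXT_WINDOW})
```

Real token counts depend on the tokenizer and the text, so a per-page estimate like this is only a first-pass filter before counting tokens properly.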
GPT-4.1 nano offers far better value: $0.00006 per intelligence point vs $0.0019 for Claude 3.5 Sonnet, roughly a 31x difference. Claude 3.5 Sonnet's Intelligence Index lead is about 7% (15.9 vs 14.9), far too small to offset a 33x higher request cost. If raw benchmark scores matter less than cost for your use case, GPT-4.1 nano is the efficient choice.
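The value metric is request cost divided by the Intelligence Index. A minimal sketch, using the per-request costs and index scores quoted in this comparison; the `MODELS` structure and the function name are illustrative and not part of any SDK.

```python
# Per-request cost (5,000 input / 1,000 output tokens) and Intelligence Index,
# both as quoted in this comparison.
MODELS = {
    "Claude 3.5 Sonnet": {"request_cost": 0.03, "index": 15.9},
    "GPT-4.1 nano": {"request_cost": 0.0009, "index": 14.9},
}


def cost_per_point(model: str) -> float:
    """USD per Intelligence Index point for one typical request."""
    m = MODELS[model]
    return m["request_cost"] / m["index"]


sonnet = cost_per_point("Claude 3.5 Sonnet")
nano = cost_per_point("GPT-4.1 nano")
print(f"Claude 3.5 Sonnet: ${sonnet:.4f} / IQ point")
print(f"GPT-4.1 nano:      ${nano:.5f} / IQ point")
print(f"Value gap: {sonnet / nano:.0f}x")
```

Because the two index scores are close, the value gap tracks the price gap almost one-to-one; the metric would only diverge from raw price if the benchmark spread were much wider.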
With prompt caching, GPT-4.1 nano remains far cheaper: about 31x less per request than Claude 3.5 Sonnet. Assuming all 5,000 input tokens are cache hits, caching cuts the total request cost by 45% on Claude 3.5 Sonnet and 42% on GPT-4.1 nano compared to standard input prices. Both models benefit from caching at similar rates, so the uncached price comparison largely holds.
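The caching figures can be reproduced with a short sketch. Several things here are assumptions rather than facts from this page: every input token is treated as a cache hit, any cost of writing to the cache is ignored, GPT-4.1 nano's cache-hit price is taken as $0.025 per 1M tokens (the value that reproduces the 42% and 31x figures), and the function names are hypothetical.

```python
# USD per 1M tokens. Input and output are list prices from this comparison.
# Assumption: GPT-4.1 nano's cache-hit price is taken as $0.025, the value
# that reproduces the 42% and 31x figures quoted above.
PRICES = {
    "Claude 3.5 Sonnet": {"input": 3.00, "cached": 0.30, "output": 15.00},
    "GPT-4.1 nano": {"input": 0.10, "cached": 0.025, "output": 0.40},
}


def request_cost(model, input_tokens=5_000, output_tokens=1_000, cached_fraction=0.0):
    """Cost in USD when `cached_fraction` of the input tokens are cache hits."""
    p = PRICES[model]
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = fresh * p["input"] + cached * p["cached"] + output_tokens * p["output"]
    return total / 1_000_000


def caching_savings(model: str) -> float:
    """Fraction of total request cost saved when every input token is cached."""
    return 1 - request_cost(model, cached_fraction=1.0) / request_cost(model)


for model in PRICES:
    print(f"{model}: caching saves {caching_savings(model):.0%}")

ratio = (request_cost("Claude 3.5 Sonnet", cached_fraction=1.0)
         / request_cost("GPT-4.1 nano", cached_fraction=1.0))
print(f"Cached price multiple: {ratio:.0f}x")
```

Lowering `cached_fraction` toward 0 moves both models back toward their uncached costs, so real savings depend on how much of each prompt is actually reused.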
Pricing verified against official vendor documentation. Updated daily. See our methodology.