Model Comparison
Google's Gemini 2.0 Flash costs less per intelligence point, even though Anthropic's Claude 4 Opus (Non-reasoning) scores higher.
Data last updated March 4, 2026
Gemini 2.0 Flash delivers more intelligence per dollar, while Claude 4 Opus (Non-reasoning) leads on raw benchmark scores. Claude 4 Opus (Non-reasoning) costs $0.15 per request vs $0.0009 for Gemini 2.0 Flash (at 5K input / 1K output tokens). Claude 4 Opus (Non-reasoning) scores proportionally higher on mathematical reasoning (AIME: 0.56), while Gemini 2.0 Flash's scores skew toward general knowledge (MMLU-Pro: 0.78). The question is whether Claude 4 Opus (Non-reasoning)'s higher scores justify the 167x price premium.
| Metric | Claude 4 Opus (Non-reasoning) | Gemini 2.0 Flash |
|---|---|---|
| Intelligence Index (composite of MMLU-Pro, GPQA, and AIME; higher is better) | 22.2 | 18.5 |
| MMLU-Pro (general knowledge and reasoning; higher is better) | 0.86 | 0.78 |
| GPQA (graduate-level science questions; higher is better) | 0.70 | 0.62 |
| AIME (mathematical problem solving; higher is better) | 0.56 | 0.33 |
| Context window (max tokens per request; larger handles more text) | 1,000,000 | 1,000,000 |
List prices as published by the providers. Not adjusted for token efficiency.
| Metric | Claude 4 Opus (Non-reasoning) | Gemini 2.0 Flash |
|---|---|---|
| Input price / 1M tokens | $15.00 | $0.10 |
| Output price / 1M tokens | $75.00 | $0.40 |
| Cache hit price / 1M tokens | $1.50 | $0.02 |
Cost per IQ point based on a typical request of 5,000 input and 1,000 output tokens.
- Cheaper (list price): Gemini 2.0 Flash
- Higher benchmarks: Claude 4 Opus (Non-reasoning)
- Better value ($/IQ point): Gemini 2.0 Flash at $0.000049 / IQ point, vs $0.0068 / IQ point for Claude 4 Opus (Non-reasoning)
Gemini 2.0 Flash is dramatically cheaper, costing roughly 1/167th as much per request as Claude 4 Opus (Non-reasoning). It is cheaper on both input ($0.10/M vs $15.00/M) and output ($0.40/M vs $75.00/M), so it saves significantly in production workloads. This comparison assumes a typical request of 5,000 input and 1,000 output tokens (a 5:1 ratio). Actual ratios vary by workload: chat and completion tasks typically run 2:1, code review around 3:1, document analysis and summarization 10:1 to 50:1, and embedding workloads are pure input with no output tokens.
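The per-request arithmetic can be reproduced in a few lines of Python (a sketch; the 5K/1K token counts are this page's assumed workload, and the prices are the list prices above):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Dollar cost of one request; prices are list prices per 1M tokens."""
    return input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price

# Typical request assumed by this page: 5,000 input / 1,000 output tokens.
claude = request_cost(5_000, 1_000, 15.00, 75.00)
gemini = request_cost(5_000, 1_000, 0.10, 0.40)
print(f"Claude: ${claude:.4f}")          # $0.1500
print(f"Gemini: ${gemini:.4f}")          # $0.0009
print(f"Ratio: {claude / gemini:.0f}x")  # 167x
```

Shifting toward input-heavy workloads (say, 10:1 document analysis) changes the absolute costs but not the rough size of the gap, since Gemini 2.0 Flash is cheaper on both input and output.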
Claude 4 Opus (Non-reasoning) scores higher overall (22.2 vs 18.5), leading on MMLU-Pro (0.86 vs 0.78), GPQA (0.70 vs 0.62), and AIME (0.56 vs 0.33). Its scores skew toward mathematical reasoning (AIME) relative to its MMLU-Pro score, while Gemini 2.0 Flash's scores are more weighted toward general knowledge. If mathematical reasoning matters, Claude 4 Opus (Non-reasoning)'s AIME score of 0.56 gives it an edge.
Claude 4 Opus (Non-reasoning) and Gemini 2.0 Flash have the same context window of 1,000,000 tokens (roughly 1,333 pages of text). Both windows are large enough for most production workloads.
Gemini 2.0 Flash offers dramatically better value: $0.000049 per intelligence point vs $0.0068 for Claude 4 Opus (Non-reasoning). Its far lower price more than offsets Claude 4 Opus (Non-reasoning)'s higher benchmark scores, delivering more value per dollar. If raw benchmark scores matter less than cost for your use case, Gemini 2.0 Flash is the efficient choice.
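The value figures follow directly from the per-request cost divided by the composite Intelligence Index (a sketch using this page's numbers):

```python
def cost_per_iq_point(request_cost_usd: float, intelligence_index: float) -> float:
    """Dollars of per-request cost per point of the composite Intelligence Index."""
    return request_cost_usd / intelligence_index

# Per-request costs at 5K input / 1K output, divided by each model's index.
print(f"{cost_per_iq_point(0.15, 22.2):.4f}")    # 0.0068   (Claude 4 Opus)
print(f"{cost_per_iq_point(0.0009, 18.5):.6f}")  # 0.000049 (Gemini 2.0 Flash)
```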
With prompt caching, Gemini 2.0 Flash remains dramatically cheaper, costing roughly 1/157th as much per request as Claude 4 Opus (Non-reasoning). Caching saves 45% on Claude 4 Opus (Non-reasoning) and 42% on Gemini 2.0 Flash compared to standard input prices. Both models benefit from caching at similar rates, so the uncached price comparison holds.
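A sketch of the cached math, under the simplifying assumption that every input token is served from the prompt cache (real hit rates are lower, which is why the page's figures differ slightly from the full-hit numbers below):

```python
def cached_request_cost(input_tokens: int, output_tokens: int,
                        cache_hit_price: float, output_price: float) -> float:
    """Request cost assuming all input tokens hit the cache; prices per 1M tokens."""
    return input_tokens / 1e6 * cache_hit_price + output_tokens / 1e6 * output_price

claude_cached = cached_request_cost(5_000, 1_000, 1.50, 75.00)  # $0.0825
gemini_cached = cached_request_cost(5_000, 1_000, 0.02, 0.40)   # $0.0005

# Savings vs the uncached per-request costs of $0.15 and $0.0009.
print(f"Claude saving: {1 - claude_cached / 0.15:.0%}")    # 45%
print(f"Gemini saving: {1 - gemini_cached / 0.0009:.0%}")  # 44%
```

Under this full-hit assumption the cached gap works out to about 165x ($0.0825 / $0.0005); the page's 157x and 42% figures suggest its model assumes a less-than-perfect cache-hit rate.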
Pricing verified against official vendor documentation. Updated daily. See our methodology.