
Public benchmark snapshot

DeepSeek V4 Benchmark

A static keyword page for people searching "deepseek v4 benchmark", built from vendor-published model cards, API docs, and launch posts captured on April 24, 2026.

DeepSeek V4 is still a preview release. This page preserves published numbers and marks missing data instead of filling gaps.
Tracked models: 4
Official sources: 8
Update date: 2026-04-24
Page type: Static single page

Snapshot

What stands out in the public record right now

The page focuses on published signals that matter for buyers, builders, and people comparing model families quickly.

Reasoning surface

On reasoning benchmarks, DeepSeek V4 is credible but not the public leader.

DeepSeek V4 Pro Max reports GPQA Diamond 90.1 and HLE 37.7. In the same public snapshot set, Gemini 3.1 Pro reports GPQA Diamond 94.3 and ARC-AGI-2 77.1.

DS V4: GPQA 90.1 · Gemini 3.1 Pro: GPQA 94.3

Coding pressure

DeepSeek V4's strongest public case is coding.

DeepSeek V4 Pro Max posts LiveCodeBench 93.5, Codeforces 3206, and SWE-Bench Verified 80.6, which is enough to keep it in the same conversation as the top closed models.

LiveCodeBench 93.5 · Codeforces 3206

Long context

The market has moved to 1M-scale context.

DeepSeek V4, Claude Opus 4.6, and Gemini 3.1 Pro all publish 1M token context support. OpenAI's GPT-5.4 API page lists a 1,050,000 token context window.

DeepSeek: 1M · OpenAI: 1.05M · Anthropic: 1M beta
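
For a rough sense of what actually fits in those windows, a quick sanity check can help. The sketch below is illustrative only: it uses a crude 4-characters-per-token heuristic rather than any vendor's real tokenizer, and the window sizes are the published figures quoted above, not guarantees of usable quality at that length.

```python
# Rough check of whether a text fits the published context windows.
# The ~4 characters per token ratio is a crude heuristic, not a real tokenizer;
# actual token counts vary by model and by language.

PUBLISHED_WINDOWS = {
    "deepseek-v4": 1_000_000,      # 1M tokens, per the launch material quoted above
    "gpt-5.4": 1_050_000,          # 1,050,000 tokens, per the API page quoted above
    "claude-opus-4.6": 1_000_000,  # 1M tokens, listed as beta
    "gemini-3.1-pro": 1_000_000,
}

def rough_token_estimate(text: str) -> int:
    """Very rough token estimate: assume ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_window(text: str, model: str, reserve_for_output: int = 8_000) -> bool:
    """True if the estimated prompt plus an output reserve fits the published window."""
    return rough_token_estimate(text) + reserve_for_output <= PUBLISHED_WINDOWS[model]

if __name__ == "__main__":
    sample = "x" * 4_100_000  # roughly 1.02M estimated tokens
    for model in PUBLISHED_WINDOWS:
        print(model, fits_window(sample, model))
```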

Cost view

DeepSeek V4 pricing is the biggest missing public field.

OpenAI, Anthropic, and Google expose public API rates. DeepSeek's current public pricing pages still describe legacy model IDs, so the DeepSeek V4 row stays marked as Not publicly disclosed.

GPT-5.4: $2.50 / $15 · Opus 4.6: $5 / $25 · Gemini 3.1 Pro: $2 / $12
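
As a worked example of how those list prices translate into per-request costs, here is a minimal sketch. The dollar figures are the published per-1M-token rates quoted on this page, base tier only (Opus 4.6 and Gemini 3.1 Pro publish higher rates above 200k tokens, as noted in the comparison table below), the DeepSeek V4 entry is deliberately left as None because no public rate exists, and the arithmetic ignores caching, batch, or tool surcharges.

```python
# Per-request cost estimate from the published list prices on this page.
# Rates are USD per 1M tokens, base tier only; DeepSeek V4 has no public
# rate, so it is left as None rather than guessed.

PUBLISHED_RATES = {
    "gpt-5.4":         {"input": 2.50, "output": 15.00},
    "claude-opus-4.6": {"input": 5.00, "output": 25.00},  # premium rates apply above 200k tokens
    "gemini-3.1-pro":  {"input": 2.00, "output": 12.00},  # 200k tokens or less
    "deepseek-v4":     None,  # Not publicly disclosed as of 2026-04-24
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float | None:
    """Return estimated USD cost for one request, or None if pricing is undisclosed."""
    rates = PUBLISHED_RATES[model]
    if rates is None:
        return None
    return (input_tokens / 1_000_000) * rates["input"] \
         + (output_tokens / 1_000_000) * rates["output"]

if __name__ == "__main__":
    # Example: 120k input tokens, 4k output tokens.
    for model in PUBLISHED_RATES:
        cost = estimate_cost(model, 120_000, 4_000)
        print(model, "undisclosed" if cost is None else f"${cost:.2f}")
```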

Comparison

DeepSeek V4 vs GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro

These are published highlights, not normalized lab reruns. Use them to understand positioning, then verify the original source before making a technical or purchasing decision.

Model · Public status · Published highlights · Context and price
DeepSeek V4 Pro Max (DeepSeek) · Preview released on April 24, 2026.
  • GPQA Diamond 90.1, HLE 37.7
  • LiveCodeBench 93.5, Codeforces 3206
  • SWE-Bench Verified 80.6, Terminal-Bench 67.9
  • 1M token context
  • API IDs: deepseek-v4-pro, deepseek-v4-flash
  • Price: Not publicly disclosed on a dedicated V4 pricing page
GPT-5.4 (OpenAI) · Released on March 5, 2026.
  • ARC-AGI-2 73.3, GPQA Diamond 92.8
  • Terminal-Bench 75.1, SWE-Bench Pro 57.7
  • BrowseComp 82.7, HLE with tools 52.1
  • 1,050,000 token context
  • Input $2.50 / output $15 per 1M tokens
  • Supports search, file search, computer use, and MCP
Claude Opus 4.6 (Anthropic) · Released on February 5, 2026.
  • HLE 40.0 no tools, 53.0 with tools
  • MRCR v2 1M score 76%
  • Pricing page unchanged while coding and search performance moved up
  • 1M token context in beta
  • Input $5 / output $25 per 1M tokens
  • Premium pricing applies above 200k tokens
Gemini 3.1 Pro (Google) · Preview released on February 19, 2026.
  • ARC-AGI-2 77.1, GPQA Diamond 94.3
  • Terminal-Bench 68.5, SWE-Bench Verified 80.6
  • BrowseComp 85.9, MRCR v2 128k 84.9
  • 1M token context, 64k output
  • $2 / $12 at 200k or less, $4 / $18 above 200k
  • Preview only in Gemini API and related Google products

Reading tip: if you need a single headline, DeepSeek V4's public strength is coding. If you need the strongest public reasoning snapshot in this set, Gemini 3.1 Pro and GPT-5.4 still publish the sharper top-end numbers.

Methodology

How this benchmark page stays honest

The page avoids invented rollups, hidden scoring, and synthetic averages. Everything here maps back to a public vendor page.

Source rules

  • Only first-party launch posts, API docs, model cards, or technical reports.
  • Every benchmark row preserves the model name and reasoning mode the vendor published.
  • Missing public fields stay missing. This is why DeepSeek V4 pricing is still blank here.

Comparability rules

  • Benchmarks are not normalized across harnesses, tool settings, or search blocklists.
  • Tool-enabled and no-tool numbers stay labeled separately instead of being merged (see the sketch after this list).
  • Context size alone does not mean equal long-context quality.
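
One way to keep that separation explicit in your own notes is to key each score by its tool setting rather than by benchmark alone. The sketch below is illustrative bookkeeping, not part of any vendor's harness; the two HLE values are the Claude Opus 4.6 figures quoted in the comparison table above.

```python
# Keep tool-enabled and no-tool scores as separate records instead of merging them.
# Illustrative bookkeeping only; the HLE values are the Opus 4.6 figures quoted above.
from dataclasses import dataclass

@dataclass(frozen=True)
class Score:
    model: str
    benchmark: str
    tools: bool   # True = tool-enabled run, False = no-tool run
    value: float

SCORES = [
    Score("claude-opus-4.6", "HLE", tools=False, value=40.0),
    Score("claude-opus-4.6", "HLE", tools=True, value=53.0),
]

def lookup(model: str, benchmark: str, tools: bool) -> list[Score]:
    """Return only scores matching the exact tool setting; never average across settings."""
    return [s for s in SCORES if (s.model, s.benchmark, s.tools) == (model, benchmark, tools)]

print(lookup("claude-opus-4.6", "HLE", tools=False))  # [Score(..., tools=False, value=40.0)]
```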

DeepSeek-specific note

  • The page tracks the new V4 preview family, not legacy deepseek-chat or deepseek-reasoner.
  • The official API quickstart already lists deepseek-v4-pro and deepseek-v4-flash (see the call sketch after this list).
  • Public V4 pricing had not landed in the pricing docs as of April 24, 2026.
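
For orientation, a minimal call sketch follows. The only details taken from the quickstart note above are the model IDs; the OpenAI-compatible request shape and the https://api.deepseek.com base URL are carried over from DeepSeek's legacy documentation and are assumptions here, so check the V4 quickstart before relying on them.

```python
# Minimal chat call sketch. Only the model ID comes from the V4 quickstart note
# above; the OpenAI-compatible client shape and base URL are assumptions carried
# over from DeepSeek's legacy API docs.
import os

from openai import OpenAI  # OpenAI-compatible client, as used with legacy DeepSeek models

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # assumed unchanged for the V4 preview
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",  # or "deepseek-v4-flash" for the lighter variant
    messages=[
        {"role": "user", "content": "Summarize the trade-offs of 1M-token context windows."},
    ],
)

print(response.choices[0].message.content)
```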

FAQ

Questions people searching "deepseek v4 benchmark" usually ask

What is DeepSeek V4 Benchmark?

It is a static landing page that aggregates public benchmark, context window, and pricing signals for DeepSeek V4 and nearby frontier models so the search intent lands on something factual instead of speculation.

Is DeepSeek V4 actually available?

Yes. DeepSeek announced the V4 preview on Friday, April 24, 2026, and its API docs now list deepseek-v4-pro and deepseek-v4-flash.

Does DeepSeek V4 outperform GPT-5.4, Claude Opus 4.6, or Gemini 3.1 Pro?

Not across the board. DeepSeek V4 Pro Max looks strongest in coding-heavy public benchmarks, while GPT-5.4 and Gemini 3.1 Pro still publish stronger reasoning and agentic-search results, and Claude Opus 4.6 remains very competitive on long-running coding and knowledge work.

Why are some numbers impossible to compare directly?

Vendors use different harnesses, different reasoning budgets, and sometimes different tool stacks. That is why this page uses the phrase "public snapshot" instead of pretending these rows are a single normalized leaderboard.

Why is DeepSeek V4 pricing marked undisclosed?

Because the public DeepSeek pricing pages still describe legacy model IDs and do not yet publish a dedicated V4 API table. Publishing an inferred price would be less useful than leaving the field empty.

Sources

Official pages used for this version

Open each source directly if you need the exact benchmark table, pricing page, or launch date context.