On April 24, 2026, DeepSeek released preview builds of V4-Pro and V4-Flash, two open-weights models that bring the frontier within reach of any business willing to download a file. The benchmark numbers approach GPT-5.5 and Claude Opus 4.7. The license is MIT. The API is roughly a twentieth the price of the closed-source equivalent. The strategic question for buyers shifted in a single weekend.
What DeepSeek Actually Shipped
DeepSeek published two new mixture-of-experts models on Hugging Face on Friday. According to TechCrunch's coverage, V4-Pro carries 1.6 trillion total parameters with 49 billion activated per token, and V4-Flash carries 284 billion total with 13 billion activated. Both support a one-million-token context window, in the same range as GPT-5.5 and Gemini 3.1 Pro.
The architectural improvements matter for buyers as much as the parameter counts. Bloomberg reported that V4 introduces a new sparse attention design that cuts inference cost at long context lengths. DeepSeek's own technical notes claim V4-Pro uses about 27 percent of the FLOPs and 10 percent of the KV cache that V3.2 needed at 1M tokens. That is not a marginal optimization. Long-context inference has been the most expensive part of agentic and document-heavy workflows, and V4 makes it materially cheaper to run.
Pricing on the hosted API reinforces the disruption. V4-Pro is priced at $1.74 per million input tokens and $3.48 per million output tokens, with cache hits cutting input cost by 90 percent. V4-Flash sits at $0.14 input and $0.28 output. Compared with GPT-5.5's $5 input and $30 output pricing, V4-Pro is roughly one-tenth the cost on output tokens, and the gap widens further on cache-heavy workloads.
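The arithmetic is easy to sanity-check yourself. A minimal sketch using the list prices above; the workload mix, the cache-hit rate, and the assumption that GPT-5.5 offers no comparable cache discount are all illustrative:

```python
# Back-of-envelope cost comparison. All figures are USD per million tokens,
# taken from the list prices cited above; the GPT-5.5 cache behavior is an
# assumption (no discount), not a quoted figure.

PRICES = {
    "v4-pro":   {"input": 1.74, "output": 3.48, "cached_input": 1.74 * 0.10},
    "v4-flash": {"input": 0.14, "output": 0.28, "cached_input": 0.14 * 0.10},
    "gpt-5.5":  {"input": 5.00, "output": 30.00, "cached_input": 5.00},
}

def workload_cost(model, input_mtok, output_mtok, cache_hit_rate=0.0):
    """USD cost for a workload, given millions of tokens in and out."""
    p = PRICES[model]
    cached = input_mtok * cache_hit_rate
    fresh = input_mtok - cached
    return fresh * p["input"] + cached * p["cached_input"] + output_mtok * p["output"]

# Example mix: 10M input tokens, 5M output tokens, 60% cache hits.
for model in PRICES:
    print(model, round(workload_cost(model, 10, 5, cache_hit_rate=0.6), 2))
```

Plugging in your own token mix and cache-hit rate is the fastest way to see what the headline ratios mean for your actual bill.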
DeepSeek positions both models as preview builds. Production-ready versions are expected in the coming weeks. The license is MIT, which means commercial use without royalty payments to DeepSeek.
Why the Benchmarks Matter More This Time
DeepSeek has been catching up on benchmarks since R1 in early 2025. The gap to closed-source frontier models has been narrowing every quarter. V4 is the first release where the gap is small enough that frontier-class results are not the deciding factor for many real workloads.
CNBC reported that V4-Pro posts 89.8 on IMOAnswerBench, ahead of Claude Opus 4.7 at 75.3 and Gemini 3.1 Pro at 81.0, with GPT-5.4 leading at 91.4. On coding competition benchmarks, DeepSeek says V4 performance is comparable to GPT-5.4. World knowledge trails Gemini 3.1 Pro by a small margin. Multimodality is the meaningful gap: V4 supports text only, while every closed-source frontier model also handles images, audio, and video.
For a business buyer, the practical read is that on the workflows most companies actually run today (code generation, structured reasoning, document analysis, retrieval-augmented chat), the open-weights option is now within striking distance of the closed-source flagship. That was not true six months ago.
Our take: When R1 dropped in January 2025, we wrote in our analysis of the DeepSeek effect on AI budgets that cheap inference would commoditize the middle of the market while the frontier kept earning a premium. V4 partially invalidates that prediction. The frontier still earns a premium for multimodality, agentic surface area, and integrated stacks like the OpenAI superapp. But on raw text reasoning and coding, the premium is now thin enough that buyers can credibly choose open weights and recover the difference.
What V4 Changes for Your AI Vendor Strategy
The release does not invalidate closed-source frontier providers. It does change the leverage businesses have when negotiating, planning capacity, and deciding what to keep in-house. Three shifts deserve immediate attention.
Pricing leverage with closed-source vendors just improved. Before V4, a business renegotiating an OpenAI or Anthropic enterprise contract had limited credible alternatives for genuinely frontier-grade reasoning. After V4, the alternative is real. We are not suggesting threatening to migrate as a tactic. We are suggesting that procurement teams now have a defensible benchmark for what unsubsidized inference at this capability level looks like, and that changes the conversation.
Vendor-diversified architectures are easier to justify. The argument for routing a portion of traffic through an open-weights model has historically been fragile. Operations teams pushed back because the quality gap meant a meaningful drop in customer outcomes. With V4, you can route lower-stakes high-volume traffic, like internal summarization or non-customer-facing classification, to an open model and reserve closed-source frontier calls for tasks that genuinely need them. The same model-agnostic abstraction layer we have argued for since the GPT-5.5 launch is now even more valuable.
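The routing policy described above fits in a few lines behind a model-agnostic abstraction layer. This is a hedged sketch: the model names, task taxonomy, and tiering rules are illustrative assumptions, not a reference implementation.

```python
# Minimal sketch of a routing layer that sends low-stakes, high-volume
# traffic to an open-weights model and reserves frontier calls for tasks
# that need them. All names and the policy itself are assumptions.
from dataclasses import dataclass

@dataclass
class Task:
    kind: str              # e.g. "summarization", "classification", "customer_chat"
    customer_facing: bool  # whether output reaches a customer directly

# Task kinds considered low-stakes enough for the cheap open-weights tier.
LOW_STAKES = {"summarization", "classification"}

def pick_model(task: Task) -> str:
    """Route internal, low-stakes work to the open model; everything else
    stays on the closed-source frontier tier."""
    if not task.customer_facing and task.kind in LOW_STAKES:
        return "deepseek-v4-flash"
    return "frontier-closed-model"
```

The point of the abstraction is that the policy lives in one place: when a new release shifts the quality-per-dollar frontier, you update the routing table, not every call site.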
Sovereignty and data residency become tractable. Self-hosting a 1.6T parameter model is not trivial, but neither is it research-only territory. Several public clouds and specialized inference providers were live with V4 endpoints within hours of release, which reflects how much demand there is for non-OpenAI, non-Anthropic frontier inference. For regulated industries, public sector buyers, and businesses with legal exposure to cross-border data movement, having an MIT-licensed frontier model removes one of the biggest blockers to AI deployment.
What V4 Does Not Solve
It is easy to read a benchmark table and conclude that closed-source providers are in trouble. They are not, at least not yet, and pretending otherwise is a fast way to make a bad procurement decision.
V4 is text-only. If your workflows involve image understanding, voice, video, or screenshot-driven agentic computer use, you still need GPT-5.5, Claude Opus 4.7, or Gemini 3.1 Pro. The hosted product surface around closed-source models also matters. Codex, Atlas, Claude Code, Workspace Studio, and the integrated agent platforms shipping with each frontier lab give engineers and knowledge workers leverage that a raw API call does not.
Self-hosting is real work. A 1.6T parameter MoE model, even with 49B active, requires multi-GPU clusters to run at production latency. Operating a cluster at scale brings observability, security, evaluation, and governance challenges that the closed-source providers have already solved for you. We covered the broader question in open-source AI models and when free actually beats paid, and the framework still applies. The right cost comparison is not API pricing versus zero. It is API pricing versus the all-in cost of owning the deployment, including the engineering hours.
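That all-in comparison can be made concrete with a simple monthly break-even model. Every number below is a placeholder assumption for illustration, not a quoted figure:

```python
# Hedged sketch: monthly break-even between hosted API spend and
# self-hosting. All inputs are placeholder assumptions.

def self_host_monthly_cost(gpu_cluster_usd, eng_hours, eng_rate_usd=150):
    """All-in monthly cost of ownership: infrastructure plus the
    engineering hours spent on ops, evaluation, and governance."""
    return gpu_cluster_usd + eng_hours * eng_rate_usd

def api_monthly_cost(mtok_per_month, blended_price_per_mtok):
    """Hosted API spend for a month, in USD."""
    return mtok_per_month * blended_price_per_mtok

# Example: a $40k/month cluster plus 80 engineering hours, versus
# 2,000M tokens/month at a blended $2.50 per million tokens.
own = self_host_monthly_cost(40_000, 80)   # 52,000
api = api_monthly_cost(2_000, 2.50)        # 5,000
```

At these assumed numbers the hosted API wins by a wide margin; self-hosting only pays off at much higher volume, or when sovereignty and residency requirements dominate the decision.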
Performance claims from a model vendor are not the same as production performance on your data. DeepSeek's published benchmarks are credible enough to take seriously. They are not a substitute for running V4 against your own evaluation set on the same workflows where you currently use a closed-source model.
How to Run a V4 Evaluation This Quarter
The teams that will benefit most from V4 are the ones that move quickly without overcommitting. A focused two-week evaluation gets you to a real decision.
- Pick three workflows you already pay closed-source rates for. A code generation task, a document analysis task, and a long-context retrieval-augmented task is a good starting set. These exercise the dimensions where V4 is strongest.
- Run V4-Pro and V4-Flash through the DeepSeek API or a third-party host alongside your incumbent. Do not start by self-hosting. Use a hosted endpoint to validate quality fit before you commit infrastructure budget.
- Score on output quality, latency, and total cost per completed task. Not per token. Per task. Token efficiency varies materially across providers, and headline price is a poor proxy for realized cost.
- Stress-test on long context if that is part of your workload. V4's claimed FLOP and KV cache reductions at 1M tokens are the most consequential architectural change. Run a 200,000 to 800,000 token retrieval task and measure how it actually behaves, not how the spec sheet says it should.
- Document the switch cost. If V4 wins on a workflow, what would it take to put it into production? Engineering hours, infrastructure changes, observability gaps, governance requirements. A win that takes six months to operationalize is not the same as a win you can ship in two weeks.
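The per-task scoring in the checklist above can be sketched in a few lines. The token counts and success flags would come from your own evaluation runs; the function name and data shape here are illustrative assumptions:

```python
# Sketch of "cost per completed task" scoring. Failed tasks still burn
# tokens, so total spend is divided only by tasks that actually completed.

def cost_per_completed_task(results, input_price, output_price):
    """results: iterable of (input_tokens, output_tokens, succeeded).
    Prices are USD per million tokens."""
    spend = sum(
        i * input_price / 1e6 + o * output_price / 1e6
        for i, o, _ in results
    )
    completed = sum(1 for *_, ok in results if ok)
    if completed == 0:
        return float("inf")
    return spend / completed

# Three eval runs against a hypothetical model at V4-Pro list prices;
# one run failed, which raises the realized cost per successful task.
runs = [(12_000, 4_000, True), (15_000, 6_000, True), (14_000, 5_000, False)]
print(cost_per_completed_task(runs, input_price=1.74, output_price=3.48))
```

This is why per-task beats per-token as a metric: a verbose but reliable model can come out cheaper than a terse one that fails often, and only the task-level denominator captures that.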
The structural advantage shifts to teams that can answer "yes, we have benchmarked this on our actual workloads" within a few weeks of release. That answer is not free. It requires a recurring vendor evaluation discipline rather than ad-hoc reactions to launches, and the teams that have built that habit are the ones extracting the most leverage from each release cycle.
Key Takeaways
- DeepSeek released V4-Pro (1.6T parameters, 49B activated) and V4-Flash (284B parameters, 13B activated) on April 24, 2026 under an MIT license, with both supporting a one million token context window.
- Benchmark results are within striking distance of GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on coding and reasoning, with V4-Pro leading on IMOAnswerBench at 89.8 ahead of Claude (75.3) and Gemini (81.0).
- API pricing for V4-Pro is roughly one-twentieth of Claude Opus 4.7 on output tokens, with cache hits cutting input cost by approximately 90 percent.
- V4 is text-only and does not match closed-source providers on multimodality, integrated agentic surfaces, or hosted product polish.
- Businesses should benchmark V4 on real workloads, use it for negotiating leverage, and consider it for vendor-diversified architectures and data-sovereignty-sensitive deployments before committing to self-hosting.
The businesses that move early on open-source frontier models will have a meaningful advantage. If you want to be one of them, let's start with a conversation.