On April 23, 2026, OpenAI launched GPT-5.5, its first fully retrained base model since GPT-4.5. The capability jump is real, but the strategic story is bigger. GPT-5.5 is the engine for an OpenAI superapp that merges ChatGPT, Codex, and the Atlas browser agent into one product. That changes the buying decision for business AI.
What OpenAI Actually Shipped
OpenAI rolled out GPT-5.5 to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex on April 23, with GPT-5.5 Pro reserved for the paid tiers. According to OpenAI's own announcement, GPT-5.5 is "our smartest, most intuitive model yet," with the largest gains in agentic coding, computer use, knowledge work, and early scientific research.
API pricing is $5 per million input tokens and $30 per million output tokens, with a one million token context window and a 400,000 token window in Codex. According to The Decoder, this is exactly double the cost of GPT-5.4. GPT-5.5 Pro is priced at $30 and $180 per million tokens for the highest accuracy tier.
The benchmark numbers from OpenAI and third-party trackers tell a consistent story. GPT-5.5 sits at the top of the Artificial Analysis Intelligence Index with a score of 60, ahead of Claude Opus 4.7 and Gemini 3.1 Pro Preview at 57. It posts 90.1 percent on BrowseComp, 52.4 percent on FrontierMath Tier 1 through 3, and 82.7 percent on Terminal-Bench 2.0 for agentic coding. Claude Opus 4.7 still edges it on SWE-Bench Pro, 64.3 percent to 58.6 percent, which is worth remembering before you declare a single winner.
The Superapp Bet in Plain English
The more important announcement is not the benchmark table. It is what OpenAI is building around the model.
In TechCrunch's coverage, president Greg Brockman framed GPT-5.5 as "another big step along the path" to a superapp, a unified interface where ChatGPT, Codex, and the Atlas browser agent share context and act together. Alongside GPT-5.5, OpenAI shipped substantial Codex upgrades including browser control, Sheets and Slides generation, Docs and PDF handling, OS-wide dictation, and an auto-review mode that iterates on its own work.
Fortune and Microsoft's Azure team both confirmed parallel enterprise availability through Microsoft Foundry. That means the superapp vision is not only aimed at consumers. OpenAI wants the same merged experience inside the enterprise, delivered alongside the Azure-hosted versions that IT teams already procure.
Our take: This is the same direction of travel we described two weeks ago when Anthropic started moving into application categories like design and website building. The frontier labs no longer want to be wholesale token vendors. They want to own the desktop, the browser, and the workflow. GPT-5.5 is the first OpenAI model built explicitly for that job.
Why Pricing Doubled (and Why It Might Not Hurt as Much as It Looks)
A 100 percent price hike between generations is striking. It reads like a confidence move, and it probably is. OpenAI has chosen to stop racing to the bottom on per-token pricing and start charging for intelligence density.
The blunt numbers understate the story. OpenAI reports that GPT-5.5 uses roughly 40 percent fewer output tokens than GPT-5.4 to deliver the same result on typical workflows, which brings the realistic cost increase closer to 20 percent. Microsoft's Foundry blog added that at "medium" reasoning effort, GPT-5.5 matches Claude Opus 4.7 on several enterprise tasks at roughly a quarter of the cost. The takeaway for buyers is that headline API prices are a poor proxy for workload cost. The actual number depends on effort level, prompt shape, and how often you need to retry.
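To make the net-cost arithmetic concrete, here is a minimal sketch. The GPT-5.4 output price of $15 per million tokens is inferred from The Decoder's "exactly double" claim, and the 40 percent token reduction is OpenAI's own reported figure, not an independent measurement:

```python
# Illustrative arithmetic behind the "roughly 20 percent net increase" claim.
OLD_OUT_PRICE = 15.0    # assumed GPT-5.4 output $/M tokens (half the GPT-5.5 rate)
NEW_OUT_PRICE = 30.0    # published GPT-5.5 output $/M tokens
TOKEN_REDUCTION = 0.40  # OpenAI's reported output-token reduction

def cost_per_task(tokens_out: int, price_per_million: float) -> float:
    """Output-token cost of one task in dollars."""
    return tokens_out * price_per_million / 1_000_000

# A hypothetical output-heavy task emitting 100k tokens on GPT-5.4:
old_cost = cost_per_task(100_000, OLD_OUT_PRICE)                         # ≈ $1.50
# The same task on GPT-5.5, with 40 percent fewer output tokens:
new_cost = cost_per_task(int(100_000 * (1 - TOKEN_REDUCTION)), NEW_OUT_PRICE)  # ≈ $1.80
net_increase = new_cost / old_cost - 1                                   # ≈ 0.20
```

The doubling applies to the per-token rate; the efficiency gain applies to the token count, so the two partially cancel on output-heavy workloads. Input-heavy workloads see closer to the full doubling.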
The broader signal is that the DeepSeek-style price collapse most enterprise finance teams expected to continue through 2026 is not arriving at the frontier. We made this point in our analysis of the DeepSeek effect on AI budgets, and it has held up. Cheaper open-source models keep commodity inference affordable. Frontier intelligence keeps getting more expensive as capabilities expand.
What GPT-5.5 Changes for Your Stack
The model launch alone would not reshape a roadmap. The superapp wrapper around it starts to. Three shifts matter for business AI buyers.
The integrated stack option is now real on OpenAI's side. For the past year, choosing an integrated frontier-lab stack meant mostly choosing Google, which has had Gemini embedded in Workspace for some time. Anthropic's Claude Code and Managed Agents pushed in the same direction. With GPT-5.5, Codex, and Atlas merging into one session, OpenAI is offering a comparable bundle. Businesses now have three credible integrated stacks to evaluate, not one or two.
Model choice is less stable than it was six months ago. Between March and April 2026, the leaderboard across Artificial Analysis, SWE-Bench Pro, and FrontierMath has moved at least four times. No single provider has held the top spot for long. That changes how you should architect. Hard-coding a specific provider into prompt templates, tool definitions, or evaluation pipelines is a tax you will pay every six to ten weeks when the ranking shifts. If you want a longer treatment of the tradeoffs, our comparison of Claude, GPT, Gemini, and DeepSeek is still the right frame.
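A thin abstraction of the kind described above can be very small. The sketch below is illustrative only: the registry pattern is one reasonable shape, and the `stub` provider is a hypothetical stand-in, not a real vendor SDK call. The point is that prompt templates and pipelines call `complete`, never a vendor SDK directly, so a leaderboard shift costs one adapter, not a migration:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Completion:
    """Normalized result shape shared across all providers."""
    text: str
    input_tokens: int
    output_tokens: int

# Each adapter is a plain callable; swapping providers means swapping one entry.
PROVIDERS: Dict[str, Callable[[str], Completion]] = {}

def register(name: str):
    """Decorator that adds an adapter to the provider registry."""
    def wrap(fn: Callable[[str], Completion]) -> Callable[[str], Completion]:
        PROVIDERS[name] = fn
        return fn
    return wrap

@register("stub")  # stand-in for, e.g., an OpenAI or Anthropic adapter
def stub_provider(prompt: str) -> Completion:
    return Completion(text=f"echo: {prompt}",
                      input_tokens=len(prompt.split()),
                      output_tokens=2)

def complete(provider: str, prompt: str) -> Completion:
    """The only entry point application code should call."""
    return PROVIDERS[provider](prompt)
```

Application code that depends only on `complete` and `Completion` pays the six-to-ten-week reranking tax once, in the adapter, instead of everywhere.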
Agentic workflows now depend on product features, not just model quality. GPT-5.5's Terminal-Bench 2.0 score of 82.7 percent is impressive, but it is only useful if the surrounding surface (Codex, Atlas, and the API) actually lets the agent touch the work. The winners of 2026 are not the models with the highest benchmark scores. They are the products where model, tools, memory, and observability compose cleanly. That is exactly the argument we made about what the model context protocol is solving earlier in the year.
How to Run a Real GPT-5.5 Evaluation
Resist the temptation to read the announcement, extrapolate from the demo, and move two teams over. Benchmark numbers are an entry ticket, not a buying signal.
- Pick three workflows that already cost you real money: an agentic coding task, a knowledge-work workflow like weekly reporting, and one domain-specific process. These are the only tests that matter. Do not use toy prompts.
- Run GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro against the same prompts. Capture output quality, time to completion, and total token cost at matched effort levels. Rank each model per workflow. You will almost certainly find no single winner across all three.
- Separately evaluate the product surface. Can your team actually use Codex plus Atlas inside their day? Does it play with your existing IDE, ticketing, and review workflow? A model that wins on paper but slows engineers down in practice is not the right choice.
- Model your real cost, not the headline price. Replicate each task ten times and compute the realized cost per output. Include retries, refusals, and tool calls. This is where the "20 percent net increase" narrative either holds up or does not for your use case.
- Write down what switching would cost. If you standardize on GPT-5.5 and the integrated OpenAI stack today, how many engineering days would it take to swap to Claude or Gemini in 2027? If you cannot answer in a paragraph, build a thin abstraction layer before you commit.
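The realized-cost step above can be sketched as a small script. Everything here is illustrative: `run_task` is a hypothetical stand-in for a real model call, and the simulated success rate and token counts are assumptions, not measurements. The prices are the published GPT-5.5 API rates per million tokens:

```python
import random

IN_PRICE, OUT_PRICE = 5.0, 30.0  # GPT-5.5 list prices, $/M tokens

def run_task(rng: random.Random):
    """Stand-in for one real model call on your workflow.

    Returns (succeeded, input_tokens, output_tokens). The 90 percent
    success rate and token counts are placeholder assumptions.
    """
    return (rng.random() > 0.1, 3_000, 1_200)

def realized_cost_per_output(n: int = 10, max_retries: int = 3, seed: int = 0) -> float:
    """Replicate the task n times and return dollars per successful output.

    Failed attempts still cost tokens, so retries inflate the realized
    cost above the naive single-call estimate.
    """
    rng = random.Random(seed)
    total_cost, successes = 0.0, 0
    for _ in range(n):
        for _attempt in range(max_retries):
            ok, tin, tout = run_task(rng)
            total_cost += tin * IN_PRICE / 1e6 + tout * OUT_PRICE / 1e6
            if ok:
                successes += 1
                break
    return total_cost / successes if successes else float("inf")
```

In a real evaluation you would replace `run_task` with an actual API call that returns the usage fields from the response, and compare the resulting dollars-per-output figure across providers rather than comparing headline per-token prices.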
A disciplined evaluation loop like this is the difference between a stack you understand and a stack you inherited. Teams that adopt structured vendor evaluations across the frontier model landscape as a recurring quarterly discipline make noticeably better stack decisions than teams that evaluate once and coast.
What Not to Do
Do not treat GPT-5.5 as a drop-in upgrade and skip the regression pass. The release notes emphasize token efficiency, which means prompt behavior has shifted. Prompts that were precisely tuned for GPT-5.4 can produce different answers. Run your eval suite before flipping traffic.
Do not commit to the full superapp bundle before running real pilots. Codex, Atlas, and the merged ChatGPT experience are powerful, but they also deepen lock-in. If you standardize your development workflow on Codex, switching providers in 18 months will cost more than switching a raw API call.
Do not assume OpenAI has won. GPT-5.5 leads on aggregate intelligence today. Claude Opus 4.7 still wins SWE-Bench Pro. Gemini 3.1 Pro wins other niches. The right portfolio view is that frontier providers leapfrog each other every few weeks, not that any of them is permanently ahead. The same discipline we describe in our build vs. buy framework applies here: decide based on the workload, not on the announcement.
Key Takeaways
- On April 23, 2026, OpenAI released GPT-5.5, its first fully retrained base model since GPT-4.5, with a one million token context window and a 400,000 token window in Codex.
- GPT-5.5 tops the Artificial Analysis Intelligence Index at 60, ahead of Claude Opus 4.7 and Gemini 3.1 Pro Preview at 57, though Claude still wins on SWE-Bench Pro.
- API pricing doubled to $5 per million input tokens and $30 per million output tokens, with OpenAI claiming a roughly 40 percent reduction in output tokens keeps the net cost increase near 20 percent.
- The real shift is the superapp strategy: GPT-5.5 is the foundation model for a unified ChatGPT, Codex, and Atlas browser experience, bringing OpenAI into direct competition with Google and Anthropic at the integrated stack layer.
- Businesses should benchmark all three frontier stacks on real workflows, maintain model-agnostic abstractions, and document explicit exit plans before committing to any single integrated vendor.
The businesses that move early on integrated AI stack decisions will have a meaningful advantage. If you want to be one of them, let's start with a conversation.