Enterprises are abandoning tokenmaxxing, the practice of maximizing AI token consumption as a stand-in for productivity, in favor of efficiency and measurable returns. Following several high-profile budget blowups, and one startup's roughly 90 percent cut in inference costs after switching models, model routing and cost discipline have become central to AI strategy in mid-2026.
What Is Tokenmaxxing, and Why Is It Ending?
For most of the past two years, many companies operated on a simple assumption: the more AI tokens your teams and agents burned, the more productive they must be. Some employers actively incentivized maximum usage, and engineering teams treated token counts as a leaderboard metric. Fortune reported that this culture, nicknamed tokenmaxxing, treated consumption as a proxy for output.
The problem is that token volume measures inputs, not results. The same hundreds of millions of tokens can represent a hard research task done well or an agent running in circles. As bills climbed and the link to business value stayed fuzzy, the tokenmaxxing approach started to break down.
That break became visible in late June 2026. CNBC reported that OpenAI and Anthropic now face a new reality as their largest enterprise customers shift from racing to burn tokens toward tightening budgets and demanding measurable returns. According to D.A. Davidson analyst Gil Luria, cited in that reporting, some of those big customers may begin limiting out-of-control token spend.
The Companies Pumping the Brakes
The clearest signal came from Uber. The company capped employee AI spending at 1,500 dollars per month on agentic coding tools after exhausting its entire 2026 AI budget in roughly four months, with about 5,000 engineers pushing token consumption past projections. Uber COO Andrew Macdonald said the company has struggled to draw a direct line between AI spend and the features it ships to users.
Uber was not alone. Reporting on the broader pullback noted that Microsoft cancelled Claude Code subscriptions for employees in several product divisions, and Meta took down the informal tokenmaxxing leaderboard its staff had built. Walmart, Amazon, and Cisco have introduced their own controls.
The most dramatic example came from AI agent startup Lindy. Its CEO, Flo Crivello, switched 100 percent of the company's traffic from Anthropic's Claude to DeepSeek, the Chinese maker of cheaper open-weight models. He said the move cut inference costs on migrated routes by about 90 percent, would save the roughly 25-person company millions of dollars, and was a matter of survival because AI costs had exceeded payroll. Notably, he also reported seeing performance increase on many core use cases.
Why Cheaper Did Not Always Mean Worse
The Lindy result cuts against a comfortable assumption: that the most expensive frontier model is always the right one. In reality, a large share of enterprise AI work is routine. Summarizing a ticket, drafting a reply, classifying a document, or extracting fields from a form does not require the most powerful reasoning model on the market. Yet CNBC's reporting noted that roughly 95 percent of enterprise AI usage still runs on frontier models.
That gap between what tasks need and what they run on is where the money leaks. The DeepSeek price disruption we covered in our analysis of what cheaper models mean for your AI budget was an early warning. The tokenmaxxing correction is the moment that warning turned into operational policy at large companies.
Our take: The lesson is not that frontier models are overrated. They remain the best choice for genuinely hard reasoning, novel code, and high-stakes output. The lesson is that running every request through a frontier model is like sending a freight truck to deliver a single envelope. Efficiency comes from matching the task to the right vehicle.
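To make the arithmetic behind that point concrete, here is a minimal sketch of a blended-cost calculation. The prices, token volume, and the 70 percent routine share are hypothetical placeholders chosen for round numbers, not figures from any provider or from the reporting above.

```python
def blended_cost(total_tokens_m: float, routine_share: float,
                 frontier_price: float, cheap_price: float) -> float:
    """Dollar cost when `routine_share` of tokens goes to the cheaper model.

    total_tokens_m  -- monthly volume in millions of tokens
    routine_share   -- fraction of volume (0.0 to 1.0) that is routine work
    *_price         -- hypothetical dollars per million tokens
    """
    if not 0.0 <= routine_share <= 1.0:
        raise ValueError("routine_share must be between 0 and 1")
    cheap_tokens = total_tokens_m * routine_share
    frontier_tokens = total_tokens_m - cheap_tokens
    return cheap_tokens * cheap_price + frontier_tokens * frontier_price


# Hypothetical inputs: 1,000M tokens a month, frontier at $10/M, cheap tier at $1/M.
baseline = blended_cost(1000, 0.0, frontier_price=10.0, cheap_price=1.0)
routed = blended_cost(1000, 0.7, frontier_price=10.0, cheap_price=1.0)
savings = 1 - routed / baseline

print(f"baseline ${baseline:,.0f}, routed ${routed:,.0f}, savings {savings:.0%}")
```

Under these made-up inputs, most of the saving comes from the routine share rather than from the price gap alone, which is why auditing what actually needs a frontier model tends to matter more than negotiating rates.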
What the Efficiency Shift Means for Your Business
If you are a mid-market or growing company, this moment is good news. The frontier labs spent the last two years optimizing for the assumption that customers would pay for maximum consumption. That assumption is now being repriced in your favor, and you do not need a 5,000-engineer org to benefit.
The companies pulling ahead are the ones treating AI spend like any other major operating cost: instrumented, budgeted, and tied to outcomes. That means knowing which workflows consume the most tokens, which actually move a business metric, and which are quietly burning money. This is the same discipline we described in our framework for measuring AI ROI, now applied at the level of individual model calls.
Building a layer that sends routine calls to a cheap or open-weight model and escalates only hard tasks to a frontier model requires a deliberate model-orchestration architecture rather than a single hardcoded API key. The payoff is that you can change providers, add a cheaper tier, or cap spend without rewriting your application every time the market moves.
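A routing layer of this kind might look roughly like the sketch below. Everything in it is an assumption for illustration: the `ModelTier` and `Router` names, the task labels, the prices, and the lambda stand-ins that take the place of real provider clients.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_million_tokens: float      # hypothetical price
    call: Callable[[str], str]         # stand-in for a real provider client


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the configured cap."""


class Router:
    # Hypothetical labels for routine work; a real list would come from an audit.
    ROUTINE = frozenset({"summarize", "classify", "extract", "draft_reply"})

    def __init__(self, cheap: ModelTier, frontier: ModelTier, monthly_cap_usd: float):
        self.cheap = cheap
        self.frontier = frontier
        self.monthly_cap_usd = monthly_cap_usd
        self.spent_usd = 0.0

    def pick(self, task_type: str, hard: bool = False) -> ModelTier:
        # Escalate when the caller flags the task as hard or the type is unknown.
        if hard or task_type not in self.ROUTINE:
            return self.frontier
        return self.cheap

    def run(self, task_type: str, prompt: str, est_tokens: int, hard: bool = False) -> str:
        tier = self.pick(task_type, hard)
        cost = est_tokens / 1_000_000 * tier.usd_per_million_tokens
        if self.spent_usd + cost > self.monthly_cap_usd:
            raise BudgetExceeded(f"{tier.name} call would exceed the monthly cap")
        self.spent_usd += cost
        return tier.call(prompt)


# Swapping providers means changing these two definitions, not the application.
cheap = ModelTier("cheap-open-weight", 1.0, lambda p: f"[cheap] {p}")
frontier = ModelTier("frontier", 10.0, lambda p: f"[frontier] {p}")
router = Router(cheap, frontier, monthly_cap_usd=100.0)

print(router.run("classify", "Is this ticket about billing?", est_tokens=2_000))
print(router.run("novel_code", "Design a new scheduler", est_tokens=50_000))
```

The design choice worth noting is that application code only ever talks to `Router`. Providers, prices, and the cap live in one place, so a market shift becomes a configuration change.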
How to Move From Tokenmaxxing to Token Discipline
- Measure before you cut. Instrument your AI usage so you can see spend by workflow, team, and model. You cannot right-size what you cannot see, and blanket caps tend to throttle the high-value work along with the waste.
- Right-size models per task. Audit which workflows run on frontier models out of habit. Test cheaper and open-weight options against your real data, and choose each model for its specific use case rather than by reputation.
- Add a routing layer. Default routine requests to an efficient model and escalate only the hard cases. This protects quality where it matters while cutting the long tail of unnecessary frontier calls.
- Tie usage to outcomes. Replace token-count leaderboards with metrics that track shipped features, resolved tickets, or revenue influenced. Spend follows whatever you measure, so measure the right thing.
- Keep optionality. Avoid architectures that lock you into one provider. Crivello has said he would switch back if prices fall. Teams that can switch quickly are the ones positioned to capture future price cuts.
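The first and fourth steps above, measuring spend and tying it to outcomes, can be sketched as a small usage ledger. The record fields, workflow names, and dollar figures are hypothetical. In practice the records would come from your own request logs or your provider's billing export.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UsageRecord:
    workflow: str
    team: str
    model: str
    tokens: int
    cost_usd: float


def spend_by(records: list[UsageRecord], key: str) -> dict[str, float]:
    """Total spend grouped by 'workflow', 'team', or 'model'."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.cost_usd
    return dict(totals)


def cost_per_outcome(records: list[UsageRecord],
                     outcomes: dict[str, int]) -> dict[str, Optional[float]]:
    """Dollars per business outcome (for example, resolved tickets) by workflow.

    Returns None for a workflow with spend but no recorded outcomes, which is
    the signal that it may be quietly burning money.
    """
    spend = spend_by(records, "workflow")
    return {
        workflow: (total / outcomes[workflow] if outcomes.get(workflow) else None)
        for workflow, total in spend.items()
    }


# Hypothetical sample data.
records = [
    UsageRecord("support_triage", "support", "cheap-open-weight", 4_000_000, 4.0),
    UsageRecord("support_triage", "support", "frontier", 500_000, 5.0),
    UsageRecord("code_review", "platform", "frontier", 2_000_000, 20.0),
]
outcomes = {"support_triage": 300, "code_review": 0}

print(spend_by(records, "model"))
print(cost_per_outcome(records, outcomes))
```

A report like this replaces the token leaderboard with a cost-per-result view, which is the number a budget owner can actually act on.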
Common Mistakes to Avoid
The biggest mistake is overcorrecting. Tokenmaxxing was wasteful, but a hard freeze on AI spend can stall the projects that were actually working. The goal is precision, not austerity.
A second mistake is treating model choice as a one-time decision. Pricing, capability, and the cheap-tier landscape are shifting monthly. A model that was the obvious choice in January may be the wrong default by summer.
A third mistake is cutting cost without measuring quality. A cheaper model that quietly degrades your customer experience is not a saving; it is a deferred cost. Always test against real workloads and watch the output, not just the invoice.
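One way to guard against that third mistake is a small side-by-side check before switching a workflow. The sketch below assumes a classification-style task with known correct answers. The two model functions are toy stand-ins for real clients, and the tolerated accuracy drop is an arbitrary placeholder you would set per workflow.

```python
from typing import Callable, Sequence

Model = Callable[[str], str]   # stand-in for a call to a real model
Case = tuple[str, str]         # (prompt, expected answer)


def accuracy(model: Model, cases: Sequence[Case]) -> float:
    """Share of cases where the model's answer matches the expected label."""
    if not cases:
        raise ValueError("need at least one labeled case")
    hits = sum(
        model(prompt).strip().lower() == expected.strip().lower()
        for prompt, expected in cases
    )
    return hits / len(cases)


def safe_to_switch(candidate: Model, incumbent: Model,
                   cases: Sequence[Case], max_drop: float = 0.02) -> bool:
    """True if the candidate is within `max_drop` of the incumbent's accuracy."""
    return accuracy(candidate, cases) >= accuracy(incumbent, cases) - max_drop


# Toy stand-ins: the "incumbent" knows two billing keywords, the "candidate" one.
def incumbent(prompt: str) -> str:
    text = prompt.lower()
    return "billing" if "invoice" in text or "refund" in text else "technical"


def candidate(prompt: str) -> str:
    return "billing" if "invoice" in prompt.lower() else "technical"


# Hypothetical labeled sample; real cases should come from production traffic.
cases = [
    ("Invoice is wrong", "billing"),
    ("App crashes on login", "technical"),
    ("Need a refund", "billing"),
    ("Cannot reset password", "technical"),
]

print(accuracy(incumbent, cases), accuracy(candidate, cases))
print(safe_to_switch(candidate, incumbent, cases))
```

Exact-match scoring only suits tasks with a single right answer. Open-ended output such as drafted replies needs human review or a rubric, which this sketch does not attempt.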
Key Takeaways
- Tokenmaxxing, the practice of maximizing AI token consumption as a productivity proxy, is being abandoned by major enterprises in favor of efficiency and measurable returns.
- Uber capped employee AI spending at 1,500 dollars per month after exhausting its 2026 AI budget in roughly four months. Microsoft cancelled Claude Code subscriptions in several product divisions, and Meta took down its informal tokenmaxxing leaderboard.
- Startup Lindy cut inference costs on migrated routes by about 90 percent by moving from Claude to DeepSeek, while reporting better performance on many core use cases.
- Roughly 95 percent of enterprise AI usage still runs on frontier models, leaving large savings available through model routing and right-sizing.
- The winning approach is deliberate spend: measure usage, match tasks to the right model, route intelligently, and tie cost to business outcomes.
The businesses that move early on AI cost efficiency will have a meaningful advantage as frontier pricing resets. If you want to be one of them, let's start with a conversation.