Chinese open-weight models now process roughly 60 percent of the tokens flowing through OpenRouter, the largest neutral model router, while the share going to US models has collapsed from about 70 percent to near 30 percent in a single year. The shift is driven by pricing that runs 10 to 20 times cheaper than comparable US models, and it is forcing a real question for every business running AI at scale: where should your workloads actually run, and what are you sending across the border to get the savings?
What the OpenRouter Data Shows
OpenRouter sits between applications and dozens of model providers, so its traffic is a useful proxy for what developers and companies are actually paying to run in production. According to Dataconomy's reporting on the platform data, Chinese models crossed 61 percent of token consumption earlier this year. OpenRouter's own June 2026 analysis frames the same trend as an open-weight takeover led by a handful of Chinese labs.
The trajectory is steep. Chinese open-weight models represented less than 2 percent of weekly tokens on the platform in late 2024. Reporting on the token-share collapse places the crossover point in the week of February 9, 2026, when Chinese models processed 4.12 trillion tokens against 2.94 trillion for US models, and describes a gap that has kept widening since. On the other side of the ledger, industry coverage of the US decline documents American model share falling from roughly 70 percent to 30 percent over twelve months.
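As a quick check on the crossover-week figures, the short calculation below converts the two cited token counts into shares. It is a sketch that assumes only Chinese and US tokens are in the denominator, so the result is a two-way split and not the platform-wide share reported elsewhere in this article.

```python
# Token counts for the week of February 9, 2026, as cited above (in trillions).
chinese_tokens = 4.12
us_tokens = 2.94

# Share of the combined Chinese + US total only. Models from other regions
# are excluded (an assumption of this sketch), so this is not the
# platform-wide share.
combined = chinese_tokens + us_tokens
chinese_share = chinese_tokens / combined
us_share = us_tokens / combined

print(f"Chinese share of the two-way total: {chinese_share:.1%}")  # 58.4%
print(f"US share of the two-way total: {us_share:.1%}")            # 41.6%
```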
The names doing the work are consistent across sources: DeepSeek, Kimi from Moonshot AI, GLM from Zhipu, and MiniMax. These are not research curiosities. They are the default choice for a growing share of coding agents and automation pipelines.
Why the Shift Is Happening Now
Cost is the headline. Open-weight Chinese models frequently price an order of magnitude below US frontier models on a per-token basis. Coverage of the trend cites providers pushing input pricing to fractions of a cent per million tokens, a level that changes the unit economics of any high-volume workload. When an agent burns millions of tokens per task, a 10x price gap can decide whether a product is viable or never ships.
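To make the unit economics concrete, here is a minimal sketch of the per-task arithmetic. Every number in it (tokens per task, both prices, and monthly task volume) is a hypothetical placeholder chosen for illustration, not a quote from any provider.

```python
def cost_per_task(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD of one task that consumes `tokens` tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical figures for illustration only; not quoted from any provider.
TOKENS_PER_TASK = 5_000_000
PREMIUM_PRICE = 3.00    # USD per million tokens (assumed)
BUDGET_PRICE = 0.30     # USD per million tokens (assumed, 10x cheaper)
TASKS_PER_MONTH = 10_000

premium_monthly = cost_per_task(TOKENS_PER_TASK, PREMIUM_PRICE) * TASKS_PER_MONTH
budget_monthly = cost_per_task(TOKENS_PER_TASK, BUDGET_PRICE) * TASKS_PER_MONTH

print(f"Premium tier: ${premium_monthly:,.0f} per month")  # $150,000
print(f"Budget tier:  ${budget_monthly:,.0f} per month")   # $15,000
```

Swapping in your own token counts and current list prices shows quickly whether a given agent workload is sensitive to the price gap at all.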
Capability caught up. A year ago the tradeoff was cheap-but-weaker. That gap has largely closed on the benchmarks that matter to builders, particularly coding and agentic task completion. Coverage of the coding gains notes that these models are now genuinely competitive on the work developers push through them most.
Agentic workloads amplify everything. The rise of long-running agents means token volume is exploding, and cost sensitivity rises with it. High volume plus permissive open-weight licenses plus low price is exactly the combination that pulls traffic toward these models.
Our take: This is not the DeepSeek pricing shock of early 2025 repeating itself. This is that shock becoming the baseline. We wrote about the original moment in what the DeepSeek effect means for your AI budget. The difference now is that the cheap tier is no longer a bargain-basement alternative; on the largest neutral router, it is where the majority of tokens are being processed. Any budget model that still treats US frontier pricing as the reference point is anchored to a market that has moved.
The Catch: Volume Is Not the Same as Value
Here is the nuance most headlines miss. Winning token volume is not the same as winning revenue. One investor-focused analysis of the same OpenRouter data highlights a market splitting in two: a commodity layer where the cheapest capable model wins the bulk of raw tokens, and a premium layer where US frontier labs still capture a disproportionate share of spending because businesses pay up for their highest-stakes work.
For a decision maker, that framing is more useful than the raw share number. It tells you the market is not choosing a single winner. It is segmenting by workload. The practical implication is that your AI stack should probably segment the same way, rather than standardizing on one provider for everything.
The Data Governance Question You Cannot Skip
The cost case is straightforward. The risk case requires more care. The critical distinction is how you access these models, not which flag flies over the lab that trained them.
Sending a prompt to a Chinese provider's own hosted API means that prompt is processed under Chinese jurisdiction. Reporting on the trend notes that data routed through those endpoints falls under China's National Intelligence Law, which creates real exposure if prompts contain proprietary code, customer records, or internal documents. That is a governance fact to plan around, not a talking point, and it is a question your security and legal teams should answer before a single production workload ships.
The escape hatch is the same property that made these models popular: they are open weight. Because the parameters are downloadable, in many cases under permissive licenses such as MIT or Apache 2.0, you have options the closed US models do not offer:
- Self-host the model on your own infrastructure so no prompt ever leaves your environment.
- Route through a Western inference provider that serves the same open weights from US or EU data centers.
- Reserve the hosted Chinese API for low-sensitivity, high-volume tasks where the data exposure is acceptable.
Self-hosting is not a free lunch, and the infrastructure math is real. Serving a frontier-scale open-weight model can require on the order of hundreds of gigabytes of GPU memory, meaning a cluster of high-end accelerators rather than a single card. Teams that want the cost and control benefits without the jurisdiction risk typically need production-grade inference infrastructure and serving pipelines in place before the savings actually materialize. The license is free; the operational capability to run it well is not.
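The infrastructure math can be roughed out in a few lines. The sketch below estimates memory for the weights alone and the minimum accelerator count; the 600-billion-parameter model, the 80 GB accelerator, and the 80 percent usable-memory fraction (headroom left for the KV cache and activations) are all assumptions for illustration, not specifications of any particular model or card.

```python
import math


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


def min_accelerators(weights_gb: float, accelerator_gb: float,
                     usable_fraction: float = 0.8) -> int:
    """Smallest accelerator count whose usable memory holds the weights.

    `usable_fraction` is an assumed allowance for KV cache and activations.
    """
    return math.ceil(weights_gb / (accelerator_gb * usable_fraction))


# Hypothetical model and hardware sizes.
for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0)]:
    gb = weight_memory_gb(600, bytes_per_param)
    print(f"{label}: {gb:.0f} GB of weights, "
          f"at least {min_accelerators(gb, 80)} accelerators of 80 GB")
```

This is a lower bound: real deployments also need capacity for concurrency, redundancy, and long contexts, so treat the output as a starting point for sizing and not a quote.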
There is a second, softer governance dimension worth flagging. Independent testing has repeatedly found that these models decline or steer responses on politically sensitive topics tied to their country of origin. For most business workloads that is irrelevant, but for media, research, or public-sector applications it is a content-behavior variable to test for, not assume away.
How to Think About This in Practice
The worst response to this data is to pick a side. Standardizing entirely on cheap Chinese models exposes you to jurisdiction risk and vendor concentration under a different flag. Refusing to touch them on principle means leaving large, real savings on the table while your competitors bank them. The mature move is a routing strategy, and choosing the right model for each job is a discipline we broke down in choosing the right AI model for your business.
A workable framework has three tiers:
- High-volume, low-sensitivity tasks (bulk classification, draft generation, internal tooling). Strong candidates for open-weight models, self-hosted or served from a Western provider to control cost without the data exposure.
- Sensitive or regulated data. Keep it in your jurisdiction. That means self-hosted open weights or a US frontier model with contractual data protections, never the hosted Chinese API.
- Highest-stakes reasoning and customer-facing quality. This is where paying a premium for a US frontier model still tends to pay for itself, and where the revenue data suggests most enterprises already spend.
This only works if you know which of your workloads fall into which tier, and that requires knowing your data. Companies that have not mapped where sensitive information flows through their AI systems cannot route safely, which is one more reason the readiness problem we described in why your data is not AI-ready keeps showing up as the real bottleneck.
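The three-tier framework can be sketched as a small routing function. This is one possible encoding and not a definitive policy: the `Workload` fields, the `Route` names, and the choice to check data sensitivity before stakes are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    OPEN_WEIGHT_CONTROLLED = "open-weight model, self-hosted or Western-hosted"
    IN_JURISDICTION = "self-hosted open weights or contracted US frontier model"
    PREMIUM_FRONTIER = "US frontier model"


@dataclass(frozen=True)
class Workload:
    name: str
    contains_sensitive_data: bool  # requires that you have mapped your data
    high_stakes: bool              # top-tier reasoning or customer-facing


def choose_route(workload: Workload) -> Route:
    # Assumed precedence: sensitivity first, so regulated data never
    # leaves your jurisdiction regardless of the other attributes.
    if workload.contains_sensitive_data:
        return Route.IN_JURISDICTION
    if workload.high_stakes:
        return Route.PREMIUM_FRONTIER
    return Route.OPEN_WEIGHT_CONTROLLED


for w in [
    Workload("bulk classification", False, False),
    Workload("customer records summarization", True, False),
    Workload("customer-facing assistant", False, True),
]:
    print(f"{w.name}: {choose_route(w).value}")
```

Note that no branch returns a hosted Chinese API. Using it for low-sensitivity traffic would be a fourth route to add only after security and legal review.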
Common Mistakes to Avoid
Confusing the model's origin with your data's destination. An open-weight Chinese model self-hosted in your own cloud does not send anything to China. The risk lives in the endpoint you call, not the parameters you run.
Chasing the cheapest per-token price in isolation. Factor in self-hosting infrastructure, engineering time, and evaluation overhead. The sticker price and the total cost of ownership can diverge sharply for models you run yourself.
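A simple break-even calculation shows how sticker price and total cost of ownership can diverge. The sketch below treats self-hosting as a fixed monthly cost and compares it against a per-token API price; all dollar amounts are hypothetical, and it ignores the marginal cost of extra capacity as volume grows.

```python
def monthly_self_host_cost(infrastructure_usd: float,
                           engineering_usd: float,
                           evaluation_usd: float) -> float:
    """Fixed monthly cost of running the model yourself (simplified)."""
    return infrastructure_usd + engineering_usd + evaluation_usd


def break_even_million_tokens(fixed_monthly_usd: float,
                              api_usd_per_million: float) -> float:
    """Monthly volume (millions of tokens) where API spend equals fixed cost."""
    return fixed_monthly_usd / api_usd_per_million


# Hypothetical monthly figures for illustration only.
fixed = monthly_self_host_cost(
    infrastructure_usd=20_000,
    engineering_usd=10_000,
    evaluation_usd=2_000,
)
volume = break_even_million_tokens(fixed, api_usd_per_million=0.50)

print(f"Fixed self-hosting cost: ${fixed:,.0f} per month")        # $32,000
print(f"Break-even volume: {volume:,.0f} million tokens a month")  # 64,000
```

Below the break-even volume, a hosted endpoint from a provider you trust is cheaper than running the model yourself, even though the weights are free.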
Standardizing on one provider for everything. Capability parity across the market makes single-vendor lock-in an avoidable risk. Build for substitution.
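One way to build for substitution is to keep application code behind a provider-agnostic call signature, so a model can be swapped or skipped without touching callers. The sketch below is a minimal illustration: the `Completion` type, `complete_with_fallback`, and the two stub providers are hypothetical names, and a real integration would wrap each vendor's actual client inside a function of this shape.

```python
from typing import Callable, Sequence, Tuple

# Any provider is reduced to "prompt in, text out" (an assumed interface).
Completion = Callable[[str], str]


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Completion]],
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, response)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real clients.
def self_hosted_stub(prompt: str) -> str:
    raise TimeoutError("cluster unavailable")


def western_hosted_stub(prompt: str) -> str:
    return f"echo: {prompt}"


used, answer = complete_with_fallback(
    "classify this ticket",
    [("self-hosted", self_hosted_stub), ("western-hosted", western_hosted_stub)],
)
print(used, "->", answer)  # western-hosted -> echo: classify this ticket
```

The order of the list doubles as policy: putting the in-jurisdiction option first means fallback only ever moves toward endpoints you have already approved.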
Skipping the legal and security review. Do not let a fast-moving engineering team wire a hosted foreign API into a production data path before your compliance function has weighed in. Retrofitting governance is far more expensive than designing for it.
Key Takeaways
- Chinese open-weight models now process roughly 60 percent of OpenRouter tokens, up from under 2 percent in late 2024, while the US share fell from about 70 percent to near 30 percent.
- The driver is cost: these models often run 10 to 20 times cheaper than US frontier models while staying competitive on coding and agent benchmarks.
- Token volume is not revenue. The market is splitting into a cheap commodity layer and a premium layer where US labs still capture disproportionate spend.
- Data routed through a Chinese provider's hosted API falls under Chinese jurisdiction, but because the models are open weight you can self-host or use a Western inference provider to avoid that exposure.
- The right answer for most businesses is a tiered routing strategy by workload sensitivity and volume, not an all-or-nothing bet.
The businesses that move early on a disciplined multi-model routing strategy will have a meaningful cost and flexibility advantage. If you want to be one of them, let's start with a conversation.