
Chinese Open-Weight Models Now Run Most of OpenRouter's Traffic: What It Means for Your AI Stack

Chinese open-weight models such as DeepSeek, Kimi, GLM, and MiniMax now process roughly 60 percent of tokens on OpenRouter, while the US share has fallen from about 70 percent to near 30 percent in a year. They are far cheaper to run, but routing data through Chinese endpoints carries jurisdiction and governance risk that businesses must plan for.

Vectrel Team

AI Solutions Architects

Published

July 1, 2026

Reading Time

10 min read

#open-source-ai #ai-models #cost-optimization #enterprise-ai #ai-governance #ai-strategy

Chinese open-weight models now process roughly 60 percent of the tokens flowing through OpenRouter, the largest neutral model router, while the share going to US models has collapsed from about 70 percent to near 30 percent in a single year. The shift is driven by pricing that runs 10 to 20 times cheaper than comparable US models, and it is forcing a real question for every business running AI at scale: where should your workloads actually run, and what are you sending across the border to get the savings?

#What the OpenRouter Data Shows

OpenRouter sits between applications and dozens of model providers, so its traffic is a useful proxy for what developers and companies are actually paying to run in production. According to Dataconomy's reporting on the platform data, Chinese models crossed 61 percent of token consumption earlier this year. OpenRouter's own June 2026 analysis frames the same trend as an open-weight takeover led by a handful of Chinese labs.

The trajectory is steep. Chinese open-weight models represented less than 2 percent of weekly tokens on the platform in late 2024. Reporting on the token-share collapse places the crossover point in the week of February 9, 2026, when Chinese models processed 4.12 trillion tokens against 2.94 trillion for US models, and describes a gap that has kept widening since. On the other side of the ledger, industry coverage of the US decline documents American model share falling from roughly 70 percent to 30 percent over twelve months.
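The crossover-week figures can be checked with a quick calculation. The sketch below is illustrative only: `token_share` is a hypothetical helper name, and it compares just the two reported totals, ignoring traffic to models from anywhere else, so it is not the same measure as a platform-wide share.

```python
def token_share(group_tokens: float, other_tokens: float) -> float:
    """Return the first group's fraction of the two groups' combined tokens."""
    total = group_tokens + other_tokens
    if total <= 0:
        raise ValueError("total tokens must be positive")
    return group_tokens / total


# Reported figures for the week of February 9, 2026, in trillions of tokens.
chinese_tokens = 4.12
us_tokens = 2.94

share = token_share(chinese_tokens, us_tokens)
ratio = chinese_tokens / us_tokens

print(f"Chinese share of the two-group total: {share:.1%}")
print(f"Chinese-to-US token ratio: {ratio:.2f}x")
```

On those two numbers alone, Chinese models handled a bit over 58 percent of the combined Chinese and US volume that week, or about 1.4 tokens for every US token.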

The names doing the work are consistent across sources: DeepSeek, Kimi from Moonshot AI, GLM from Zhipu, and MiniMax. These are not research curiosities. They are the default choice for a growing share of coding agents and automation pipelines.

#Why the Shift Is Happening Now

Cost is the headline. Open-weight Chinese models frequently price an order of magnitude below US frontier models on a per-token basis. Coverage of the trend cites providers pushing input pricing to fractions of a cent per million tokens, a level that changes the unit economics of any high-volume workload. When an agent burns millions of tokens per task, a 10x price difference is the difference between a viable product and one that never ships.
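To make the unit economics concrete, here is a minimal sketch of per-task cost at two price points. Every number in it is a placeholder chosen to illustrate a 10x gap, not a quote from any provider, and `cost_per_task` is a hypothetical helper.

```python
def cost_per_task(tokens_per_task: int, price_per_million_usd: float) -> float:
    """Cost in USD of one task, given a flat per-million-token price."""
    return tokens_per_task / 1_000_000 * price_per_million_usd


# Hypothetical inputs: an agent that burns 5 million tokens per task.
TOKENS_PER_TASK = 5_000_000
PREMIUM_PRICE = 3.00   # USD per million tokens (assumed)
BUDGET_PRICE = 0.30    # USD per million tokens (assumed, 10x cheaper)
TASKS_PER_MONTH = 10_000

for label, price in [("premium", PREMIUM_PRICE), ("budget", BUDGET_PRICE)]:
    per_task = cost_per_task(TOKENS_PER_TASK, price)
    monthly = per_task * TASKS_PER_MONTH
    print(f"{label}: ${per_task:,.2f} per task, ${monthly:,.0f} per month")
```

Real pricing usually separates input and output tokens and may discount cached input, so treat a flat rate like this as a first-pass estimate.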

Capability caught up. A year ago the tradeoff was cheap-but-weaker. That gap has largely closed on the benchmarks that matter to builders, particularly coding and agentic task completion. Coverage of the coding gains notes that these models are now genuinely competitive on the work developers push through them most.

Agentic workloads amplify everything. The rise of long-running agents means token volume is exploding, and cost sensitivity rises with it. High volume plus permissive open-weight licenses plus low price is exactly the combination that pulls traffic toward these models.

Our take: This is not the DeepSeek pricing shock of early 2025 repeating itself. This is that shock becoming the baseline. We wrote about the original moment in what the DeepSeek effect means for your AI budget; the difference now is that the cheap tier is no longer a bargain-basement alternative, it is where the majority of the world's tokens are being processed. Any budget model that still treats US frontier pricing as the reference point is anchored to a market that has moved.

#The Catch: Volume Is Not the Same as Value

Here is the nuance most headlines miss. Winning token volume is not the same as winning revenue. One investor-focused analysis of the same OpenRouter data highlights a market splitting in two: a commodity layer where the cheapest capable model wins the bulk of raw tokens, and a premium layer where US frontier labs still capture a disproportionate share of spending because businesses pay up for their highest-stakes work.

For a decision maker, that framing is more useful than the raw share number. It tells you the market is not choosing a single winner. It is segmenting by workload. The practical implication is that your AI stack should probably segment the same way, rather than standardizing on one provider for everything.

#The Data Governance Question You Cannot Skip

The cost case is straightforward. The risk case requires more care. The critical distinction is how you access these models, not which flag flies over the lab that trained them.

Sending a prompt to a Chinese provider's own hosted API means that prompt is processed under Chinese jurisdiction. Reporting on the trend notes that data routed through those endpoints falls under China's National Intelligence Law, which creates real exposure if prompts contain proprietary code, customer records, or internal documents. That is a governance fact to plan around, not a talking point, and it is a question your security and legal teams should answer before a single production workload ships.

The escape hatch is the same property that made these models popular: they are open weight. Because the parameters are downloadable, in most cases under permissive MIT or Apache 2.0 licenses, you have options the closed US models do not offer:

  • Self-host the model on your own infrastructure so no prompt ever leaves your environment.
  • Route through a Western inference provider that serves the same open weights from US or EU data centers.
  • Reserve the hosted Chinese API for low-sensitivity, high-volume tasks where the data exposure is acceptable.

Self-hosting is not a free lunch, and the infrastructure math is real. Serving a frontier-scale open-weight model can require on the order of hundreds of gigabytes of GPU memory, meaning a cluster of high-end accelerators rather than a single card. Teams that want the cost and control benefits without the jurisdiction risk typically need production-grade inference infrastructure and serving pipelines in place before the savings actually materialize. The license is free; the operational capability to run it well is not.
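A rough way to size that infrastructure is to multiply parameter count by bytes per parameter and add headroom for the KV cache and activations. The sketch below assumes a hypothetical 600-billion-parameter model, 8-bit weights, 80 GB accelerators, and a 25 percent overhead factor; all four are illustrative assumptions, and real requirements depend on context length, batch size, and the serving stack.

```python
import math


def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory: 1 billion params at 1 byte each is about 1 GB."""
    return params_billions * bytes_per_param


def accelerators_needed(
    params_billions: float,
    bytes_per_param: float,
    gb_per_accelerator: float,
    overhead: float = 1.25,  # assumed headroom for KV cache and activations
) -> int:
    """Smallest number of accelerators whose combined memory fits the model."""
    total_gb = weights_gb(params_billions, bytes_per_param) * overhead
    return math.ceil(total_gb / gb_per_accelerator)


# Hypothetical model: 600B parameters stored at 8 bits (1 byte) each.
print(weights_gb(600, 1.0))               # 600.0 GB of weights alone
print(accelerators_needed(600, 1.0, 80))  # 10 accelerators at 80 GB each
```

Even before throughput or redundancy enter the picture, a model of that size needs a multi-accelerator node, which is where the operational cost begins.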

There is a second, softer governance dimension worth flagging. Independent testing has repeatedly found that these models decline or steer responses on politically sensitive topics tied to their country of origin. For most business workloads that is irrelevant, but for media, research, or public-sector applications it is a content-behavior variable to test for, not assume away.

#How to Think About This in Practice

The worst response to this data is to pick a side. Standardizing entirely on cheap Chinese models exposes you to jurisdiction risk and vendor concentration under a different flag. Refusing to touch them on principle means leaving large, real savings on the table while your competitors bank them. The mature move is a routing strategy, and choosing the right model for each job is a discipline we broke down in choosing the right AI model for your business.

A workable framework has three tiers:

  1. High-volume, low-sensitivity tasks (bulk classification, draft generation, internal tooling). Strong candidates for open-weight models, self-hosted or served from a Western provider to control cost without the data exposure.
  2. Sensitive or regulated data. Keep it in your jurisdiction. That means self-hosted open weights or a US frontier model with contractual data protections, never the hosted Chinese API.
  3. Highest-stakes reasoning and customer-facing quality. This is where paying a premium for a US frontier model still tends to pay for itself, and where the revenue data suggests most enterprises already spend.
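The three tiers above can be sketched as a small routing function. This is one possible encoding, not a prescribed implementation: the class names, fields, and the precedence rule (data sensitivity is checked before anything else) are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    LOW = "low"
    REGULATED = "regulated"


class Target(Enum):
    OPEN_WEIGHT_IN_JURISDICTION = "open-weight model, self-hosted or Western-hosted"
    IN_JURISDICTION_ONLY = "self-hosted open weights or contracted US frontier model"
    US_FRONTIER = "US frontier model"


@dataclass(frozen=True)
class Workload:
    name: str
    sensitivity: Sensitivity
    high_stakes: bool


def route(workload: Workload) -> Target:
    """Map a workload to a tier; sensitivity wins over every other factor."""
    # Tier 2: sensitive or regulated data never leaves your jurisdiction.
    if workload.sensitivity is Sensitivity.REGULATED:
        return Target.IN_JURISDICTION_ONLY
    # Tier 3: highest-stakes reasoning and customer-facing quality.
    if workload.high_stakes:
        return Target.US_FRONTIER
    # Tier 1: high-volume, low-sensitivity work.
    return Target.OPEN_WEIGHT_IN_JURISDICTION


for w in [
    Workload("bulk classification", Sensitivity.LOW, high_stakes=False),
    Workload("patient record summaries", Sensitivity.REGULATED, high_stakes=True),
    Workload("customer-facing advisor", Sensitivity.LOW, high_stakes=True),
]:
    print(f"{w.name}: {route(w).value}")
```

Checking sensitivity first means a workload that is both regulated and high-stakes stays in jurisdiction, which matches the rule that governance constraints are not traded for quality or cost.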

This only works if you know which of your workloads fall into which tier, and that requires knowing your data. Companies that have not mapped where sensitive information flows through their AI systems cannot route safely, which is one more reason the readiness problem we described in why your data is not AI-ready keeps showing up as the real bottleneck.

#Common Mistakes to Avoid

Confusing the model's origin with your data's destination. An open-weight Chinese model self-hosted in your own cloud does not send anything to China. The risk lives in the endpoint you call, not the parameters you run.

Chasing the cheapest per-token price in isolation. Factor in self-hosting infrastructure, engineering time, and evaluation overhead. The sticker price and the total cost of ownership can diverge sharply for models you run yourself.
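A simple break-even calculation shows how far the two can diverge. The figures below are hypothetical placeholders (a fixed monthly bill for hardware and engineering time, and a flat hosted price), and `break_even_million_tokens` is an illustrative helper, not a standard formula from any vendor.

```python
def break_even_million_tokens(
    fixed_monthly_cost_usd: float,
    hosted_price_per_million_usd: float,
    self_host_marginal_per_million_usd: float = 0.0,
) -> float:
    """Monthly volume (in millions of tokens) at which self-hosting matches a hosted API."""
    saving_per_million = hosted_price_per_million_usd - self_host_marginal_per_million_usd
    if saving_per_million <= 0:
        raise ValueError("self-hosting never breaks even at these prices")
    return fixed_monthly_cost_usd / saving_per_million


# Assumed inputs: $20,000 per month in infrastructure and engineering time,
# compared against a hosted price of $0.50 per million tokens.
volume = break_even_million_tokens(20_000, 0.50)
print(f"Break-even at {volume:,.0f} million tokens per month")
```

Under those assumptions, self-hosting only pays off above roughly 40 billion tokens a month; below that volume the hosted price is cheaper despite the higher sticker rate per token.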

Standardizing on one provider for everything. Capability parity across the market makes single-vendor lock-in an avoidable risk. Build for substitution.
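In code, building for substitution usually means application logic depends on a narrow interface rather than a specific vendor's client. The sketch below is a minimal illustration: `ChatBackend`, `EchoBackend`, and `summarize` are hypothetical names, and the stand-in backend only echoes its input where a real one would call a model.

```python
from typing import Protocol


class ChatBackend(Protocol):
    """The only surface the application is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend for illustration; a real one would call a model."""

    def __init__(self, label: str) -> None:
        self.label = label

    def complete(self, prompt: str) -> str:
        return f"[{self.label}] {prompt}"


def summarize(backend: ChatBackend, text: str) -> str:
    """Application logic that never names a vendor."""
    return backend.complete(f"Summarize: {text}")


# Swapping providers is a change at the call site, not inside the feature.
print(summarize(EchoBackend("self-hosted"), "quarterly report"))
print(summarize(EchoBackend("frontier-api"), "quarterly report"))
```

Keeping the interface this small also makes side-by-side evaluation easier, since the same prompts can be replayed against each candidate backend.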

Skipping the legal and security review. Do not let a fast-moving engineering team wire a hosted foreign API into a production data path before your compliance function has weighed in. Retrofitting governance is far more expensive than designing for it.

#Key Takeaways

  • Chinese open-weight models now process roughly 60 percent of OpenRouter tokens, up from under 2 percent in late 2024, while the US share fell from about 70 percent to near 30 percent.
  • The driver is cost: these models often run 10 to 20 times cheaper than US frontier models while staying competitive on coding and agent benchmarks.
  • Token volume is not revenue. The market is splitting into a cheap commodity layer and a premium layer where US labs still capture disproportionate spend.
  • Data routed through a Chinese provider's hosted API falls under Chinese jurisdiction, but because the models are open weight you can self-host or use a Western inference provider to avoid that exposure.
  • The right answer for most businesses is a tiered routing strategy by workload sensitivity and volume, not an all-or-nothing bet.

The businesses that move early on a disciplined multi-model routing strategy will have a meaningful cost and flexibility advantage. If you want to be one of them, let's start with a conversation.

#Frequently Asked Questions

What share of OpenRouter traffic do Chinese AI models have?

Chinese open-weight models now process roughly 60 percent of the tokens routed through OpenRouter, the largest neutral model router, up from under 2 percent in late 2024. Over the same period the share going to US models fell from about 70 percent to near 30 percent.

Why are businesses switching to Chinese open-weight models?

Cost is the main driver. Open-weight models from DeepSeek, Kimi, GLM, and MiniMax often price 10 to 20 times cheaper than comparable US models while scoring competitively on coding and agent benchmarks. For high-volume agentic workloads where token usage is enormous, the savings are hard to ignore.

Is it safe to send business data to Chinese AI models?

It depends on how you route it. Prompts sent to a Chinese provider's own API are processed under Chinese jurisdiction, which raises exposure for proprietary code or customer data. Because these models are open weight, you can self-host or use a Western inference provider to keep data in your own jurisdiction.

What is the difference between open-weight and open-source AI models?

Open-weight means the trained model parameters are downloadable and runnable on your own hardware, usually under a permissive license. It does not require the training data or full training code to be released, which is what stricter definitions of open-source AI expect so that a model can be inspected and reproduced. Most leading Chinese models ship open weights under MIT or Apache 2.0 licenses.

Should enterprises replace US models with Chinese ones?

Rarely wholesale. Most mature teams route by workload: cheaper open-weight models for high-volume, low-sensitivity tasks, and US frontier models for sensitive or highest-stakes work. The right split depends on your cost profile, data sensitivity, and compliance obligations, not on leaderboard position alone.
