On May 5, 2026, the Center for AI Standards and Innovation (CAISI) at the Department of Commerce signed pre-deployment testing agreements with Google DeepMind, Microsoft, and xAI. Combined with earlier partnerships with Anthropic and OpenAI, the US government now reviews frontier models from every major American AI lab before those models reach the public. For enterprise buyers, that changes the texture of vendor diligence in ways most procurement processes have not caught up to.
What CAISI Just Signed
CAISI is housed inside the National Institute of Standards and Technology and was rebadged from the previous AI Safety Institute under the Trump administration. According to the official NIST announcement, the new agreements with Google DeepMind, Microsoft, and xAI cover pre-deployment evaluations of frontier models, follow-on post-deployment assessments, and joint research on AI security. Earlier 2024 partnerships with Anthropic and OpenAI have been renegotiated into updated memoranda of understanding to align with Commerce Secretary Howard Lutnick's directives and the broader America's AI Action Plan.
The structural detail that matters: CAISI now serves as the primary point of contact between industry and government for AI model testing, collaborative research, and best practice development. As CNBC reported, the deals build on a steady drumbeat of executive-branch attention to AI safety review since the start of the year.
What CAISI Actually Tests
The scope of evaluation is narrower than most people assume, and that is important to understand before drawing procurement conclusions.
CAISI's mandate covers three core national security risk areas: cybersecurity (model misuse for offensive cyber operations), biosecurity (model uplift in developing dangerous pathogens), and chemical weapons (model uplift in producing chemical agents). Per The Hill, the agency also evaluates foreign AI systems for backdoors and covert malicious behavior, which gives the agreements a defensive industrial-policy dimension on top of safety review.
To surface true model capabilities rather than safeguard behavior, developers frequently provide CAISI with versions whose safeguards have been reduced or removed. As detailed in the NIST release, evaluators from across government can participate via the CAISI-convened TRAINS Taskforce, an interagency group focused on AI national security concerns. Testing can also occur in classified environments where appropriate.
The track record is more substantial than the May 5 announcement suggests. CAISI has already completed more than 40 evaluations, some of them on models that have not yet been publicly released. The quiet years of "AI Safety Institute" work were not idle.
Why This Matters Beyond Washington
For enterprise buyers, three things changed on May 5.
Every major US frontier model is now reviewed before launch. Six months ago, you could plausibly argue that government testing covered only part of the market. Now Anthropic, OpenAI, Google DeepMind, Microsoft, and xAI all participate. If your shortlist is anchored on US frontier providers, every model on it has been or will be touched by the same evaluation regime. That is a useful baseline for procurement narratives, and it flattens a differentiator some vendors used to claim.
Foreign model evaluation is now formalized. CAISI's explicit charter to assess foreign AI systems for "backdoors and other covert malicious behavior" gives the government a structured mechanism to flag specific Chinese or other non-US models. Enterprises that quietly added DeepSeek or Qwen-class models to evaluation pipelines should expect more guidance, not less, and possibly formal advisories. Document your foreign-model usage in the same registry where you track your other AI vendors.
Pre-deployment access creates an information asymmetry the market should price in. The government sees frontier model capabilities and risk profiles before customers do. That is a deliberate design choice, and it is unlikely to be reversed. Buyers can no longer assume their own pre-launch evaluation is the most rigorous look at a new model. Building a practical AI governance framework that accepts this asymmetry, rather than pretending it does not exist, is the realistic path forward.
What CAISI Does Not Do
This is where vendor sales decks will overreach, and where buyers need to push back.
CAISI evaluates national security tail risks. It is not measuring whether a model is fit for your specific business workload. It does not assess hallucination rates on your content domain, accuracy on your customer data, bias on your hiring pipeline, or stability on your code base. Models that pass CAISI cyber, bio, and chemical screens can still fail spectacularly in business deployments for reasons that are not on the agency's radar.
CAISI also does not currently publish detailed results on individual models. Its outputs are framed as research and feedback to the developer, not consumer-facing safety ratings. Treating "CAISI participation" as a quality signal is reasonable. Treating it as a CAISI-issued seal of approval is not, because no such seal exists.
A separate gap matters for procurement. As Stanford's 2026 AI Index documented, the Foundation Model Transparency Index dropped from 58 to 40 in a single year, with frontier vendors disclosing less about training data, training duration, and evaluation methodology. Government testing partially compensates for that by giving regulators visibility, but it does not give buyers any new disclosure rights. Your contracts still have to do that work.
How to Fold CAISI Into Your Vendor Process
Three concrete adjustments are warranted this quarter.
- Add a single line to your AI vendor evaluation template. Ask whether the model on offer was subject to CAISI pre-deployment review, and request whatever summary the vendor is willing to share. Asking the question signals procurement maturity. The answer, even when it is "we cannot share specifics," tells you something about how the vendor handles security inquiries.
- Update your procurement standard to address foreign frontier models explicitly. If you use or evaluate non-US frontier models, document the use case, data flow, and exit plan; a minimal registry sketch follows this list. A CAISI advisory or a Treasury or Commerce action against a specific provider will move faster than your ability to reorganize a production workload around a swap.
- Hold the line on contractual disclosures. CAISI participation does not substitute for what your AI vendor should be telling you under contract: material model changes, evaluation methodology, data handling, incident response, and known-failure-mode reporting. The same minimum disclosure floor we recommended in our coverage of AI regulation applies. Government testing complements vendor disclosure. It does not replace it.
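To make those three adjustments concrete, here is a minimal sketch of what one registry entry might look like. The schema is ours, not CAISI's or NIST's; every field name is an illustrative assumption, and a spreadsheet or GRC-tool column set would serve just as well.

```python
from dataclasses import dataclass, field

# Illustrative registry schema -- field names are our assumptions,
# not a CAISI, NIST, or industry standard.
@dataclass
class ModelRegistryEntry:
    vendor: str
    model: str
    origin: str                           # "US" or "non-US"
    caisi_reviewed: str                   # "yes", "no", or "undisclosed"
    use_cases: list[str] = field(default_factory=list)
    data_flows: list[str] = field(default_factory=list)
    exit_plan: str = ""                   # documented swap/rollback path
    disclosure_floor: dict[str, bool] = field(default_factory=dict)

# A hypothetical entry for a non-US frontier model under evaluation.
entry = ModelRegistryEntry(
    vendor="ExampleAI",                   # hypothetical vendor
    model="example-frontier-1",           # hypothetical model name
    origin="non-US",
    caisi_reviewed="undisclosed",         # the vendor's answer to the template question
    use_cases=["internal code assistant"],
    data_flows=["prompts only; no customer PII leaves the tenant"],
    exit_plan="fall back to contracted US provider within 30 days",
    disclosure_floor={
        "material_model_changes": True,
        "evaluation_methodology": False,  # gap to close at renewal
        "data_handling": True,
        "incident_response": True,
        "known_failure_modes": False,     # gap to close at renewal
    },
)
```

The `disclosure_floor` map doubles as the contract checklist from the third adjustment: any `False` is a negotiation item at the next renewal, not a reason to lean on CAISI participation as a substitute.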
Common Mistakes to Avoid
Treating CAISI as a quality stamp. It is a national security review, not a product certification. A model can clear CAISI and still hallucinate, leak data, or fail your specific use case.
Assuming voluntary stays voluntary. The current regime is voluntary by design, but every major US lab now participates and the political alignment is broadly bipartisan. That trajectory tends to produce mandatory regimes within two to three years, not five to ten. Build your procurement process to handle a future where this becomes a requirement.
Ignoring the foreign model signal. If CAISI starts publishing formal advisories on specific foreign models, the lag between advisory and procurement action will determine your exposure. A documented inventory shortens that lag.
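As a sketch of why the inventory matters, a simple query over the registry turns a hypothetical advisory into an affected-model list in minutes rather than a week of email archaeology. The advisory name and the `registry` list below are assumptions carried over from the earlier sketch.

```python
# Continuing the earlier sketch: registry is a list of ModelRegistryEntry.
registry = [entry]

# Model names lifted from a hypothetical CAISI advisory on foreign models.
advisory_models = {"example-frontier-1"}

exposed = [
    e for e in registry
    if e.origin == "non-US" and e.model in advisory_models
]
for e in exposed:
    print(e.vendor, e.model, "->", e.exit_plan or "NO EXIT PLAN DOCUMENTED")
```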
Skipping internal communication. Your security, legal, and procurement teams probably each saw the news in different feeds. Consolidate the internal narrative now, so you are not negotiating priorities mid-incident if a CAISI finding lands on a model you depend on.
Key Takeaways
- CAISI signed pre-deployment testing agreements with Google DeepMind, Microsoft, and xAI on May 5, 2026, expanding earlier partnerships with Anthropic and OpenAI.
- Every major US frontier AI lab now participates in government pre-release evaluation under updated MOUs aligned with America's AI Action Plan.
- CAISI focuses on cybersecurity, biosecurity, chemical weapons, and foreign-system malicious behavior, not general business-fit risks.
- The agency has already completed more than 40 evaluations, including on unreleased models, and serves as the government's primary point of contact for AI testing.
- For enterprise buyers, CAISI participation is a useful procurement signal but does not substitute for contractual disclosures or your own model evaluation.
Not sure where government AI testing fits in your procurement roadmap? Book a discovery call and we will help you figure that out, no strings attached.