The Alignment Tax
Three things happened this week that rarely share a newsletter. The hardest reasoning benchmark yet built scored frontier AI at 0.26%, against 100% for humans. Software stocks fell to a valuation discount to the S&P 500 for the first time in history. And the Pentagon moved to blacklist Anthropic for refusing to let its AI be used for domestic mass surveillance and autonomous weapons. The connecting thread is the gap between what AI can do in a lab, what it's doing in production, and what some companies will agree to build.
Key Developments
The Pentagon tried to blacklist Anthropic, and its rivals and investors defended it in court
Secretary Hegseth designated Anthropic a "supply chain risk," and President Trump ordered all federal agencies to stop using Claude. This was the penalty for refusing to allow Anthropic's AI to be used for domestic mass surveillance and autonomous weapons. Microsoft, which holds equity in Anthropic, filed an amicus brief calling the designation a use of supply chain law to resolve a contract dispute that "may bring severe economic effects not in the public interest." More surprising: Google and OpenAI, which compete with Anthropic for the same government contracts, also filed supporting court documents. Three of the industry's biggest players just publicly agreed that using procurement blacklists to punish companies for ethical constraints sets a precedent none of them can afford.
Microsoft's equity stake in Anthropic means this is financial self-defense as much as principle; the brief protects a $4B+ investment from a government designation that would crater its value. Google and OpenAI's support is similarly self-interested: a legal precedent that lets the Pentagon blacklist AI companies for ethical constraints is a weapon any future administration could turn on them. When financial interest and ethics align this neatly, calling it a values test overstates the moral clarity.
ARC-AGI-3 just proved that 'approaching human-level reasoning' was about the benchmarks, not the reasoning
The team behind ARC-AGI launched its third benchmark on March 25, and every frontier model that supposedly dominated ARC-AGI-2 (GPT-5.4, Claude 4.6, Gemini 3.1) scored below 1%. Humans score 100%. ARC-AGI-3 replaces fixed puzzles with interactive game environments in which the AI must explore, learn the rules, and generalize across difficulty levels without being told the goal, which is what a human naturally does when picking up a new game. Memorization, fine-tuning, and pattern-matching on training data don't help. Models don't generalize from first principles; they compress and recall patterns. Every "reasoning" milestone of the last two years describes how well AI pattern-matches, not how well it thinks.
ARC-AGI-3 was designed by François Chollet specifically to be hard for LLMs. It measures a narrow definition of reasoning (generalizing to novel games) that may have little bearing on the work most businesses care about. GPT-5.4 scored 83% on GDPval (OpenAI's own benchmark: directionally correct, but not independent validation) and solved an open math problem the same week. The 0.26% result says AI can't learn new games from scratch; it says nothing about whether AI can do your accounting, write your code, or draft your legal brief, tasks it's already doing at scale. ARC-AGI-3 tests one specific cognitive capability that happens to be hard to fake with pattern compression.
Software stocks just fell to a valuation discount to the S&P 500 for the first time in history, and the market has a name for why
The iShares IGV software ETF is down 21% year-to-date. Salesforce is down 30%. Adobe's forward P/E has collapsed from a five-year average of 30x to 12x. For the first time in market history, software trades at a valuation discount to the broader S&P 500. The market's thesis is "seat compression": one AI agent can do the work of multiple software seats, breaking the per-seat revenue model that earned SaaS stocks premium multiples for a decade. The question isn't whether SaaS companies will build AI; Salesforce's Agentforce is already at $800M ARR. It's whether bolting AI onto per-seat subscription models can offset the structural repricing fast enough. The market is saying no.
Adobe at 12x forward P/E and Salesforce at a 10-year price low look like deep value to any investor who believes the software layer survives AI. Incumbent software vendors have data moats, customer lock-in, and compliance requirements that AI agents can't easily replicate; try telling a regulated financial institution to swap Workday for a custom agent. And 95% of generative AI pilots fail to reach production scale (Deloitte, March 2026). The seat compression thesis assumes enterprises will replace software with AI agents; that figure suggests most can't yet get AI projects off the ground.
What the Evidence Moved
GPT-5.4 Pro solved an open FrontierMath problem (verified by Epoch AI), and an AI Scientist paper appeared in Nature: two independent, confirming signals that AI can produce novel scientific insights.
Signals: the DOL's Agentforce-based DOLA deployment (2.8M cases in production), Agentforce embedded in all Salesforce Suites, Workday's Sana Labs, Microsoft's Azure Copilot agents, and a 327% deployment increase cited by Accenture/Databricks. The move reflects the vendor-embedded definition of deployment being met; under a rigorous autonomy threshold, the estimate remains ~0.45.
The Tufts AI Jobs Risk Index puts 9.3M jobs at risk across 33 tipping-point occupations; the Dallas Fed finds workers aged 22–25 in AI-exposed roles down 16%; and HBS data shows automation-related postings down 13%. Counter-evidence: CompTIA reports tech employment up 1.9%, and ADP points to a Solow paradox, with task time up 346% in poorly executed implementations.
Company Impact
Harvey
Data refresh: $200M raise at an $11B valuation, up from $8B in 90 days. 100K+ lawyers, 1,300 orgs, 25K+ agents, $195M ARR.
CNBC
Salesforce
Data refresh: DOL Agentforce DOLA deployment (2.8M cases), Agentforce embedded in all Suites, $800M Agentforce ARR
Salesforce press release
Monitoring: Adobe, Microsoft, Workday
Sources
- Federal News Network — Microsoft backs Anthropic against Pentagon
- ARC Prize — ARC-AGI-3 launch
- FinancialContent — The Great SaaS Reset
- Salesforce — DOL Agentforce press release
- EU Council — High-risk AI enforcement delay
- Winbuzzer — GPT-5.4 FrontierMath solve
- CNBC — Harvey $200M raise at $11B
- The Diplomat — China AI Five-Year Plan
- Deloitte — State of AI in Enterprise 2026
- Dallas Fed — AI and Labor Market
- Harvard Business Review — How AI Is Changing the Labor Market
- National Today — Tufts AI Jobs Risk Index
- Josh Bersin — Workday and Sana strategy
Next issue drops Monday