Technology

Scaled Cognition Raised $100 Million on a Model It Says Won't Give a Wrong Answer. The Benchmarks Are From Last Year.

Khosla Ventures led the round at a reported $750 million valuation. The strongest reliability claim is the investor's, not a measured result.

Janet Torvalds

June 26, 2026

Scaled Cognition, a Mountain View AI lab, raised $100 million in a Series A led by Khosla Ventures on June 25. The Wall Street Journal put the valuation at about $750 million. The company sells a model called APT, and the pitch around it is unusually blunt: an AI that, in the words of its lead investor, "will not give you a wrong answer."

That is a large thing to say out loud. Most people who build language models avoid the word "will" entirely, because the models work on probability and probability does not deal in guarantees. So it is worth separating what Scaled Cognition has actually built from what is being said about it this week.

What APT actually is

APT stands for Agentic Pretrained Transformer. The idea is that a model meant to take actions (issue a refund, change a flight, update a record) should be trained on actions, not just text. Ordinary LLMs are trained to predict the next token across a large pile of scraped web text. That makes them fluent. It does not teach them to follow a company's specific business rules, because that data is not sitting on the web.

So Scaled Cognition generated its own. The company's approach, described in its technical post, has three moving parts. It builds synthetic training data that pairs conversations with the correct actions to take in those conversations. It runs agent-to-agent self-play, the same broad technique that produced strong chess and Go engines, adapted to messier multi-step tasks where there is no clean win or loss. And it optimizes the model on action-level objectives rather than token-level ones, then wires in deterministic controls for the business policies that have to hold every time.

That last piece is the interesting one. The system is a language model bolted to hard rules for the parts that cannot be left to a guess. That is a reasonable design. It is also, notably, not the same thing as a model that cannot be wrong.
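To make the distinction concrete, here is a minimal sketch of what "a model bolted to hard rules" can look like. It is not Scaled Cognition's implementation, which is not public; the names ProposedAction, policy_gate, and MAX_AUTO_REFUND are hypothetical, and the refund limit is invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical policy limit, invented for this illustration.
MAX_AUTO_REFUND = 100.00


@dataclass(frozen=True)
class ProposedAction:
    """An action suggested by a model. The gate treats it as untrusted input."""
    kind: str
    amount: float


def policy_gate(action: ProposedAction) -> str:
    """Apply fixed rules to a proposed action.

    The same input always produces the same verdict, which is what
    makes this layer deterministic regardless of how the model behaves.
    """
    if action.kind != "refund":
        return "escalate"  # action types the rules do not cover go to a human
    if action.amount <= 0:
        return "reject"    # a non-positive refund is never valid
    if action.amount > MAX_AUTO_REFUND:
        return "escalate"  # above the limit, a human decides
    return "allow"
```

A gate like this can ensure that a refund above the limit is never approved automatically. It cannot tell whether a refund was the right response to the customer in the first place, and that part still rests on the model.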

The claim worth checking

Here is where the language gets slippery, and the slippage is the story.

Scaled Cognition's own technical post, published when it launched APT-1 in February 2025, says the model "behaves more deterministically, makes fewer hallucinations, and outperforms standard LLMs on agentic workflows." Fewer. That is a measurable, defensible engineering claim.

The funding announcement this week uses different words. The press release says APT delivers conversational quality "while eliminating hallucinations and guaranteeing policy-adherent performance." Eliminating. Guaranteeing. And Vinod Khosla, who led the round and has a board seat, went furthest: "a model that will not give you a wrong answer."

"Fewer" and "eliminating" are not the same claim. One is the kind of thing you can show with a benchmark. The other is the kind of thing that tends to meet its first counterexample on day one of a real deployment.

Show the methodology

The company says APT-1 tops the two hardest agentic benchmarks, Tau-Bench (from Sierra Research) and ComplexFuncBench (from Tsinghua's THUDM group). Both are real, public, and reasonable choices for this kind of system. The technical post notes the numbers were averaged over 10 runs, which is more than a lot of vendors bother to disclose.
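Averaging over repeated runs matters because a model's score on the same benchmark varies from run to run. A minimal sketch of that kind of summary follows; the scores are made up for illustration and are not Scaled Cognition's results, and summarize_runs is a hypothetical helper name.

```python
import statistics


def summarize_runs(scores):
    """Return the mean and sample standard deviation of per-run scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return statistics.mean(scores), statistics.stdev(scores)


# Made-up pass rates from 10 hypothetical runs of one benchmark.
example_scores = [0.62, 0.58, 0.65, 0.60, 0.61, 0.59, 0.63, 0.64, 0.57, 0.61]
mean_score, spread = summarize_runs(example_scores)
print(f"mean={mean_score:.3f} stdev={spread:.3f} over {len(example_scores)} runs")
```

Reporting the spread alongside the mean is what lets a reader judge whether a gap between two models is larger than the run-to-run noise.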

The problem is what is missing. Those benchmark results are from the February 2025 launch post, and they are presented as a chart without the underlying numbers written out or the specific competing models named in the text. No fresh benchmarks came with the $100 million. If the claim has moved from "fewer hallucinations" to "eliminating hallucinations" in the intervening sixteen months, the data supporting that jump is not in any of the public materials. A benchmark that measures task success on Tau-Bench is also not the same as a guarantee of zero wrong answers in production, which is the claim being marketed.
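There is a standard statistical reason a finite test cannot establish "zero." Even a perfect score on n trials only bounds the true error rate from above. The sketch below computes that one-sided bound, assuming independent trials with a fixed error rate, which is a simplification real deployments do not strictly satisfy. The function name is hypothetical.

```python
def upper_bound_zero_failures(n_trials: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the error rate after n_trials with no failures.

    Solves (1 - p) ** n_trials = 1 - confidence for p. At 95% confidence
    this is close to the familiar "rule of three" approximation, 3 / n_trials.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


# A flawless run of 1,000 test cases still leaves room for roughly
# a 0.3% true error rate at 95% confidence.
bound = upper_bound_zero_failures(1000)
print(f"95% upper bound after 1,000 clean trials: {bound:.4%}")
```

Under those assumptions, a clean run of 1,000 cases is still consistent with an error rate near 0.3 percent. At the scale of a billion interactions, that would be on the order of millions of wrong answers.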

Why this is not vaporware

None of this means the company is selling air. The founders have built this before. CTO Dan Klein is a UC Berkeley AI professor and a serious natural-language researcher. CEO Dan Roth previously ran Semantic Machines, an early conversational-AI company that Microsoft bought in 2018, and went on to run conversational AI at Microsoft. This is the second company the two have built together.

The customers are real too. Genesys, a contact-center platform that says it serves more than 8,000 organizations across 100-plus countries, uses APT for its virtual agents and has also invested in Scaled Cognition. The company says it is in production with Fortune 500 firms in financial services, healthcare, telecom, and insurance, and that customers using its models are on track to automate more than a billion customer-service interactions over the next year. The architecture-first argument, that you cannot bolt reliability onto a frontier model after the fact and have to design for it from the start, is a coherent technical position and a real point of difference from the many companies wrapping a safety layer around someone else's model.

The bottom line

The bet underneath the round is sound: the thing blocking enterprises from handing real tasks to AI is not raw capability but the quiet, confident wrong answer that a human reviewer waves through. Scaled Cognition is selling trust into a $600 billion outsourcing market where a single bad answer can mean a fine or a lawsuit, and it offers the deployment model regulated industries want: running in a private cloud or fully self-hosted.

The reliability problem is the right problem. "Fewer hallucinations" backed by a disclosed benchmark is a credible answer to it. "A model that will not give you a wrong answer" is a sentence, and the distance between those two is exactly what a billion live customer interactions are about to measure.


Sources

Scaled Cognition Raises $100M Series A Led by Khosla Ventures (www.globenewswire.com)
Scaled Cognition raises $100M to build AI that won't hallucinate (thenextweb.com)
Scaled Cognition introduces the Agentic Pretrained Transformer (www.scaledcognition.com)
Tau-Bench (github.com)
ComplexFuncBench (github.com)
