News
KPMG fabricated AI case studies in a report designed to sell clients on AI adoption
7+ hour ago (214+ words) the-decoder.com A KPMG report on AI in business contained fabricated case studies. "Redefining excellence in the age of agentic AI" made false claims about…...
Microsoft Research's Mirage gives video generation a persistent spatial memory that doesn't forget what's around the corner
2+ hour, 47+ min ago (574+ words) Mirage is a new video world model that skips the costly detour through pixel-based memory. That speeds up generation and keeps a scene's spatial structure stable even during long camera moves. Researchers from several universities built it with Microsoft Research....
AI coding agents find the right file but miss the exact lines that matter, study shows
7+ hour, 52+ min ago (676+ words) A new benchmark separates code search from the actual fix and exposes a hidden weakness of AI coding agents. They land in the right neighborhood but miss the crucial spots. Until now, AI coding has mostly been judged by the…...
New AI model called "Count Anything" does exactly what it says, and that's harder than it sounds
23+ hour, 46+ min ago (310+ words) Getting those counts right has real consequences, whether it's a doctor reading a scan, a farmer estimating crop yields, or a city planner analyzing traffic. Until now, each of these tasks has required its own specialized system. It's a familiar…...
Microsoft CEO Satya Nadella admits he's a token-maxer, too: "It's addictive"
1+ day, 3+ hour ago (186+ words) the-decoder.com Microsoft CEO Satya Nadella is now warning against "token-maxing," the uncritical use of the most powerful AI models for every task. "The hard truth is that the…...
Google Research's Gemini-SQL2 tops text-to-SQL benchmarks by a wide margin
1+ day, 4+ hour ago (187+ words) the-decoder.com Google Research unveiled Gemini-SQL2, a new text-to-SQL system built on Gemini 3.1 Pro. It translates natural language into executable SQL database queries. On the BIRD benchmark, which measures how…...
Microsoft's Skill Opt boosts GPT-5.5 by using nothing but a trained Markdown file
1+ day, 4+ hour ago (752+ words) A simple Markdown file is apparently enough to boost GPT-5.5 by more than 20 points on procedural tasks. That's the promise of Skill Opt, a method from Microsoft and three Chinese universities that trains instruction documents for AI agents the same…...
Meta shifts from "tokenmaxxing" to token managing as internal AI costs reportedly hit billions
1+ day, 6+ hour ago (244+ words) In an internal memo sent to about 6,000 employees, Meta flagged an "exponential increase" in AI usage and warned the company is on track for billions in costs from internal use alone by 2026, The Information reports. Individual employees and teams had…...
Claude Fable 5 outpaces GPT-5.5 by 13 points on Frontier Math's toughest problems
1+ day, 6+ hour ago (173+ words) the-decoder.com Anthropic's new model, Claude Fable 5, posts top scores on the Frontier Math benchmark. According to Epoch AI, Fable 5 hits 87 percent accuracy on tiers 1 through 3 and 88 percent on…...
Moonshot's open model Kimi K2.7 Code undercuts GPT-5.5 and Claude by up to 12x on price per token
1+ day, 8+ hour ago (645+ words) Moonshot AI has released Kimi K2.7 Code, a new AI model built specifically for programming tasks and agent-based coding workflows. The model builds on its predecessor, Kimi K2.6, and is available as an open-weights version on Hugging Face. According to Moonshot AI,…...