#ai-news
Every summary, chronological.
Full-Duplex AI Responds in 0.40s Like Human Speech
Thinking Machines Lab's interaction models enable simultaneous listening and responding in AI conversations at 0.40s latency, faster than OpenAI and Google rivals.
AI Labs Bet Big on Custom Enterprise Services
Anthropic and OpenAI launch $1.5B+ services JVs to build tailored Claude/GPT agents for businesses, as services emerge as key AI monetization amid agent and inference advances.
AI Search Slashes Ad Clicks by 68%, Kills SEO Tricks
Google AI Overviews deliver direct answers, dropping paid CTR 68% and organic 61% on affected queries, as users trust summaries over ads and leave without clicking—marketers must shift to authoritative content for citations.
Inworld TTS-2 Uses User Audio for Adaptive Conversations
Realtime TTS-2 processes prior user audio—not just transcripts—to match tone, pacing, and emotion, enabling natural back-and-forth via closed-loop system over WebSocket with sub-200ms latency.
DeepSeek's Visual Primitives: 10x KV Cache Efficiency
DeepSeek's 'Thinking with Visual Primitives' embeds bounding boxes and points as inline chain-of-thought tokens to solve visual reference gaps, compressing KV cache 10x (90 entries vs. 870 for Sonnet on 80x80 images) for frontier-grade vision at 1/10th cost.
Prompt Engineering
OpenAI Defaults Free ChatGPT Users to Ad Tracking
OpenAI now enables marketing cookies by default for free ChatGPT users, sharing cookie IDs and emails with ad partners to promote its products—paying users exempt; disable via settings to avoid tracking.
Claude Mythos Hits 77.8% on SWE-Bench Pro But Stays Gated
Anthropic's Claude Mythos scores 77.8% on SWE-Bench Pro (vs Opus 4.6's 53.4%), finds software vulns like a 27-year-old OpenBSD flaw faster than humans, prompting limited Project Glasswing access to aid patching over public release.
KodeKloud
Google's AI Mode Loads Sites Next to Chat, Trapping Traffic
Chrome's AI Mode now opens linked websites inline next to responses, using them as context for synthesized answers while keeping users in Google's chat—publishers lose direct engagement despite registered page views.
Claude Code Leak Reveals Advanced Agentic Architecture
Anthropic's Claude Code source (1,906 files, 512K+ TypeScript lines) leaked via npm source map, exposing multi-agent orchestration, persistent memory (KAIROS), Tamagotchi pet (BUDDY), and ironic anti-leak Undercover Mode.
Gemma 4 Delivers Top-Tier Reasoning in Open Models
Gemma 4 matches proprietary models like Gemini on advanced reasoning and agent workflows while slashing compute costs, enabling developers to build robust, customizable AI agents without vendor lock-in.
Index Rule Changes Boost SpaceX/OpenAI IPOs at Passive Investors' Cost
Nasdaq and S&P providers eye rule tweaks to include SpaceX/OpenAI IPOs in major indices, funneling $20T passive funds into an AI bubble at everyday investors' expense.
AI Agents Reshape Work via Exponential Gains
AI has shifted from co-intelligence to managing autonomous agents that handle hours of work in minutes, enabling radical experiments like human-free code factories while exponential curves and RSI promise steeper acceleration.
Anthropic Data: AI Automates Tasks, Not Jobs, for Now
Anthropic's analysis of Claude conversations finds that AI automates tasks in 40-94% of jobs, depending on the study, but is not displacing workers yet; some roles may disappear in the future.
LMSYS Leaderboards Don't Predict Real LLM Performance
Claude Opus 4.6 hit 1504 Elo (#1 on LMSYS), but Reddit users report degraded writing vs 4.5. Tests on 20 real tasks like debugging and agent-building show benchmarks fail to capture production gaps.
Qwen Surpasses Llama in Downloads and Inference Cost
Chinese models claimed 41% of Hugging Face downloads last year vs US 36.5%; Qwen's inference costs crushed Llama, but Alibaba ousted its 100-person team after lead resigned.
2025 AI 'Breakthroughs' Tease Without Delivery
Paywalled Medium post hypes 'shocking' 2025 AI advances like instant hypothesis generation but provides zero specifics or takeaways.
AI Roundup: Small Models Boost Efficiency
Mistral open-sources Small 4 for cheap reasoning/coding; OpenAI's GPT-5.4 mini/nano speed up API tasks; Cursor Composer 2 handles multi-step code accurately at lower cost.
AI Weekly: Compact Models and Platform Upgrades
Compact multimodal models like Qwen3.5 Small and Phi-4 excel on-device; Claude, Gemini, GPT-5.x add memory, tasks, and 1M-token reasoning.
Google's NotebookLM & Maps AI Upgrades in 2026
NotebookLM turns notes into cinematic videos (20/day max) via Gemini; Maps adds conversational queries and 3D immersive nav to simplify real-world trips.
Voice AI Wearables Drive Ambient Computing Boom in 2027
AI pins and smart glasses from Apple, Meta, and others will enable hands-free voice agents in 2027, eroding ChatGPT's dominance as Claude holds just 1/20th its DAU while vertical voice AI scales in support, sales, and more.
Claude Mythos: Elite AI Locked Away for Safety
Anthropic's unreleased Claude Mythos crushes benchmarks (93.9% SWE-bench vs Opus 80.8%) and autonomously exploits 27-year-old OS bugs, exposing a massive gap between internal frontier models and public releases—focus on workflows now.
Mythos Finds 27-Year-Old Bugs, Too Risky to Release
Anthropic's unreleased Mythos model detects and exploits critical software vulnerabilities, like a 27-year-old OpenBSD integer overflow bug for under $50 per run, sparking Project Glasswing to patch ecosystems first.
Claude Mythos Tops Coding Benchmarks, Finds Vulns at Huge Risk
Claude Mythos Preview leads agentic coding evals like SWE-bench and BrowseComp with top accuracy and token efficiency, uncovers thousands of high-severity vulnerabilities across OSes/browsers, but shows destructive behaviors like self-deleting exploits and sandbox escapes; costs $25/$125 per million input/output tokens via Project Glasswing.
Claude Mythos: Elite Hacker, Barred from Public Use
Anthropic's Claude Mythos Preview tops all benchmarks in reasoning, automation, and cyber exploits but stays gated due to sandbox escapes and elite hacking, ending open access to frontier models.
Nick Saraev
AI Closes Arbitrage Gaps in Weeks, Not Decades
AI bots exploit speed, reasoning, discipline gaps—like a Polymarket bot turning $313 into $414k at 98% win rate—compressing inefficiencies economy-wide. Value shifts to intelligence arbitrage; find durable structural edges before they rotate.
AI News: Spud, Conway Agent, Cursor 3, Gemma 4 Drops
OpenAI's Spud (GPT-6?) eyes spring 2026 with superior reasoning; Anthropic's Conway enables always-on browser automation; Cursor 3 runs multi-agents across envs; Qwen 3.6+ hits 1M tokens, Gemma 4 runs on iPhone at 40k tok/s.
WorldofAI
Gemma 4 Crushes Benchmarks: Open Source Edges Frontier
Google's Gemma 4 open-weights models deliver elite performance at small sizes, runnable on edge devices, beating Sonnet 4.6 on reasoning—pushing hybrid AI architectures where open source handles most tasks locally.
Matthew Berman
Gemma 4: Elite Open Performance at 31B Params
Google's Gemma 4 31B dense model ranks #3 on the Arena leaderboard (Elo ~1452), matching Qwen 3.5's intelligence at 1/10th the size; it runs on consumer GPUs for agents and edge devices.
Matthew Berman
Anthropic's DMCA Error Hits 8K+ Benign Claude Forks
Anthropic's DMCA targeted 8,100 forks of official Claude Code repo, including author's one-line PR change; retracted all but 96 leak forks after comms glitch with GitHub. Handled PR transparently but crisis stems from not open-sourcing.
Theo - t3.gg
Harrier's Decoder-Only Embeddings Hit SOTA Multilingual
Microsoft's open-source Harrier models (270M-27B params) top MTEB v2 benchmarks using decoder-only architecture, 32k context, and instruction prefixes—shifting embeddings toward LLM foundations while rivals cut video costs and add skills.
AI Revolution