Today in AI engineering, design & research.
A reading room of curated AI summaries. The signal, distilled. One short brief when something good lands; the rest waits here for you.
Today's reading — editor's picks
NVIDIA's 10x Workflows with Codex on GPT-5.5
NVIDIA's 40k engineers use Codex (GPT-5.5) to autonomously build production systems in hours and run full ML research cycles, delivering 10x speedups and 20x code efficiency gains.
Codex Prompts Automate Finance Reporting and Models
Finance teams cut assembly time on MBR narratives, model cleanups, CFO packs, variance bridges, and forecasts by feeding Codex existing spreadsheets, dashboards, and notes via copy-paste prompts that cite sources and flag risks—no coding required.
10x Engineering Speed with Codex and ChatGPT Rollout
AutoScout24 slashed dev cycles from 2-3 weeks to 2-3 days by giving ChatGPT to 2,000 employees and Codex to 1,000 builders, using AI champions and workflow integration for organic adoption.
The stream — chronological
Parameter Golf: Creativity in Tiny ML Models
OpenAI's 16MB/10-min ML challenge drew 1,000+ participants and 2,000+ submissions, showcasing optimizations, quantization, novel architectures, and AI agents' role in accelerating research while creating review challenges.
Interaction Models: Native Real-Time Multimodal AI
Replace turn-based AI harnesses with native interaction models using 200ms micro-turns for continuous audio/video/text processing, enabling proactive visuals and simultaneous speech—outperforming GPT/Gemini on interaction benchmarks.
DeepMind's 4 Principles for Contextual AI Pointers
DeepMind's Gemini-powered mouse pointer captures visual and semantic context at the cursor to enable natural pointing + speech interactions, guided by 4 principles that eliminate prompt-heavy AI detours.
Medicare's ACCESS Rewards AI Outcomes Over Time Spent
CMS's 10-year ACCESS model pays for chronic care outcomes like lower blood pressure, enabling AI agents to scale where human-only care couldn't—Pair Team's Flora AI handles 24/7 patient check-ins for vulnerable seniors.
Modular Hybrid-Memory Agent with OpenAI Tools
Build a production-ready autonomous agent in Python using hybrid vector+BM25 memory fused by RRF (K=60), modular tool dispatch, and a self-managing loop limited to 8 tool rounds for reliable reasoning and action.
AntAngelMed: 103B MoE Medical LLM Matches 40B Dense at 7x Speed
103B-param open-source medical LLM activates only 6.1B params via 1/32 MoE, rivals 40B dense models with 7x efficiency, tops HealthBench/MedBench, runs 200+ tps on H20.
Build Stateful Agents with File Systems & AI SDK v6
Give agents persistent sandboxes, bash tools, and memory files via AI SDK v6 to make them follow long tasks, build on prior work, and generate reusable Python scripts without manual context management.
GPU-Orchestrated Multi-Agent Sustainability Intelligence Blueprint
Chelsie Czop and Mitesh Patel demo a serverless multi-agent app using Google ADK, Gemma 4 on NVIDIA RTX PRO 6000 GPUs via Cloud Run, and Milvus RAG for real-time environmental risk reports from satellite, telemetry, and policy data.
RL Industrializes GenAI Production via Feedback Loops
95% of GenAI pilots fail to reach production because instruction tuning and prompts can't systematically integrate defects and metrics. RL does, enabling smaller/cheaper/faster models that scale to millions in token costs at Fortune 500s like AT&T.
Gemini Enables Agentic Tasks and Prompt-Based Widgets on Android
Google's Gemini on Android now automates multi-app tasks like grocery shopping from notes to cart, browses the web for bookings, fills forms, dictates naturally, and generates widgets from natural language descriptions, rolling out in summer 2026 on Pixel and Samsung devices first.
Anthropic Bolsters Claude for Legal Automation Boom
Anthropic launches legal plugins and MCP connectors for Claude to automate law firm tasks like document review and drafting, entering a market where Harvey raised $200M at $11B valuation and Legora secured $600M Series D at $5.6B valuation.
Malleable Evals: Adaptive Testing for Changing AI Agents
Static benchmarks fail self-adapting agents; use production traces for agent-curated, always-on eval suites that self-optimize toward user intent.
AI Mockups Free Teams for System-Level Design
AI enables anyone to generate mockups in minutes, shifting focus from pixel layouts to crucial discussions on data structures, feature relationships, and user mental models for product coherency.
ChatGPT Adoption Broadens Across Demographics, Geography in 2026Q1
Q1 2026 consumer data shows ChatGPT usage growing among feminine-named users (>50% share), over-35s gaining share, emerging markets (e.g., Haiti +9 per-capita rank), and specialized work tasks like health docs.
CoCoDA: Co-Evolve DAGs to Scale Tool-Augmented Agents
CoCoDA uses a compositional code DAG to jointly evolve tool libraries and planners, enabling efficient retrieval from growing libraries and letting an 8B model match or beat a 32B teacher on GSM8K and MATH benchmarks.
Blankfein's Risk Playbook for Crises and Scaling Firms
Lloyd Blankfein shares how Goldman balanced aggressive risk-taking with contingency planning, stayed calm in crises, and built partnership culture—lessons for tech leaders facing AI uncertainties.
Dessn: Design Prototypes in Live Cloud Codebases
Dessn runs existing codebases in the cloud with zero setup, letting designers prompt AI iterations directly in production for seamless dev handoffs—raised $6M to prioritize design as code commoditizes.
Night Shift: Agents Run Recurring Jobs Automatically
Delegate repetitive tasks to AI agents using the Night Shift pattern—shared interface + scheduled skills + brief human reviews—so agents handle work overnight, surfacing only decisions needing your input.
Shopify Shop's Big Design Bets: Vision, AI, Craft
Katarina Batina explains how Shopify's Shop app thrives by prioritizing bold visions like low-density feeds and AI prototypes over strict metrics, fostering delight through cross-functional craft sprints.
Vapi's Control-Focused Voice AI Wins Ring, Hits $500M Val
Vapi beat 40 rivals to handle 100% of Amazon Ring's calls by giving engineers granular AI control, fueling $50M Series B at $500M valuation and 1B+ calls processed.
Agent OS Makes AI Agents Reliable and Scalable
Current AI agents are stateless 'goldfish' that forget tasks instantly. An Agent OS adds scheduling, memory, tools, identity, observability, and guardrails to manage them like a computer OS manages apps, enabling safe scaling.
Aurora Fixes Muon's Neuron Death in Tall MLPs
Aurora optimizer eliminates >25% neuron death in Muon's tall matrices by jointly enforcing left semi-orthogonality and uniform row norms √(n/m), delivering SOTA on nanoGPT speedrun with 6% compute overhead.
skfolio: Build & Tune Portfolio Optimizers in Python
skfolio's scikit-learn API lets you construct, validate, and compare 18+ portfolio strategies—from baselines to HRP, Black-Litterman, factors, and tuned models—on S&P 500 returns with walk-forward CV and GridSearchCV.
Daybreak: AI Agents for Proactive Vuln Patching
OpenAI's Daybreak expands Codex Security (launched March 2026) to ingest repos, build threat models, validate patches in isolation, and propose fixes with human review—reducing analysis from hours to minutes via tiered GPT-5.5 models gated by Trusted Access for Cyber.
Full-Duplex AI Responds in 0.40s Like Human Speech
Thinking Machines Lab's interaction models enable simultaneous listening and responding in AI conversations at 0.40s latency, faster than OpenAI and Google rivals.
GM Cuts 600 IT Jobs to Hire AI-Native Engineers
GM laid off 600 IT workers (10% of department) to recruit specialists in agent/model development, prompt engineering, data pipelines—showing enterprises must rebuild teams for production AI, not just add tools.
LLM Distillation: Soft, Hard, and Co Techniques Explained
Distill large teacher LLMs into efficient students via soft-label (match probabilities for dark knowledge), hard-label (imitate outputs for cheap scalability), or co-distillation (joint training to minimize performance gaps).
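To make the soft-label versus hard-label distinction concrete, here is a minimal sketch in plain Python. It is not taken from the summarized article: the function names, the temperature value, and the toy logits are all illustrative assumptions. It shows one common way to write the two losses for a single token position, with KL divergence on temperature-softened probabilities for the soft-label case and cross-entropy against the teacher's top choice for the hard-label case.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened probabilities (hypothetical helper).

    The student is pushed to match the teacher's full distribution,
    including the small probabilities on non-top classes.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hard_label_loss(student_logits, teacher_logits):
    """Cross-entropy against the teacher's argmax output (hypothetical helper).

    Only the teacher's top choice is used, so teacher outputs alone suffice.
    """
    target = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    q = softmax(student_logits)
    return -math.log(q[target])

# Toy logits over a 3-token vocabulary (made-up numbers).
teacher = [1.0, 2.0, 3.0]
student = [3.0, 2.0, 1.0]
soft = soft_label_loss(student, teacher)
hard = hard_label_loss(student, teacher)
```

In this sketch the soft-label loss is zero only when the student reproduces the teacher's whole distribution, while the hard-label loss only rewards probability on the teacher's top token. Co-distillation, where both models train jointly, is not covered here.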