AI Engineer
Building Great Agent Skills: The Missing Manual
To escape 'skill hell,' developers must treat agent skills as structured, maintainable code by optimizing triggers, minimizing context bloat, using 'leading words' for steering, and aggressively pruning irrelevant instructions.
Prototype Big, Deploy Small: A Framework for Local LLM Adoption
Stop overpaying for frontier models. By using a 'prototype big, deploy small' framework and rigorous capability evals, you can identify 'Sage' (Small and Good Enough) models that provide production-grade performance on-device, saving costs and improving latency.
Prototype Big, Deploy Small: A Framework for On-Device AI
Stop defaulting to expensive frontier models. By using a 'prototype big, deploy small' framework and rigorous local evals, you can replace costly cloud inference with smaller, faster, and more private on-device models.
The Future of AI: Shifting from Monolithic Agents to Composition
Justin Schroeder argues that the future of AI lies in 'domain-specific agents'—small, specialized, composable units—rather than monolithic agents, to solve the reliability, cost, and complexity issues inherent in current agentic architectures.
The Future of AI: Shifting from Monolithic to Domain-Specific Agents
Moving from large, monolithic agents to a composition-based architecture of small, domain-specific agents reduces costs, improves reliability, and enables safer, more scalable AI deployments.
Moving Upstream: Why Product Strategy Beats Prompting
As AI makes coding cheap, the bottleneck has shifted to product discovery. Success now depends on human-centric techniques like story mapping and value-based requirements to ensure you build what is actually worth building.
Why Product Strategy Beats Prompting in the AI Era
As AI makes coding cheap, the bottleneck for software development has shifted upstream. Success now depends on human-centric skills: eliciting requirements, mapping processes, and validating business value before writing a single line of code.
Building Deterministic Infrastructure for Autonomous AI Agents
Reliability in agentic systems is an infrastructure challenge, not a model one. To scale agents, you must build a 'control plane' that separates model reasoning from production execution via validation, policy enforcement, and circuit breakers.
Building Deterministic Infrastructure for Non-Deterministic AI Agents
To move AI agents from demos to production, engineers must shift focus from prompt engineering to building a robust 'agent control plane' that enforces determinism, safety, and resource governance over stochastic model outputs.
The Agentic AI Engineer: Eval-Driven Development Loops
The Agentic AI Engineer automates the agent development lifecycle—spec, build, evaluate, diagnose, and optimize—using a multi-agent system to remove the human bottleneck from production-ready AI agent maintenance.
The Agentic AI Engineer: Scaling Agent Development via Loops
To scale agent development, teams must move from manual iteration to an 'Agentic AI Engineer' model: a multi-agent system that automates the entire lifecycle of spec, build, eval, diagnose, and optimize.
The Prompt as a Platform: Agentic Engineering for Distributed Systems
Dominik Tornow argues that software engineering is shifting from general-purpose implementations to bespoke systems synthesized by agents from abstract specifications, using deterministic simulation as the critical feedback loop for design.
The Prompt is the Platform: Agentic Engineering for Distributed Systems
By moving agents upstream into the design phase using deterministic simulation, developers can synthesize bespoke, production-ready implementations from abstract specifications rather than relying on general-purpose libraries.
Automating ETL Pipeline Recovery with RL Agents
A reliable, safety-first architecture for ETL pipeline remediation that uses deterministic anomaly detection, Q-learning for action selection, and an external safety layer to reduce MTTR by 99.85%.
RL-Guided ETL Pipeline Remediation: Architecture and Evals
Automate ETL failure recovery using a deterministic anomaly detection layer, a Q-learning policy for action selection, and a hard-coded safety guardrail to ensure operational reliability.
Debugging AI Agents: Why Replayability Beats Determinism
Stop chasing bitwise determinism in LLMs. Instead, implement a 'record and replay' architecture to capture agent state transitions, enabling you to debug production failures by re-running traces with mocked nodes.
Debugging Production AI Agents via Record and Replay
Stop chasing bitwise determinism in LLMs. Instead, implement a record-and-replay architecture to capture agent state transitions, enabling deterministic debugging and regression testing of non-deterministic production failures.
Building Low-Latency Voice-In, Visuals-Out AI Agents
To achieve a seamless AI UX, shift from voice-in/voice-out to voice-in/visuals-out. This leverages the human brain's visual processing capacity and allows a more forgiving 1-second latency budget, versus the strict 200 ms required for fluid speech.
Optimizing Voice-In, Visuals-Out AI Experiences
To build delightful AI agents, prioritize 'voice-in, visuals-out' interactions. By using fast models, eager inference, and aggressive prefix caching, you can meet the 1-second latency threshold required for seamless user interaction.
AI-Driven Multi-Document Correlation for Financial Compliance
Moving from isolated document validation to cross-document intelligence using graph-based entity correlation and probabilistic risk modeling significantly improves fraud detection and reduces false positives in enterprise compliance.
Cross-Document AI for Predictive Financial Compliance
Moving from document-level validation to cross-document graph correlation and probabilistic risk modeling reduces false positives by 76% and enables proactive fraud detection.
Stop Writing Tone Instructions: Use a 4-Layer AI Architecture
Stop relying on a single system prompt for brand voice. Instead, use a four-layer architecture—Immutable Identity, Situational Mode, Example-Anchored Voice, and a Deterministic Veto—to separate instructions from verification.
Building a Personal AI Research OS
Transform a fragmented 'Second Brain' into a living research system by using a file-based index and a three-layer architecture (Raw, Index, Wiki) instead of complex vector databases.
Building and Scaling Production AI Agents at OpenGov
OpenGov scales its 'OG Assist' agent platform by moving away from pre-built frameworks to a custom, Effect-TS native agent loop, prioritizing observability, human-in-the-loop safety, and modular tool-based architecture.
Solving the 'Amnesia' Problem in AI Coding Agents
Current AI coding agents are limited by 'repo-bound' vision and a lack of episodic memory. Polygraph addresses this with a meta-harness that gives agents a unified dependency graph and shared session state across repositories.
The Log Is The Agent: Rethinking AI Agent Architecture
Treating the session log as the primary, durable primitive for AI agents—rather than the model or runtime—enables reliability, portability, and true ownership of agent state.
Recursive Coding Agents: Managing AI Geniuses
Recursive Language Models (RLMs) improve agent reliability by treating context as an object of computation, allowing agents to decompose complex tasks into recursive sub-agent calls that verify and execute work symbolically.
Engineering Principles for Agentic Systems
Building AI agents is not about writing prompts but about architecting systems. By applying traditional software engineering principles (decomposition, state management, and separation of concerns), you can build reliable, maintainable agentic systems instead of simple, brittle LLM interactions.
The Miranda Hypothesis: Why Persona Evals Fail
Current persona-based AI benchmarks measure 'convincingness' rather than historical fidelity, leading to 'Miranda distortion' where models prioritize culturally dominant narratives (like the Hamilton musical) over primary documentary records.
The Production AI Playbook: Deploying Agents at Enterprise Scale
Moving AI from demo to production requires shifting focus from model selection to five pillars: evaluation, observability, data foundation, orchestration, and governance.