Firehose

The stream in tweet form.

Meng To: Building Software with AI and Codex
This is a conversation about shifting from traditional design software like Figma to an agent-based, local-first workflow using Codex. The guest demonstrates how he uses the tool to build custom internal apps, automate image generation, and manage his entire project lifecycle through local markdown files and AI agents.
Optimizing LLM Inference: KV Cache and Paged Attention
This is a technical primer on how vLLM manages GPU memory to improve inference throughput. The video explains the mechanics of KV caching and paged attention, offering a few specific configuration tips for optimizing memory utilization, prefix caching, and chunked prefill in production environments.
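As a rough illustration of the paged-attention idea summarized above, here is a toy block allocator in plain Python. It is not vLLM's implementation: the class name, the block size, and the method names are all assumptions made for the sketch. It only shows the bookkeeping, where each sequence holds a table of fixed-size blocks and a new block is taken from the free pool only when the current one fills up.

```python
BLOCK_SIZE = 4  # tokens per KV block (hypothetical value chosen for the sketch)


class PagedKVCache:
    """Toy block table: maps each sequence id to a list of physical block ids."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of block ids
        self.seq_lens = {}      # seq_id -> number of cached tokens

    def append_token(self, seq_id: str) -> int:
        """Reserve room for one more token and return the block that holds it."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        # A new block is needed only when the last one is full (or none exists).
        if length % BLOCK_SIZE == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV blocks left")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[-1]

    def free(self, seq_id: str) -> None:
        """Return all of a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=8)
for _ in range(5):
    cache.append_token("request-a")
print(cache.block_tables["request-a"])  # two block ids: 5 tokens span 2 blocks
```

Because blocks are allocated on demand, a short sequence never reserves memory for a maximum-length context, which is the waste that paging is meant to avoid.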
Building Great Agent Skills: The Missing Manual
This video outlines a structured rubric for designing AI agent skills to avoid "skill hell," focusing on four key areas: choosing between user-invoked and model-invoked triggers, organizing content into steps and reference material, using "leading words" for steering, and pruning unnecessary instructions. You can find the framework and examples in the writing-great-skills resource.
Building Production-Grade Multi-Agent Systems with ADK
A technical walkthrough of building a multi-agent system using Google's Agent Development Kit and the Model Context Protocol. The video demonstrates how to use ADK’s built-in session state, instruction templating, and three-tier security callbacks to manage agent routing and tool execution without writing custom orchestration logic.
Building Custom Apps with Claude Code: A Step-by-Step Guide
This video demonstrates a workflow for building a custom time-tracking and invoicing web app using Claude Code. The creator walks through his process of using a PRD-generation tool to define scope and data models before having the AI write the application code.
Optimizing Software Delivery with AI-Assisted Code Reviews
This is a high-level conceptual overview of how AI-assisted code review works, explaining the roles of static analysis, linters, and LLMs in a development pipeline. It functions as an introductory primer for managers or developers curious about the benefits and risks of integrating these tools into their existing workflows.
Optimizing Software Workflows with AI Code Review
This is a high-level conceptual overview of how AI-assisted code review works, explaining the roles of static analysis, linters, and LLMs in a development pipeline. It serves as a general introduction to the topic rather than a hands-on tutorial or a review of specific AI code review tools.
SaaS Price Increase Playbook: A Strategic Guide
This talk is a practical guide for SaaS founders on how to execute price increases without alienating their user base. The speaker outlines a framework for choosing between adjusting price points, packaging, or value metrics, and provides a communication strategy for rolling out changes while managing customer expectations and internal team alignment.
Prototype Big, Deploy Small: A Framework for Local LLM Adoption
This presentation argues for replacing expensive, latency-heavy frontier models with smaller, local models (SLMs) for specific production tasks. The speaker outlines a "prototype big, deploy small" workflow, demonstrating how to build a golden dataset from production traces to validate that a smaller model can handle your specific use case as effectively as a foundation model.
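A minimal sketch of the golden-dataset check described above, assuming a simple classification task. The dataset rows, the `small_model_stub` function, and the 0.9 threshold are all hypothetical stand-ins; in practice the rows would come from production traces and the stub would be a call to the local model.

```python
from typing import Callable, Dict, List

# Hypothetical golden dataset: production inputs paired with outputs that the
# frontier model produced and that were accepted as correct.
GOLDEN: List[Dict[str, str]] = [
    {"input": "Refund request for order 1182", "expected": "billing"},
    {"input": "App crashes when I open settings", "expected": "bug"},
    {"input": "How do I export my data?", "expected": "how_to"},
    {"input": "My invoice looks wrong", "expected": "billing"},
]


def small_model_stub(text: str) -> str:
    """Stand-in for a local model call; a real system would run inference here."""
    lowered = text.lower()
    if "refund" in lowered or "charged" in lowered:
        return "billing"
    if "crash" in lowered:
        return "bug"
    return "how_to"


def agreement_rate(model: Callable[[str], str], dataset: List[Dict[str, str]]) -> float:
    """Fraction of golden rows where the candidate model matches the expected label."""
    hits = sum(1 for row in dataset if model(row["input"]).strip().lower() == row["expected"])
    return hits / len(dataset)


def ready_to_deploy(model: Callable[[str], str], threshold: float = 0.9) -> bool:
    """Gate the swap to the smaller model on an agreement threshold (assumed value)."""
    return agreement_rate(model, GOLDEN) >= threshold


print(agreement_rate(small_model_stub, GOLDEN), ready_to_deploy(small_model_stub))
```

Exact-match scoring only suits tasks with a closed label set; free-form outputs would need a different comparison, such as a rubric or a judge model.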
Prototype Big, Deploy Small: A Framework for On-Device AI
This video advocates for a "prototype big, deploy small" workflow, encouraging developers to use frontier models for initial testing before migrating to smaller, local models for production. The presenter demonstrates how to build a "golden dataset" from production traces to validate these smaller models using Arize Phoenix for evaluation.
The Future of AI: Shifting from Monolithic Agents to Composition
This talk argues that the current approach to AI agents—stuffing tools and context into a single large model—is fundamentally flawed. The speaker advocates for a shift toward "domain-specific agents" that are modular and composable, rather than relying on monolithic systems or just adding more tools via the Model Context Protocol.
The Future of AI: Shifting from Monolithic to Domain-Specific Agents
This talk argues that the current "all-in-one" agent architecture is flawed because it relies on stuffing too much context into a single model. Instead, the speaker advocates for building smaller, domain-specific agents that can be composed together, treating them like modular software components rather than monolithic assistants.
Moving Upstream: Why Product Strategy Beats Prompting
This talk argues that as AI lowers the barrier to writing code, the primary bottleneck in software development has shifted back to traditional business analysis. The speaker advocates for using established frameworks like story mapping and a "Value-Architecture-Design" (VAD) approach to ensure teams build high-impact tools rather than just automating existing inefficiencies.
Why Product Strategy Beats Prompting in the AI Era
This talk argues that as AI lowers the barrier to writing code, the primary bottleneck in software development has shifted back to traditional business analysis. The speaker advocates for using established techniques like story mapping and a "Value-Architecture-Design" (VAD) framework to define requirements before prompting AI, emphasizing that the goal is to build software that is actually used rather than just demoed.
Building Deterministic Infrastructure for Autonomous AI Agents
This talk from Nishant Gupta outlines the shift from "prompt engineering" to "systems engineering" for production-grade AI agents. He argues that because LLMs are stochastic, developers must build a deterministic "control plane" around them to handle orchestration, observability, and safety, rather than letting agents interact directly with production infrastructure.
Building Deterministic Infrastructure for Non-Deterministic AI Agents
This talk outlines the shift from "prompt engineering" to "systems engineering" for production AI agents. Nishant Gupta argues that because LLMs are stochastic, you must wrap them in a deterministic control plane—using patterns like circuit breakers, policy-based tool validation, and observability—to prevent infrastructure-level failures like recursive retry loops.
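To make the control-plane idea concrete, here is a minimal sketch of two of the patterns named above: a circuit breaker and policy-based tool validation. It is not taken from the talk. The tool names, the allowlist, and the failure threshold are hypothetical, and a production breaker would usually add a cool-down before retrying.

```python
class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call after repeated failures."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0  # consecutive failures seen so far

    def call(self, fn, *args, **kwargs):
        # Once the threshold is reached, stop calling the tool at all. This is
        # what cuts off a recursive retry loop driven by the agent.
        if self.failures >= self.max_failures:
            raise CircuitOpenError("circuit open; refusing call")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success resets the count
        return result


# Hypothetical policy: the agent may only invoke tools on this allowlist.
ALLOWED_TOOLS = {"read_metrics", "restart_service"}


def guarded_tool_call(breaker: CircuitBreaker, tool_name: str, fn, *args, **kwargs):
    """Validate the requested tool against policy, then run it through the breaker."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool_name!r} is not permitted by policy")
    return breaker.call(fn, *args, **kwargs)
```

Both checks are deterministic code outside the model, so they behave the same way no matter what the LLM generates.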
The Agentic AI Engineer: Eval-Driven Development Loops
This video outlines a conceptual framework for automating the software development lifecycle of AI agents. The speakers propose using a multi-agent system to handle the specification, building, evaluation, and optimization of other agents, aiming to replace manual human oversight with an "eval-driven" loop.
The Agentic AI Engineer: Scaling Agent Development via Loops
This video is a conceptual pitch for Mutagent, a platform designed to automate the agent development lifecycle. The speakers outline an "eval-driven" workflow where an orchestrator agent manages the specification, building, testing, and monitoring of other AI agents to remove human bottlenecks from the iteration loop.
The Prompt as a Platform: Agentic Engineering for Distributed Systems
This talk outlines a workflow for using AI agents to synthesize bespoke software implementations from abstract specifications. The speaker argues that by using deterministic simulation environments to expose "forbidden" state information, agents can move upstream from writing code to designing robust distributed systems.
The Prompt is the Platform: Agentic Engineering for Distributed Systems
This talk outlines a methodology for using AI agents to synthesize bespoke software implementations from abstract specifications, rather than relying on general-purpose frameworks. The speaker details a multi-step workflow—moving from abstract spec to simulation to concrete implementation—using Resonate as a case study for building durable execution systems.
Automating ETL Pipeline Recovery with RL Agents
This talk outlines an architecture for automating ETL pipeline recovery using a combination of deterministic anomaly detection, Q-learning for action selection, and an external safety layer. The speaker emphasizes that the system's reliability stems from its modular design—separating observation from decision-making—rather than the RL model itself, and provides a public repository containing the synthetic benchmark and experimental scripts used to validate the approach.
RL-Guided ETL Pipeline Remediation: Architecture and Evals
This talk outlines a framework for automating ETL pipeline recovery using a combination of deterministic anomaly detection, Q-learning for action selection, and a hard-coded safety layer. The speaker demonstrates how this architecture—available in their GitHub repository—reduces mean time to recovery by offloading routine failures to an agent while escalating high-risk or novel cases to human engineers.
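The combination described above can be sketched as tabular Q-learning wrapped in a hard-coded safety check. This is an illustration, not the code in the speaker's repository. The action names, the high-risk set, the state labels, and the hyperparameters are all hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = ["retry", "rollback", "skip_batch"]  # hypothetical remediation actions
HIGH_RISK = {"skip_batch"}                     # hypothetical: always needs a human

q_table = defaultdict(float)  # (state, action) -> estimated value


def choose_action(state: str, epsilon: float = 0.1) -> str:
    """Epsilon-greedy selection over the Q-table."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])


def update(state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard Q-learning update toward reward plus discounted best next value."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])


def safe_action(state: str, known_states: set, epsilon: float = 0.1) -> str:
    """Safety layer outside the learner: escalate novel states and risky actions."""
    if state not in known_states:
        return "escalate_to_human"
    action = choose_action(state, epsilon)
    return "escalate_to_human" if action in HIGH_RISK else action
```

Keeping `safe_action` separate from the Q-table means the escalation rules hold even if the learned values are poor.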
Debugging AI Agents: Why Replayability Beats Determinism
This talk argues that trying to force LLMs to be deterministic is a dead end for production debugging. Instead, the speakers propose a "record and replay" pattern—similar to Mozilla rr—where you capture inputs and outputs at the boundaries of your agent’s nodes to reconstruct and step through failed execution traces without needing to re-run the model.
Debugging Production AI Agents via Record and Replay
This talk argues that chasing "bitwise determinism" in LLM agents is a losing battle and proposes a "record and replay" architecture instead. The speakers demonstrate how to capture inputs and outputs at the boundary of each agent node—rather than the network layer—to allow engineers to step through failed production traces offline without re-triggering the model.
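A minimal sketch of recording at the node boundary, assuming each agent node is a function from a JSON-serializable payload to a JSON-serializable output. The `Recorder` and `Replayer` names are hypothetical and not from the talk. The replayer returns recorded outputs in order, so the model is never called again.

```python
import json


class Recorder:
    """Wraps agent nodes and logs every input/output pair at the node boundary."""

    def __init__(self):
        self.trace = []

    def wrap(self, name, fn):
        def wrapped(payload):
            output = fn(payload)
            self.trace.append({"node": name, "input": payload, "output": output})
            return output
        return wrapped

    def dump(self) -> str:
        return json.dumps(self.trace)


class Replayer:
    """Serves recorded outputs in order instead of calling the real node."""

    def __init__(self, serialized_trace: str):
        self._events = json.loads(serialized_trace)
        self._pos = 0

    def wrap(self, name, fn):
        def wrapped(payload):
            event = self._events[self._pos]
            # Fail loudly if the replayed run diverges from the recorded one.
            if event["node"] != name or event["input"] != payload:
                raise RuntimeError(f"replay diverged at step {self._pos}")
            self._pos += 1
            return event["output"]  # fn is intentionally never invoked
        return wrapped
```

In use, the same agent graph is built twice: once with `Recorder.wrap` in production, and once offline with `Replayer.wrap` fed from the saved trace, so a debugger can step through the failed run.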
Thermodynamic Computing and the Future of AI-Driven Chip Design
This is a wide-ranging technical conversation with Thomas Ahle about Normal Computing's efforts to automate chip design. The discussion moves from the practical challenges of building an open-source Verilog simulator to the theoretical hurdles of formal verification and the physics behind their thermodynamic computing hardware.
Building Low-Latency Voice-In, Visuals-Out AI Agents
This talk argues that "voice-in, visuals-out" is a more practical UX pattern than full voice-to-voice because it leverages the human brain's visual processing power while staying within a more forgiving latency budget. The speaker outlines three technical requirements for building these responsive agents: using low-latency models like Haiku, triggering frequent, eager inference turns rather than waiting for silence, and using prefix caching aggressively to keep response times under one second.
Optimizing Voice-In, Visuals-Out AI Experiences
This talk outlines a practical strategy for building "voice-in, visuals-out" AI products by prioritizing latency over conversational perfection. The speaker argues that while voice-to-voice interfaces are currently hindered by extreme latency requirements, voice-to-visuals allows for a more forgiving one-second response window, which he achieves by using fast models like Claude Haiku, eager inference intervals, and aggressive prefix caching.
AI-Driven Multi-Document Correlation for Financial Compliance
This presentation outlines a conceptual framework for enterprise fraud detection that replaces isolated document analysis with a graph-based approach. Varsha Shah explains how combining entity correlation, probabilistic risk modeling, and data normalization can identify patterns across disparate financial systems, though the talk remains high-level and does not provide implementation code or specific tool stacks.
Cross-Document AI for Predictive Financial Compliance
This is a high-level architectural overview of a multi-component framework for enterprise fraud detection. The speaker explains how to move from isolated document validation to a connected system using graph-based entity correlation, probabilistic risk modeling, and cross-jurisdictional data normalization.
Building Full-Stack Apps with AI Sub-Agents
A demonstration of Google Antigravity using voice prompts to orchestrate AI sub-agents. The presenters walk through a workflow where the tool generates a full-stack note-taking app by delegating tasks like frontend development, backend Go/SQLite setup, and testing to specialized agents.