№ 06 / FIREHOSE

Firehose

The stream in tweet form.

Build Stateful Agents with File Systems & AI SDK v6
Hands-on coding workshop where Nico Albanese builds a tool-loop agent from scratch using Vercel AI SDK v6, adding web search, bash execution, a memories.md file for persistence, and named sandboxes for file-system state across sessions. Timestamps run from setup through sub-agents; clone the demo repo mentioned in the video to code along.
GPU-Orchestrated Multi-Agent Sustainability Intelligence Blueprint
Livestream talk by Google Cloud PM Chelsie Czop and NVIDIA's Jay Rodge demoing a multi-agent sustainability app orchestrated with Agent Development Kit, running Gemma 4 on Cloud Run with RTX PRO 6000 GPUs, and using Milvus for policy retrieval, followed by audience Q&A on agent challenges.
RL Industrializes GenAI Production via Feedback Loops
Conference talk by Alessandro Cappelli, Adaptive ML co-founder, pitching reinforcement learning pipelines over prompting or fine-tuning for scaling GenAI agents to production at Fortune 500 companies like AT&T. Covers mock environments, synthetic data from training, and LLM judges as rewards.
Malleable Evals: Adaptive Testing for Changing AI Agents
Vincent Koc's conference talk on why static benchmarks fail for adaptive AI agents like OpenClaw, pushing a shift to "malleable evals" where agents self-generate test suites from production traces to handle behavioral drift and edge cases.
Blankfein's Risk Playbook for Crises and Scaling Firms
a16z podcast interview with former Goldman Sachs CEO Lloyd Blankfein (@lloydblankfein), mostly on risk management, staying calm in crises, Goldman’s partnership culture, and his path from public housing to Wall Street—AI and tech get brief mentions late in the talk.
Night Shift: Agents Run Recurring Jobs Automatically
Explains a simple three-part pattern for scheduling AI agents on recurring business tasks—shared interface for status, short human reviews, automated skills—then demos two examples (SEO meta audits via custom API, PR reviews) from the creator's setup using open-source tools.
Shopify Shop's Big Design Bets: Vision, AI, Craft
Interview with Shopify Design Director Katarina Batina on Shop app projects like the low-density super feed, cart redesign, and merchant stores, plus balancing metrics/vision, avoiding over-engineering, and AI's impact on strategy.
Agent OS Makes AI Agents Reliable and Scalable
IBM engineer Bri Kopecki uses school principal and toddler analogies to pitch an "Agent OS" layer—handling scheduling, memory, tools, identity, observability, and guardrails—for making AI agents reliable at scale. General conceptual talk; check IBM's AI Agents page for details.
Embed Pi Coding Agents via CLI Tools in Products
Conference talk by Matthias Luebken demoing how to embed the Pi coding agent SDK (which powers OpenClaw) into a B2B sales product: RFP emails route to per-customer agent sessions that pull CRM/ERP data via CLIs and output draft responses. Core advice: simplify data access and tools to match agent capabilities rather than adding complexity.
Stitch: Google's Free AI for Stunning UIs, No Design Needed
Google DevRel David East demos Stitch, a Google Labs AI tool for prompting UI designs via a web app and Design.md file, building a full crabbing tour site live before extracting Tailwind/HTML and integrating via Gemini CLI/MCP server. Ends with quick hot takes on AI in frontend work.
Scaling AI Agents to Slack Company Coworkers
Conference talk by Fryderyk Wiatrowski, Viktor co-founder, on scaling their Slack-native AI agent from personal tools (web/email agents) to company-wide use—covering Slack's input complexities (threads, DMs, edits, reactions), cross-user memory isolation, and model swaps breaking perceived personality.
MLX: Frontier AI Fully On-Device on Apple Silicon
Prince Canuma's conference talk demos MLX on Apple Silicon for on-device AI: real-time vision description, speech-to-speech pipelines, multimodal omni models, local Gemma runs, video generation on 16GB VRAM, Turbo Quant for 1M context, plus community voice apps and robots—all motivated by offline accessibility needs.
Mobbin MCP Links 600k UI Screens to Claude/Codex for Pro Designs
Demo of connecting Mobbin's MCP (600k app screens) to Claude Code for prompts like banking dashboards, competitive benchmarking, reports, and mood boards; brief Codex demo shows similar setup but highlights access limits.
SaaS Copy Fixes: VOC Research + 5 Conversion Killers
MicroConf talk by copywriter Lianna Patch critiquing common SaaS homepage/email mistakes (like "we-we" focus and info overload), sharing real examples from Bench, Bitly, Balsamiq, and others, plus VOC research tips and formulas like PASS/U-PASS. Her free VOC Research Kit is linked in the description.
CSS Experts Google Basics, New Features Eat JS's Lunch
Casual live panel at Smashing Meets Style Sheets where host Geoff Graham chats with CSS experts Lyra, Juan, and Chris about routine lookups (like list-style-type or view transitions), the CSS-JS divide for styling vs. state, and hype for scroll-driven animations.
CSS In-N-Out: Animating display:none with 3-2-1 Pattern
Chris Coyier conference talk demoing CSS animations for elements entering/exiting view (e.g., `<dialog>` modals), covering `transition-behavior: allow-discrete` on `display` and `@starting-style` for opacity fades.
Replay Logs Fail Agents: Use VM Snapshots Instead
Conference talk by Trigger.dev CEO Eric Allam contrasting replay journaling (step-wise LLM/tool logs for recovery) with OS-level snapshots on Firecracker microVMs for preserving agent execution state like files and subprocesses, plus DB logs for context. Includes backend history and recovery demos.
Fix Agent Context with Head/Tail + Memory, Not Summaries
Conference talk by Sally-Ann Delucia sharing Arize's year of context management lessons from building agent Alyx on trace data: why truncation/summarization failed, their head/tail truncation + retrievable memory fix, long-session evals, sub-agents for overload, and Claude Code source insights.
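As a rough illustration of the head/tail truncation plus retrievable memory idea named above, here is a minimal sketch. It is not Arize's implementation: the function names, the in-memory `store` dict, and the character-based budget are all assumptions made for the example (a real agent would more likely budget in tokens and persist the full text somewhere durable).

```python
from typing import Dict


def head_tail_truncate(text: str, key: str, store: Dict[str, str],
                       head: int = 40, tail: int = 40) -> str:
    """Keep the first `head` and last `tail` characters of `text`.

    The untruncated text is saved in `store` under `key`, so the agent
    can fetch it later instead of losing the middle for good.
    Hypothetical helper for illustration only.
    """
    if len(text) <= head + tail:
        return text  # short enough: nothing to cut, nothing to store
    store[key] = text
    omitted = len(text) - head - tail
    marker = f"\n[... {omitted} chars omitted, retrieve with key '{key}' ...]\n"
    # Slice from an explicit index so tail=0 yields an empty tail.
    return text[:head] + marker + text[len(text) - tail:]


def recall(key: str, store: Dict[str, str]) -> str:
    """Return the full text previously saved under `key`."""
    return store[key]
```

Unlike a summary, the truncated view keeps the opening and closing of the output verbatim, and the marker tells the agent which key to use if it needs the omitted middle.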
Close Playground-to-Production Gap with Feedback Loops
Mehedi Hassan's conference talk on scaling Granola's AI meeting chat beyond one-shot prototypes. Pitfalls covered: web search token bloat, cost, and provider drift, and the limits of a single prompt across user roles. Fixes covered: a custom tracing UI for tool, reasoning, and cost visibility, plus an Electron refactor for web PR previews and Cursor auto-testing.
Agentic Consent: Dynamic Permissions for Safe AI Agents
IBM's Grant Miller defines "agentic consent" as dynamic, identity-verified permissions for AI agents, covering granular controls, time-bound access, just-in-time human prompts, and compliance needs like transparency and revocability. Learn about their AI governance approach.
Codex Chrome Extension Bridges Code to Real Browser Workflows
Narrator walks through Codex's new Chrome extension for signed-in browser automation (e.g., Gmail, Salesforce) plus CLI upgrades in v0.128/0.129 like Vim editing, better permissions, plugin sharing, hooks, and persisted goals—mostly explaining release notes.
Hermes Desktop App Enables Easy Self-Evolving AI Agents
Quick demo and setup walkthrough for the Hermes Desktop App, a native UI wrapper for Nous Research's Hermes Agent that simplifies local multi-agent management, tool integration, and persistent memory on Windows/Mac/Linux. Brief OpenClaw comparison and basic usage examples included.
Codex /goal Beats Claude Code for Autonomous Coding
Tutorial on enabling Codex's experimental `/goal` slash command (via config tweak) for long-running autonomous coding, contrasting it with ReAct loops, followed by a hands-off demo building a 2D arcade game "Rift Salvage" from a detailed plan—includes plugs for the creator's Claude Code & Codex course, free community, and consults.
Build Hermes AI Agent: VPS Setup to Scaled Automations
Hands-on one-hour screen-share tutorial that deploys the open-source Hermes Agent to a Hostinger VPS, connects it to Telegram, and adds a first skill, a cron job, and GitHub backup, plus comparisons to Claude Code/OpenClaw and scaling tips.
Pomelli Catalog Scales On-Brand Ads from Product Sites
Hands-on demo of Google's Pomelli (Google Labs experiment), walking through brand setup from a website URL, auto-pulling products into a new Catalog feature, generating AI product photos, and building/downloading social campaigns for a jewelry shop example.
HTML Beats Markdown for AI Specs at 2-4x Token Cost
Reaction to Thariq Shihipar's thesis on using self-contained HTML files (with SVG, tables, JS) over Markdown for Claude Code specs, plans, and reports—covers 2-4x token cost (offset by 1M context), five use cases, and why Claude Code's filesystem/MCP access shines, with examples.
4-Step Audit Catches AI's 'Almost Right' Errors
Outlines a four-step AI workflow to catch subtle errors in high-stakes outputs like contracts or due diligence: finish the draft, extract claims into a table, validate against sources (supported/conflicts/no proof/needs human judgment), rewrite. Copy-paste prompts are in the presentation.
Archon Fixes AI Agent Randomness with Harness Engineering
Live demo of Archon, a local framework for AI coding agents that uses YAML DAG workflows, git worktrees for parallel runs, and auto-loading skills to produce consistent PRs without repo conflicts. Covers setup on M4 Pro, transparency features, and tradeoffs like upfront workflow design.
TTS Converges on LLM-Style Autoregressive Audio Token Generation
Mistral AI scientist Samuel Humeau explains TTS architecture trends toward LLM-style autoregressive transformers that generate audio frames via neural codecs for compression, with live demos of their new open-weight model for voice cloning and low-latency agents, plus open challenges like streaming LLM text inputs.
AI Agents Need Scaffolding: Prompts to Plugins Guide
Breakdown of "agentic scaffolding" around LLMs in Codex: prompts for one-offs, skills for reusable "house styles," plugins for installable workflows, plus MCPs/connectors and hooks/scripts. Links to a Substack guide with decision trees and examples.