#cloud
Every summary, chronological.
GPU-Orchestrated Multi-Agent Sustainability Intelligence Blueprint
Chelsie Czop and Mitesh Patel demo a serverless multi-agent app using Google ADK, Gemma 4 on NVIDIA RTX PRO 6000 GPUs via Cloud Run, and Milvus RAG for real-time environmental risk reports from satellite, telemetry, and policy data.
Google Cloud Tech
MRC: Resilient Networking for 100K+ GPU AI Training
OpenAI's MRC protocol uses multi-plane topologies and packet spraying across hundreds of paths with SRv6 source routing to eliminate congestion, route around failures in microseconds, and connect 131k GPUs with just two switch tiers, enabling non-stop frontier model training.
AWS KMS Envelope Encryption Secures Data at Scale
Encrypt data efficiently with AWS KMS envelope pattern: Use master keys to generate ephemeral AES-256 DEKs for fast local encryption/decryption, storing only encrypted DEKs alongside ciphertext for auditable, revocable access.
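The envelope pattern in miniature. This sketch uses an in-process stand-in for KMS and a toy SHA-256 keystream instead of real AES-256-GCM; production code would call boto3's `kms.generate_data_key`/`kms.decrypt` and a vetted cipher. All names here are illustrative.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy counter-mode keystream built from SHA-256. A stand-in for
    AES-256-GCM for illustration only; NOT cryptographically secure."""
    out = bytearray()
    for block in range(-(-len(data) // 32)):  # ceil(len/32) blocks
        pad = hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        chunk = data[block * 32:(block + 1) * 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)

class MockKMS:
    """Stand-in for AWS KMS. Real code: boto3
    kms.generate_data_key(KeyId=..., KeySpec='AES_256') and
    kms.decrypt(CiphertextBlob=...)."""
    def __init__(self):
        self._master = secrets.token_bytes(32)  # never leaves "KMS"

    def generate_data_key(self):
        plaintext_dek = secrets.token_bytes(32)  # ephemeral AES-256 DEK
        encrypted_dek = keystream_xor(self._master, plaintext_dek)
        return plaintext_dek, encrypted_dek

    def decrypt(self, encrypted_dek: bytes) -> bytes:
        return keystream_xor(self._master, encrypted_dek)

def envelope_encrypt(kms: MockKMS, data: bytes):
    dek, enc_dek = kms.generate_data_key()
    ciphertext = keystream_xor(dek, data)  # fast, local bulk encryption
    # Only the *encrypted* DEK is stored alongside the ciphertext.
    return enc_dek, ciphertext

def envelope_decrypt(kms: MockKMS, enc_dek: bytes, ciphertext: bytes) -> bytes:
    dek = kms.decrypt(enc_dek)  # one auditable, revocable KMS call per object
    return keystream_xor(dek, ciphertext)

kms = MockKMS()
enc_dek, ct = envelope_encrypt(kms, b"customer record")
assert envelope_decrypt(kms, enc_dek, ct) == b"customer record"
```

Revoking access to the master key revokes every DEK at once, since no plaintext DEK is ever persisted.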
MRC: OpenAI's Protocol for Resilient AI Training Networks
OpenAI's MRC extends RoCE with multipath spraying, microsecond failure recovery via SRv6, and multi-plane designs to deliver predictable performance in 131k-GPU clusters, using 2/3 fewer optics and 3/5 fewer switches than traditional setups.
MRC Enables 100k+ GPU Clusters with Resilient Multipath Networking
OpenAI's MRC protocol spreads packets across hundreds of paths for microsecond failure recovery, connecting 100,000+ GPUs via just 2 switch tiers—cutting power, cost, and downtime in AI training supercomputers.
Anthropic Leases 220K SpaceX GPUs to Boost Claude Limits 10x
Anthropic secures SpaceX's full Colossus-1 cluster (220,000+ NVIDIA GPUs, 300MW), brought online within a month, driving Claude API rate limits from 30K to 10M input tokens/min for top tiers and eliminating peak throttling.
Ditch preferred_username for Azure AD Guest Auth
Using preferred_username as identity anchor worked for employees but failed silently for all B2B guests, causing 403 errors post-launch. Anchor on oid instead for reliable identification.
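The fix can be sketched as follows; the claim values are fabricated examples, not real tokens.

```python
def identity_anchor(claims: dict) -> str:
    """Return a stable per-user key from decoded Azure AD (Entra ID)
    token claims. `oid` is the immutable object ID of the user in this
    tenant and is present for members and B2B guests alike;
    `preferred_username` is mutable and unreliable for guest accounts."""
    oid = claims.get("oid")
    if not oid:
        raise ValueError("token missing 'oid' claim")
    return oid

# Member token: both claims present and unsurprising.
member = {
    "oid": "1f4a9c2e-0000-4000-8000-111111111111",
    "preferred_username": "alice@corp.example",
}
# Guest token: preferred_username carries the home-tenant UPN with the
# #EXT# mangling (or may be absent entirely), but oid is still stable.
guest = {
    "oid": "9b7d3e5a-0000-4000-8000-222222222222",
    "preferred_username": "bob_partner.example#EXT#@corp.example",
}

assert identity_anchor(member) == member["oid"]
assert identity_anchor(guest) == guest["oid"]
```

Keying your user table on `oid` means a guest's display name or home UPN can change without breaking authorization lookups.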
Secure AI Agents via MCP Toolbox Custom Tools
MCP Toolbox prevents confused deputy attacks by letting developers pre-write constrained SQL tools with bound parameters, separating agent flexibility from app-controlled security for runtime agents.
Google Cloud Tech
SageMaker Fine-Tuning: LoRA Beats QLoRA on Cost-Perf Balance
LoRA cuts trainable params by 96% vs full fine-tuning, balancing cost savings and accuracy on Llama2-7B/Mistral7B; QLoRA saves 8x memory but trains slower due to dequantization overhead.
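The parameter savings are easy to reproduce arithmetically. This sketch assumes a single 4096x4096 projection adapted at rank 16 (illustrative numbers, not from the article), so the per-layer figure is steeper than the ~96% whole-model reduction, where only some layers carry adapters.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    # A LoRA adapter replaces the frozen d_in x d_out weight update
    # with two low-rank factors: A (d_in x r) and B (r x d_out).
    return d_in * rank + rank * d_out

# Assumed shape: one 4096x4096 attention projection, rank 16.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, 16)
reduction = 1 - lora / full
print(f"full={full:,} lora={lora:,} reduction={reduction:.1%}")
# prints: full=16,777,216 lora=131,072 reduction=99.2%
```

QLoRA's memory win comes from storing the frozen base weights in 4-bit NF4, not from changing this adapter math; the dequantize-on-the-fly step is what slows training.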
Bigtable Scales Petabytes for Real-Time NoSQL Workloads
Bigtable auto-scales to hundreds of petabytes and millions of ops/sec with low latency, powering Google Search/YouTube/Maps; ideal for time series, ML features, and streaming via Flink/Kafka integrations.
Google Cloud Tech
Scale PyTorch DDP Multi-Node on AWS EC2: Infra-First Guide
Multi-node DDP demands identical environments, data access, and open security groups across EC2 instances; use torchrun launcher with DDPManager for minimal code changes and reliable gradient sync via NCCL.
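A launch-config sketch for two nodes with 8 GPUs each; the IP, port, and NIC name are assumptions, and the same command runs on both instances with only `--node_rank` changed.

```shell
# Same command on both nodes; only --node_rank differs (0 here, 1 on the peer).
# 10.0.1.10:29500 is node 0's private IP (placeholder); the security group
# must allow this port plus NCCL's TCP traffic between the instances.
export NCCL_SOCKET_IFNAME=eth0
torchrun \
  --nnodes=2 \
  --nproc_per_node=8 \
  --node_rank=0 \
  --rdzv_backend=c10d \
  --rdzv_endpoint=10.0.1.10:29500 \
  train.py
```

`torchrun` sets `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` for each process, which is what keeps the training script itself nearly unchanged.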
TPUs Dominate at Infrastructure Scale Over Per-Chip GPU Wins
Google's TPU v8t (training) and v8i (inference) lag Nvidia GPUs per chip but deliver superior performance at scale—9600-chip superpods hit 121 exaFLOPS FP4—via cube topology and Virgo networking, optimizing for AI's bandwidth-heavy workloads.
Next '26: Build Agents with ADK, Skills, and Gemini
Google Cloud Next '26 demos production multi-agent systems using open-source ADK for any language/model, modular skills for efficient context, and tools like MCP servers—open-sourced Race Condition repo for marathon planning.
Google Cloud Tech
Batch Size Unlocks 1000x LLM Inference Efficiency
Reiner Pope deduces frontier LLM training and serving mechanics from roofline analysis, revealing batch size as the core driver of latency-cost tradeoffs, with optimal batches of ~2000 tokens amortizing weights for massive gains.
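The amortization argument in miniature, with assumed hardware numbers rather than the talk's: the critical batch is where per-step compute stops being dominated by reading every weight once from HBM.

```python
# Assumed hardware/model (placeholders, not the talk's figures):
peak_flops = 2e15        # accelerator peak, FLOP/s
hbm_bw = 8e12            # HBM bandwidth, bytes/s
params = 70e9            # model parameters
bytes_per_param = 1.0    # 8-bit weights

def step_time(batch: int) -> float:
    # Per decode step: all weights are read once (shared by the whole
    # batch), and each token costs ~2 FLOPs per parameter.
    mem_time = params * bytes_per_param / hbm_bw
    compute_time = 2 * params * batch / peak_flops
    return max(mem_time, compute_time)

def cost_per_token(batch: int) -> float:
    return step_time(batch) / batch  # accelerator-seconds per token

# Crossover batch where compute time equals the weight-read time:
critical_batch = peak_flops * bytes_per_param / (2 * hbm_bw)
print(critical_batch)  # 125.0 with these assumed numbers
print(cost_per_token(1) / cost_per_token(int(critical_batch)))
```

With these placeholder numbers the cost per token drops 125x between batch 1 and the critical batch; frontier-scale bandwidth/compute ratios and longer KV caches are what push the talk's optimum toward ~2000 tokens.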
Scaffold AI Agent Prod Infra in 60s with Google Starter Pack
Google's Agent Starter Pack CLI generates full production-ready AI agent stack—FastAPI backend, Terraform IaC, CI/CD, Vertex AI eval, observability—in 60 seconds, cutting typical 3-9 month infra setup to minutes across 6 templates.
DIY Smart Code
Gemma 4 Prod Stack: Model Armor, ADK Agents, Tracing
Deploy secure, observable Gemma 4 agents on Cloud Run using load balancers for Model Armor integration, ADK for model-agnostic agents with vLLM, and Prometheus/Cloud Trace for metrics like GPU util and latency.
Google Cloud Tech
Mount S3 Buckets as File Systems with AWS S3 Files
AWS S3 Files mounts buckets directly as file systems on EC2, containers, and Lambda—eliminating FUSE hacks and sync scripts for AI/ML workflows, but misconfigurations risk exposing, corrupting, or losing data.
Self-Host Gemma 4 on Cloud Run GPUs: Ollama vs vLLM
Deploy open Gemma 4 LLM on serverless Cloud Run GPUs two ways: Ollama bakes model into container for instant cold starts; vLLM mounts from GCS FUSE for model swaps without rebuilds. Full CI/CD via Cloud Build.
Zero Leak Debt: Kill 100+ Leaked Secrets Platform-Wide
Leaked secrets from 2022 still process payments as 'leak debt'; ruthlessly audit local dev, CI/CD, and production until zero static secrets remain, so nothing is left to leak, expire unexpectedly, or need manual rotation.
Parasail Brokers GPUs for Cheap AI Inference at Scale
Parasail generates 500B tokens daily by renting global GPUs and dodging peaks, enabling devs to run open-model agents affordably as API costs from OpenAI/Anthropic rise.
Next '26 Sneak Peek: Agents, Demos, Hands-On AI Building
Google Cloud Next '26 spotlights production-ready AI agents via live demos, massive showcase floor with hack zones, and sessions on Gemini, ADK, generative UI—perfect for developers shipping autonomous apps.
Google Cloud Tech
Kepler's 40-GPU Orbital Cluster Powers Edge AI in Space
Kepler Communications operates the largest orbital compute cluster with 40 Nvidia Orin processors across 10 satellites, enabling distributed edge inference for sensors—proving value before 2030s mega data centers arrive.
Anthropic Eyes Custom Chips Amid $30B Claude Surge
Anthropic is exploring in-house AI chips at an early stage as Claude hits a $30B annual run rate (up from $9B), securing 3.5GW of TPU compute while custom silicon costs ~$500M.
Scaling TPUs on GKE for Massive AI Workloads
GKE treats TPU slices as atomic units for seamless scaling up to 9k+ chips, with flexible capacity like DWS Flex/Calendar and custom fallbacks for cost-efficient ML training/inference.
Google Cloud Tech
Self-Host Archon v3 on Hetzner VPS with Docker
Provision Hetzner VPS, apply cloud-init YAML for auto-setup of Archon v3 with Caddy HTTPS reverse proxy, Postgres DB, then configure .env secrets and optional form auth for secure 24/7 access via subdomain.
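A minimal `#cloud-config` sketch of that flow. The repo URL, subdomain, ports, and package names are placeholders, not Archon v3's actual setup; adapt them to the real compose file.

```yaml
#cloud-config
# Placeholder sketch: swap in the real Archon repo and your subdomain.
packages:
  - docker.io
  - docker-compose-v2
write_files:
  - path: /opt/caddy/Caddyfile
    content: |
      archon.example.com {
          # Caddy terminates HTTPS automatically (Let's Encrypt)
          reverse_proxy localhost:3737
      }
runcmd:
  - systemctl enable --now docker
  - git clone https://example.com/your/archon.git /opt/archon
  - docker compose -f /opt/archon/docker-compose.yml up -d
```

Because cloud-init runs once on first boot, the VPS comes up fully provisioned; secrets still go into `.env` by hand afterward so they never sit in the cloud-init YAML.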
Anthropic Tops $30B ARR as AI Hits Helium Wall
Anthropic overtakes OpenAI with 30x revenue growth to $30B ARR via top coding models, but Qatar's 34% helium cutoff doubles prices, bottlenecking AI datacenters.
Cut Snowflake Cortex Code Costs with Prompts and Limits
Precise prompts reduce token usage; monitor spend via ACCOUNT_USAGE views, set alerts, and enforce daily credit limits (e.g., 5 credits per user) in Snowsight to prevent surprise bills.
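One way to enforce such a cap is a resource monitor; the names and thresholds below are placeholders. Note that Snowflake meters credits per warehouse or account rather than per user, so a per-user limit in practice usually means a per-user warehouse.

```sql
-- Placeholder names; a 5-credit daily cap with an early warning.
CREATE RESOURCE MONITOR cortex_daily_cap
  WITH CREDIT_QUOTA = 5
  FREQUENCY = DAILY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 80 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE cortex_wh SET RESOURCE_MONITOR = cortex_daily_cap;
```

`DO SUSPEND` lets running queries finish before cutting off the warehouse; `DO SUSPEND_IMMEDIATE` kills them outright if a hard stop matters more than losing work.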
Scale Stateless Backends by Broadcasting Client Updates
Horizontal scaling routes callbacks to replicas that don't hold the client's SSE/WebSocket connection, silently dropping updates; broadcast via Redis Pub/Sub so the replica that owns the connection delivers reliably.
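The broadcast pattern in miniature, with an in-process bus standing in for Redis Pub/Sub (real code would use redis-py's `publish` and `pubsub().subscribe`); every replica hears every message, and only the one owning the live connection delivers it.

```python
from typing import Callable

class Bus:
    """In-process stand-in for a Redis Pub/Sub channel."""
    def __init__(self):
        self._subs: list[Callable[[str, str], None]] = []

    def subscribe(self, handler: Callable[[str, str], None]) -> None:
        self._subs.append(handler)

    def publish(self, client_id: str, payload: str) -> None:
        for handler in self._subs:  # fan out to ALL replicas
            handler(client_id, payload)

class Replica:
    def __init__(self, bus: Bus):
        # client_id -> messages pushed down that client's live socket
        self.connections: dict[str, list[str]] = {}
        bus.subscribe(self._on_message)

    def accept(self, client_id: str) -> None:
        self.connections[client_id] = []  # SSE/WebSocket lands here

    def _on_message(self, client_id: str, payload: str) -> None:
        conn = self.connections.get(client_id)
        if conn is not None:  # only the owning replica delivers
            conn.append(payload)

bus = Bus()
a, b = Replica(bus), Replica(bus)
a.accept("client-1")  # client-1's socket lives on replica A
# A callback (e.g. a webhook) lands on replica B, which doesn't own it:
bus.publish("client-1", "job finished")
assert a.connections["client-1"] == ["job finished"]
assert "client-1" not in b.connections
```

Replicas that don't hold the connection simply drop the message, so the publish side never needs to know where any client is attached.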
Reliable Scraping Pipelines: Playwright + Bright Data + Kubernetes
Deploy Playwright scrapers reliably in production using Bright Data's remote Browser API and Kubernetes Jobs/CronJobs to handle browser startup, proxies, retries, and scheduling overlaps.
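A CronJob sketch covering the retry and overlap points; the image name, schedule, timeouts, and secret names are placeholders.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scraper
spec:
  schedule: "*/30 * * * *"
  concurrencyPolicy: Forbid      # skip a run if the previous one is still going
  jobTemplate:
    spec:
      backoffLimit: 3            # retry failed scrapes a few times
      activeDeadlineSeconds: 1500 # kill runs that hang on browser startup
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: scraper
              image: my-scraper:latest   # placeholder image
              env:
                - name: BROWSER_WS_ENDPOINT  # Bright Data remote browser URL
                  valueFrom:
                    secretKeyRef:
                      name: brightdata
                      key: ws-endpoint
```

Connecting Playwright to a remote browser endpoint keeps the pod lightweight, while `Forbid` plus `activeDeadlineSeconds` handles the two classic CronJob failure modes: overlapping runs and hung browsers.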
Claude Code Leak Reveals AI Supply Chain Perils
Leaked Claude Code source exposes npm vulnerabilities and AI agent risks in CI/CD, urging defenders to harden supply chains, rotate credentials rigorously, and test updates in labs amid brazen threat actor speed.