#cloud
Every summary, chronological.
GPU-Orchestrated Multi-Agent Sustainability Intelligence Blueprint
Chelsie Czop and Mitesh Patel demo a serverless multi-agent app using Google ADK, Gemma 4 on NVIDIA RTX PRO 6000 GPUs via Cloud Run, and Milvus RAG for real-time environmental risk reports from satellite, telemetry, and policy data.
Google Cloud Tech
MRC: Resilient Networking for 100K+ GPU AI Training
OpenAI's MRC protocol uses multi-plane topologies and packet spraying across hundreds of paths with SRv6 source routing to eliminate congestion, route around failures in microseconds, and connect 131k GPUs with just two switch tiers, enabling non-stop frontier model training.
AWS KMS Envelope Encryption Secures Data at Scale
Encrypt data efficiently with AWS KMS envelope pattern: Use master keys to generate ephemeral AES-256 DEKs for fast local encryption/decryption, storing only encrypted DEKs alongside ciphertext for auditable, revocable access.
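The envelope pattern in miniature. This sketch uses an in-process stand-in for KMS and a toy SHA-256 keystream instead of real AES-256-GCM; production code would call boto3's `kms.generate_data_key`/`kms.decrypt` and a vetted cipher. All names here are illustrative.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy counter-mode keystream built from SHA-256. A stand-in for
    AES-256-GCM for illustration only; NOT cryptographically secure."""
    out = bytearray()
    for block in range(-(-len(data) // 32)):  # ceil(len/32) blocks
        pad = hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        chunk = data[block * 32:(block + 1) * 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)

class MockKMS:
    """Stand-in for AWS KMS. Real code: boto3
    kms.generate_data_key(KeyId=..., KeySpec='AES_256') and
    kms.decrypt(CiphertextBlob=...)."""
    def __init__(self):
        self._master = secrets.token_bytes(32)  # never leaves "KMS"

    def generate_data_key(self):
        plaintext_dek = secrets.token_bytes(32)  # ephemeral AES-256 DEK
        encrypted_dek = keystream_xor(self._master, plaintext_dek)
        return plaintext_dek, encrypted_dek

    def decrypt(self, encrypted_dek: bytes) -> bytes:
        return keystream_xor(self._master, encrypted_dek)

def envelope_encrypt(kms: MockKMS, data: bytes):
    dek, enc_dek = kms.generate_data_key()
    ciphertext = keystream_xor(dek, data)  # fast, local bulk encryption
    # Only the *encrypted* DEK is stored alongside the ciphertext.
    return enc_dek, ciphertext

def envelope_decrypt(kms: MockKMS, enc_dek: bytes, ciphertext: bytes) -> bytes:
    dek = kms.decrypt(enc_dek)  # one auditable, revocable KMS call per object
    return keystream_xor(dek, ciphertext)

kms = MockKMS()
enc_dek, ct = envelope_encrypt(kms, b"customer record")
assert envelope_decrypt(kms, enc_dek, ct) == b"customer record"
```

Revoking access to the master key revokes every DEK at once, since no plaintext DEK is ever persisted.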
MRC: OpenAI's Protocol for Resilient AI Training Networks
OpenAI's MRC extends RoCE with multipath spraying, microsecond failure recovery via SRv6, and multi-plane designs to deliver predictable performance in 131k-GPU clusters, using 2/3 fewer optics and 3/5 fewer switches than traditional setups.
MRC Enables 100k+ GPU Clusters with Resilient Multipath Networking
OpenAI's MRC protocol spreads packets across hundreds of paths for microsecond failure recovery, connecting 100,000+ GPUs via just 2 switch tiers—cutting power, cost, and downtime in AI training supercomputers.
Anthropic Leases 220K SpaceX GPUs to Boost Claude Limits 10x
Anthropic secures SpaceX's full Colossus-1 cluster (220,000+ NVIDIA GPUs, 300MW), brought online within a month, driving Claude API rate limits from 30K to 10M input tokens/min for top tiers and eliminating peak throttling.
Ditch preferred_username for Azure AD Guest Auth
Using preferred_username as identity anchor worked for employees but failed silently for all B2B guests, causing 403 errors post-launch. Anchor on oid instead for reliable identification.
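The fix can be sketched as follows; the claim values are fabricated examples, not real tokens.

```python
def identity_anchor(claims: dict) -> str:
    """Return a stable per-user key from decoded Azure AD (Entra ID)
    token claims. `oid` is the immutable object ID of the user in this
    tenant and is present for members and B2B guests alike;
    `preferred_username` is mutable and unreliable for guest accounts."""
    oid = claims.get("oid")
    if not oid:
        raise ValueError("token missing 'oid' claim")
    return oid

# Member token: both claims present and unsurprising.
member = {
    "oid": "1f4a9c2e-0000-4000-8000-111111111111",
    "preferred_username": "alice@corp.example",
}
# Guest token: preferred_username carries the home-tenant UPN with the
# #EXT# mangling (or may be absent entirely), but oid is still stable.
guest = {
    "oid": "9b7d3e5a-0000-4000-8000-222222222222",
    "preferred_username": "bob_partner.example#EXT#@corp.example",
}

assert identity_anchor(member) == member["oid"]
assert identity_anchor(guest) == guest["oid"]
```

Keying your user table on `oid` means a guest's display name or home UPN can change without breaking authorization lookups.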
Secure AI Agents via MCP Toolbox Custom Tools
MCP Toolbox prevents confused deputy attacks by letting developers pre-write constrained SQL tools with bound parameters, separating agent flexibility from app-controlled security for runtime agents.
Google Cloud Tech
SageMaker Fine-Tuning: LoRA Beats QLoRA on Cost-Perf Balance
LoRA cuts trainable params by 96% vs full fine-tuning, balancing cost savings and accuracy on Llama2-7B/Mistral7B; QLoRA saves 8x memory but trains slower due to dequantization overhead.
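The parameter savings are easy to reproduce arithmetically. This sketch assumes a single 4096x4096 projection adapted at rank 16 (illustrative numbers, not from the article), so the per-layer figure is steeper than the ~96% whole-model reduction, where only some layers carry adapters.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    # A LoRA adapter replaces the frozen d_in x d_out weight update
    # with two low-rank factors: A (d_in x r) and B (r x d_out).
    return d_in * rank + rank * d_out

# Assumed shape: one 4096x4096 attention projection, rank 16.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, 16)
reduction = 1 - lora / full
print(f"full={full:,} lora={lora:,} reduction={reduction:.1%}")
# prints: full=16,777,216 lora=131,072 reduction=99.2%
```

QLoRA's memory win comes from storing the frozen base weights in 4-bit NF4, not from changing this adapter math; the dequantize-on-the-fly step is what slows training.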
Bigtable Scales Petabytes for Real-Time NoSQL Workloads
Bigtable auto-scales to hundreds of petabytes and millions of ops/sec with low latency, powering Google Search/YouTube/Maps; ideal for time series, ML features, and streaming via Flink/Kafka integrations.
Google Cloud Tech
Scale PyTorch DDP Multi-Node on AWS EC2: Infra-First Guide
Multi-node DDP demands identical environments, data access, and open security groups across EC2 instances; use torchrun launcher with DDPManager for minimal code changes and reliable gradient sync via NCCL.
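A launch-config sketch for two nodes with 8 GPUs each; the IP, port, and NIC name are assumptions, and the same command runs on both instances with only `--node_rank` changed.

```shell
# Same command on both nodes; only --node_rank differs (0 here, 1 on the peer).
# 10.0.1.10:29500 is node 0's private IP (placeholder); the security group
# must allow this port plus NCCL's TCP traffic between the instances.
export NCCL_SOCKET_IFNAME=eth0
torchrun \
  --nnodes=2 \
  --nproc_per_node=8 \
  --node_rank=0 \
  --rdzv_backend=c10d \
  --rdzv_endpoint=10.0.1.10:29500 \
  train.py
```

`torchrun` sets `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` for each process, which is what keeps the training script itself nearly unchanged.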
TPUs Dominate at Infrastructure Scale Over Per-Chip GPU Wins
Google's TPU v8t (training) and v8i (inference) lag Nvidia GPUs per chip but deliver superior performance at scale—9600-chip superpods hit 121 exaFLOPS FP4—via cube topology and Virgo networking, optimizing for AI's bandwidth-heavy workloads.
Next '26: Build Agents with ADK, Skills, and Gemini
Google Cloud Next '26 demos production multi-agent systems using open-source ADK for any language/model, modular skills for efficient context, and tools like MCP servers—open-sourced Race Condition repo for marathon planning.
Google Cloud Tech
Batch Size Unlocks 1000x LLM Inference Efficiency
Reiner Pope deduces frontier LLM training and serving mechanics from roofline analysis, revealing batch size as the core driver of latency-cost tradeoffs, with optimal batches of ~2000 tokens amortizing weights for massive gains.
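The amortization argument in miniature, with assumed hardware numbers rather than the talk's: the critical batch is where per-step compute stops being dominated by reading every weight once from HBM.

```python
# Assumed hardware/model (placeholders, not the talk's figures):
peak_flops = 2e15        # accelerator peak, FLOP/s
hbm_bw = 8e12            # HBM bandwidth, bytes/s
params = 70e9            # model parameters
bytes_per_param = 1.0    # 8-bit weights

def step_time(batch: int) -> float:
    # Per decode step: all weights are read once (shared by the whole
    # batch), and each token costs ~2 FLOPs per parameter.
    mem_time = params * bytes_per_param / hbm_bw
    compute_time = 2 * params * batch / peak_flops
    return max(mem_time, compute_time)

def cost_per_token(batch: int) -> float:
    return step_time(batch) / batch  # accelerator-seconds per token

# Crossover batch where compute time equals the weight-read time:
critical_batch = peak_flops * bytes_per_param / (2 * hbm_bw)
print(critical_batch)  # 125.0 with these assumed numbers
print(cost_per_token(1) / cost_per_token(int(critical_batch)))
```

With these placeholder numbers the cost per token drops 125x between batch 1 and the critical batch; frontier-scale bandwidth/compute ratios and longer KV caches are what push the talk's optimum toward ~2000 tokens.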
Scaffold AI Agent Prod Infra in 60s with Google Starter Pack
Google's Agent Starter Pack CLI generates full production-ready AI agent stack—FastAPI backend, Terraform IaC, CI/CD, Vertex AI eval, observability—in 60 seconds, cutting typical 3-9 month infra setup to minutes across 6 templates.
DIY Smart Code
Gemma 4 Prod Stack: Model Armor, ADK Agents, Tracing
Deploy secure, observable Gemma 4 agents on Cloud Run using load balancers for Model Armor integration, ADK for model-agnostic agents with vLLM, and Prometheus/Cloud Trace for metrics like GPU util and latency.
Google Cloud Tech
Mount S3 Buckets as File Systems with AWS S3 Files
AWS S3 Files mounts buckets directly as file systems on EC2, containers, and Lambda—eliminating FUSE hacks and sync scripts for AI/ML workflows, but misconfigurations risk exposing, corrupting, or losing data.
Self-Host Gemma 4 on Cloud Run GPUs: Ollama vs vLLM
Deploy open Gemma 4 LLM on serverless Cloud Run GPUs two ways: Ollama bakes model into container for instant cold starts; vLLM mounts from GCS FUSE for model swaps without rebuilds. Full CI/CD via Cloud Build.
Zero Leak Debt: Kill 100+ Leaked Secrets Platform-Wide
Leaked secrets from 2022 still process payments as 'leak debt'; ruthlessly audit local dev, CI/CD, and production until zero static secrets remain, so nothing is left to leak, expire unexpectedly, or need manual rotation.
Parasail Brokers GPUs for Cheap AI Inference at Scale
Parasail generates 500B tokens daily by renting global GPUs and dodging peaks, enabling devs to run open-model agents affordably as API costs from OpenAI/Anthropic rise.
Next '26 Sneak Peek: Agents, Demos, Hands-On AI Building
Google Cloud Next '26 spotlights production-ready AI agents via live demos, massive showcase floor with hack zones, and sessions on Gemini, ADK, generative UI—perfect for developers shipping autonomous apps.
Google Cloud Tech
Kepler's 40-GPU Orbital Cluster Powers Edge AI in Space
Kepler Communications operates the largest orbital compute cluster with 40 Nvidia Orin processors across 10 satellites, enabling distributed edge inference for sensors—proving value before 2030s mega data centers arrive.
Anthropic Eyes Custom Chips Amid $30B Claude Surge
Anthropic is exploring in-house AI chips at an early stage as Claude hits a $30B annual run rate (up from $9B), securing 3.5GW of TPU compute while custom silicon costs ~$500M.
Scaling TPUs on GKE for Massive AI Workloads
GKE treats TPU slices as atomic units for seamless scaling up to 9k+ chips, with flexible capacity like DWS Flex/Calendar and custom fallbacks for cost-efficient ML training/inference.
Google Cloud Tech
Self-Host Archon v3 on Hetzner VPS with Docker
Provision Hetzner VPS, apply cloud-init YAML for auto-setup of Archon v3 with Caddy HTTPS reverse proxy, Postgres DB, then configure .env secrets and optional form auth for secure 24/7 access via subdomain.
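A minimal `#cloud-config` sketch of that flow. The repo URL, subdomain, ports, and package names are placeholders, not Archon v3's actual setup; adapt them to the real compose file.

```yaml
#cloud-config
# Placeholder sketch: swap in the real Archon repo and your subdomain.
packages:
  - docker.io
  - docker-compose-v2
write_files:
  - path: /opt/caddy/Caddyfile
    content: |
      archon.example.com {
          # Caddy terminates HTTPS automatically (Let's Encrypt)
          reverse_proxy localhost:3737
      }
runcmd:
  - systemctl enable --now docker
  - git clone https://example.com/your/archon.git /opt/archon
  - docker compose -f /opt/archon/docker-compose.yml up -d
```

Because cloud-init runs once on first boot, the VPS comes up fully provisioned; secrets still go into `.env` by hand afterward so they never sit in the cloud-init YAML.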
Anthropic Tops $30B ARR as AI Hits Helium Wall
Anthropic overtakes OpenAI with 30x revenue growth to $30B ARR via top coding models, but Qatar's 34% helium cutoff doubles prices, bottlenecking AI datacenters.
Cut Snowflake Cortex Code Costs with Prompts and Limits
Precise prompts reduce token usage; monitor spend via ACCOUNT_USAGE views, set alerts, and enforce daily credit limits (e.g., 5 credits per user) in Snowsight to prevent surprise bills.
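One way to enforce such a cap is a resource monitor; the names and thresholds below are placeholders. Note that Snowflake meters credits per warehouse or account rather than per user, so a per-user limit in practice usually means a per-user warehouse.

```sql
-- Placeholder names; a 5-credit daily cap with an early warning.
CREATE RESOURCE MONITOR cortex_daily_cap
  WITH CREDIT_QUOTA = 5
  FREQUENCY = DAILY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 80 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE cortex_wh SET RESOURCE_MONITOR = cortex_daily_cap;
```

`DO SUSPEND` lets running queries finish before cutting off the warehouse; `DO SUSPEND_IMMEDIATE` kills them outright if a hard stop matters more than losing work.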
Scale Stateless Backends by Broadcasting Client Updates
Horizontal scaling routes callbacks to replicas that don't hold the client's SSE/WebSocket connection, silently dropping updates; broadcast via Redis Pub/Sub so the replica that owns the connection delivers reliably.
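The broadcast pattern in miniature, with an in-process bus standing in for Redis Pub/Sub (real code would use redis-py's `publish` and `pubsub().subscribe`); every replica hears every message, and only the one owning the live connection delivers it.

```python
from typing import Callable

class Bus:
    """In-process stand-in for a Redis Pub/Sub channel."""
    def __init__(self):
        self._subs: list[Callable[[str, str], None]] = []

    def subscribe(self, handler: Callable[[str, str], None]) -> None:
        self._subs.append(handler)

    def publish(self, client_id: str, payload: str) -> None:
        for handler in self._subs:  # fan out to ALL replicas
            handler(client_id, payload)

class Replica:
    def __init__(self, bus: Bus):
        # client_id -> messages pushed down that client's live socket
        self.connections: dict[str, list[str]] = {}
        bus.subscribe(self._on_message)

    def accept(self, client_id: str) -> None:
        self.connections[client_id] = []  # SSE/WebSocket lands here

    def _on_message(self, client_id: str, payload: str) -> None:
        conn = self.connections.get(client_id)
        if conn is not None:  # only the owning replica delivers
            conn.append(payload)

bus = Bus()
a, b = Replica(bus), Replica(bus)
a.accept("client-1")  # client-1's socket lives on replica A
# A callback (e.g. a webhook) lands on replica B, which doesn't own it:
bus.publish("client-1", "job finished")
assert a.connections["client-1"] == ["job finished"]
assert "client-1" not in b.connections
```

Replicas that don't hold the connection simply drop the message, so the publish side never needs to know where any client is attached.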
Reliable Scraping Pipelines: Playwright + Bright Data + Kubernetes
Deploy Playwright scrapers reliably in production using Bright Data's remote Browser API and Kubernetes Jobs/CronJobs to handle browser startup, proxies, retries, and scheduling overlaps.
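A CronJob sketch covering the retry and overlap points; the image name, schedule, timeouts, and secret names are placeholders.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scraper
spec:
  schedule: "*/30 * * * *"
  concurrencyPolicy: Forbid      # skip a run if the previous one is still going
  jobTemplate:
    spec:
      backoffLimit: 3            # retry failed scrapes a few times
      activeDeadlineSeconds: 1500 # kill runs that hang on browser startup
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: scraper
              image: my-scraper:latest   # placeholder image
              env:
                - name: BROWSER_WS_ENDPOINT  # Bright Data remote browser URL
                  valueFrom:
                    secretKeyRef:
                      name: brightdata
                      key: ws-endpoint
```

Connecting Playwright to a remote browser endpoint keeps the pod lightweight, while `Forbid` plus `activeDeadlineSeconds` handles the two classic CronJob failure modes: overlapping runs and hung browsers.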
Claude Code Leak Reveals AI Supply Chain Perils
Leaked Claude Code source exposes npm vulnerabilities and AI agent risks in CI/CD, urging defenders to harden supply chains, rotate credentials rigorously, and test updates in labs amid brazen threat actor speed.