CATEGORY · 10 OF 11

DevOps & Cloud

Infrastructure that holds. Deployments, observability, cost discipline, and the platform decisions that determine how fast a small team can move.

45 SUMMARIES · +5 THIS WEEK · 19 SOURCES
DAY 01 · Monday, May 11, 2026 · 2 SUMMARIES
OpenAI News · DevOps & Cloud

MRC: Resilient Networking for 100K+ GPU AI Training

OpenAI's MRC protocol uses multi-plane topologies and packet spraying across hundreds of paths with SRv6 source routing to eliminate congestion, route around failures in microseconds, and connect 131k GPUs with just two switch tiers, enabling non-stop frontier model training.
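
As a rough sanity check on the two-tier claim, the endpoint capacity of a non-blocking two-tier leaf-spine fabric can be worked out from the switch radix alone. The sketch below is illustrative arithmetic only: the 512-port radix is an assumption chosen for the example, not a figure taken from the summary, and `two_tier_capacity` is a hypothetical helper name.

```python
def two_tier_capacity(radix: int) -> int:
    """Maximum endpoints in a non-blocking two-tier leaf-spine fabric.

    Each leaf splits its ports evenly: radix/2 face endpoints and
    radix/2 face spines. Each spine has `radix` ports and needs one
    port per leaf, so the fabric can hold at most `radix` leaves.
    """
    if radix <= 0 or radix % 2:
        raise ValueError("radix must be a positive even number")
    leaves = radix
    endpoints_per_leaf = radix // 2
    return leaves * endpoints_per_leaf


# Assumed radix of 512 ports per switch (hypothetical value).
print(two_tier_capacity(512))  # 131072
```

Under that assumed radix, the formula lands on 131,072 endpoints, which is consistent with the "131k GPUs with just two switch tiers" figure.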

OpenAI News · AI & LLMs

OpenAI's Codex Controls: Sandbox, Rules, Telemetry

OpenAI deploys Codex coding agents with sandboxing for bounded execution, auto-approvals for low-risk actions, network/command restrictions, and OpenTelemetry logs to enable safe, auditable developer workflows without broad access.

DAY 02 · Friday, May 8, 2026 · 1 SUMMARY
Level Up Coding · DevOps & Cloud

AWS KMS Envelope Encryption Secures Data at Scale

Encrypt data efficiently with the AWS KMS envelope pattern: use a KMS master key to generate ephemeral AES-256 data encryption keys (DEKs) for fast local encryption and decryption, and store only the encrypted DEK alongside the ciphertext for auditable, revocable access.

DAY 03 · Thursday, May 7, 2026 · 2 SUMMARIES
Data and Beyond · DevOps & Cloud

AI Agents Expose IDP Flaws Built for Humans

Internal Developer Platforms (IDPs) assume human interpreters for ambiguities like unclear errors and tribal knowledge; AI agents fail because they execute exactly as interfaces allow, demanding explicit, machine-readable contracts to avoid disasters like deleting entire databases.

MarkTechPost · DevOps & Cloud

MRC: OpenAI's Protocol for Resilient AI Training Networks

OpenAI's MRC extends RoCE with multipath spraying, microsecond failure recovery via SRv6, and multi-plane designs to deliver predictable performance in 131k-GPU clusters, using 2/3 fewer optics and 3/5 fewer switches than traditional setups.

DAY 04 · May 6, 2026 · 2 SUMMARIES
Level Up Coding · DevOps & Cloud

Manual Deployment Unlocks Foundry Hosted Agents

Deploy Foundry hosted agents manually: build the container image in ACR, set up a Foundry Project with RBAC, create the agent via the Azure SDK with env vars and resources (cpu=0.25, mem=0.5Gi), then assign the Azure AI User role to the Agent ID. This route avoids azd preview failures.

Google Cloud Tech · DevOps & Cloud

Migrate MongoDB to Firestore Serverless Seamlessly

Firestore's MongoDB-compatible API lets you reuse existing code, drivers, and aggregation pipelines on a serverless DB with real-time queries for AI agents and five-nines availability.

DAY 05 · May 5, 2026 · 1 SUMMARY
Python in Plain English · DevOps & Cloud

Replace Cron with Temporal for Reliable Data Jobs

Cron handles retries, overlapping runs, and partial writes poorly, and offers no observability. Temporal workflows add retries (3s initial, 2x backoff, 8 max attempts), atomic writes, unique output files per run ID, a SKIP overlap policy, and full execution history in the UI. Because state lives in Temporal, runs survive crashes.
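
To make the retry numbers concrete, here is a minimal sketch of the wait schedule those settings imply. It is plain arithmetic, not Temporal SDK code, and it rests on two assumptions: "8 max attempts" includes the first attempt, and no maximum-interval cap shortens the later waits. `retry_waits` is a hypothetical helper.

```python
from typing import List


def retry_waits(initial_s: float, backoff: float, max_attempts: int) -> List[float]:
    """Seconds waited before each retry under exponential backoff.

    The first attempt runs immediately, so max_attempts attempts
    produce max_attempts - 1 waits.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return [initial_s * backoff ** i for i in range(max_attempts - 1)]


waits = retry_waits(initial_s=3.0, backoff=2.0, max_attempts=8)
print(waits)       # [3.0, 6.0, 12.0, 24.0, 48.0, 96.0, 192.0]
print(sum(waits))  # 381.0
```

Under these assumptions a persistently failing step is abandoned after a little over six minutes of cumulative waiting, which is worth comparing against the job's schedule interval.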

DAY 06 · May 3, 2026 · 1 SUMMARY
IBM Technology · DevOps & Cloud

Proactive Synthetic Monitoring Catches DevOps Failures Early

Simulate user actions like logins, searches, and API calls to detect regressions, availability issues, and performance degradation before production traffic, integrating tests into CI/CD for consistent validation.

DAY 07 · May 1, 2026 · 1 SUMMARY
Vercel Blog · DevOps & Cloud

Vercel Sandbox Firewall Enables Postgres Connections

Vercel Sandbox now supports outbound Postgres connections to hosted DBs like Neon and Supabase by detecting TLS upgrades during negotiation—no code changes required, just add DB host to allowed domains.

DAY 08 · April 30, 2026 · 2 SUMMARIES
Google Cloud Tech · DevOps & Cloud

Bigtable Scales Petabytes for Real-Time NoSQL Workloads

Bigtable auto-scales to hundreds of petabytes and millions of ops/sec with low latency, powering Google Search/YouTube/Maps; ideal for time series, ML features, and streaming via Flink/Kafka integrations.

Learning Data · DevOps & Cloud

Scale PyTorch DDP Multi-Node on AWS EC2: Infra-First Guide

Multi-node DDP demands identical environments, data access, and open security groups across EC2 instances; use torchrun launcher with DDPManager for minimal code changes and reliable gradient sync via NCCL.

DAY 09 · April 29, 2026 · 1 SUMMARY
Generative AI · DevOps & Cloud

GitHub RCE via Single Git Push X-Stat Injection

Authenticated users exploited X-Stat field injection in GitHub's internal git protocol for RCE on GitHub.com and GHES using a standard git push, enabling access to millions of repos (CVE-2026-3854, High severity).

DAY 10 · April 19, 2026 · 1 SUMMARY
DIY Smart Code · AI & LLMs

Scaffold AI Agent Prod Infra in 60s with Google Starter Pack

Google's Agent Starter Pack CLI generates a full production-ready AI agent stack (FastAPI backend, Terraform IaC, CI/CD, Vertex AI eval, observability) in 60 seconds, cutting a typical 3-9 month infra setup to minutes across 6 templates.

DAY 11 · April 18, 2026 · 3 SUMMARIES
Google Cloud Tech · AI & LLMs

Gemma 4 Prod Stack: Model Armor, ADK Agents, Tracing

Deploy secure, observable Gemma 4 agents on Cloud Run using load balancers for Model Armor integration, ADK for model-agnostic agents with vLLM, and Prometheus/Cloud Trace for metrics like GPU util and latency.

Towards AI · DevOps & Cloud

Mount S3 Buckets as File Systems with AWS S3 Files

AWS S3 Files mounts buckets directly as file systems on EC2, containers, and Lambda—eliminating FUSE hacks and sync scripts for AI/ML workflows, but misconfigurations risk exposing, corrupting, or losing data.

Google Cloud Tech · AI & LLMs

Self-Host Gemma 4 on Cloud Run GPUs: Ollama vs vLLM

Deploy open Gemma 4 LLM on serverless Cloud Run GPUs two ways: Ollama bakes model into container for instant cold starts; vLLM mounts from GCS FUSE for model swaps without rebuilds. Full CI/CD via Cloud Build.

DAY 12 · April 16, 2026 · 1 SUMMARY
Google Cloud Tech · DevOps & Cloud

Scale 60M req/mo solo on Cloud Run for $180

A solo builder scales the feature-flag SaaS RocketFlag to 60M requests/month across regions using Go on Cloud Run, batched writes to Firestore/BigQuery, and Cloud Armor. The total December bill was $180 USD (252 AUD), with zero SRE time.

DAY 13 · April 15, 2026 · 1 SUMMARY
Level Up Coding · DevOps & Cloud

Zero Leak Debt: Kill 100+ Leaked Secrets Platform-Wide

Secrets leaked in 2022 can still be processing payments today: that is "leak debt". Audit ruthlessly across local dev, CI/CD, and production until no static secrets remain, so that nothing can leak, expire unexpectedly, or need manual rotation.

DAY 14 · April 14, 2026 · 1 SUMMARY
Better Stack · DevOps & Cloud

Zrok: Open-Source ngrok Fix for Secure Localhost Sharing

Zrok shares localhost apps, files, and TCP/UDP services with one command, publicly or privately via tokens, with no port forwarding. Built on OpenZiti's zero-trust network, it avoids ngrok's limits, random URLs, and forced public exposure.

DAY 15 · April 13, 2026 · 1 SUMMARY
TechCrunch AI · DevOps & Cloud

Kepler's 40-GPU Orbital Cluster Powers Edge AI in Space

Kepler Communications operates the largest orbital compute cluster with 40 Nvidia Orin processors across 10 satellites, enabling distributed edge inference for sensors—proving value before 2030s mega data centers arrive.

DAY 16 · April 11, 2026 · 1 SUMMARY
Better Stack · DevOps & Cloud

Run S3-Compatible MinIO Locally to Cut Dev Costs

Deploy MinIO via Docker on your laptop for S3-compatible object storage using unchanged boto3 Python code, solving AWS S3 cost, latency, and lock-in issues for local dev and AI/RAG pipelines.

DAY 17 · April 9, 2026 · 2 SUMMARIES
Google Cloud Tech · DevOps & Cloud

Scaling TPUs on GKE for Massive AI Workloads

GKE treats TPU slices as atomic units for seamless scaling up to 9k+ chips, with flexible capacity like DWS Flex/Calendar and custom fallbacks for cost-efficient ML training/inference.

DIY Smart Code · DevOps & Cloud

Self-Host Archon v3 on Hetzner VPS with Docker

Provision a Hetzner VPS, apply a cloud-init YAML that auto-installs Archon v3 with a Caddy HTTPS reverse proxy and a Postgres DB, then configure .env secrets and optional form auth for secure 24/7 access via a subdomain.

DAY 18 · April 8, 2026 · 6 SUMMARIES
Towards AI · DevOps & Cloud

Claude Flags for Reliable CCA CI/CD Pipelines

For CCA exam CI/CD, use -p, --bare, --output-format json flags on Claude Code for non-interactive runs; validate JSON outputs with schemas, add retry loops, and enable prompt caching to avoid hangs and control costs.
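
The "validate JSON outputs, add retry loops" advice can be sketched independently of any particular CLI. In the example below, `produce` is a hypothetical callable standing in for whatever runs the non-interactive command and returns its stdout, and the field names `result` and `cost_usd` are illustrative assumptions rather than a documented output schema. The check is a simple required-keys-and-types test, not full JSON Schema validation.

```python
import json
from typing import Any, Callable, Dict


def run_with_retries(
    produce: Callable[[], str],
    required: Dict[str, type],
    max_attempts: int = 3,
) -> Dict[str, Any]:
    """Call produce() until it returns a JSON object with the required fields."""
    last_error = None
    for _ in range(max_attempts):
        raw = produce()
        try:
            data = json.loads(raw)
            if not isinstance(data, dict):
                raise ValueError("expected a JSON object")
            for key, expected in required.items():
                if not isinstance(data.get(key), expected):
                    raise ValueError(f"field {key!r} missing or not {expected.__name__}")
            return data
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")


# Simulated outputs: invalid JSON, wrong field type, then a valid object.
outputs = iter(["not json", '{"result": 42}', '{"result": "ok", "cost_usd": 0.01}'])
data = run_with_retries(lambda: next(outputs), {"result": str, "cost_usd": float})
print(data["result"])  # ok
```

Bounding the attempts keeps a misbehaving step from looping forever, which matters for both pipeline hangs and cost control.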

Towards AI · DevOps & Cloud

Cut Snowflake Cortex Code Costs with Prompts and Limits

Precise prompts reduce token usage. Monitor spend via ACCOUNT_USAGE tables, set alerts, and enforce per-user daily credit limits (for example, 5 credits for Snowsight) to prevent surprise bills.

Frontend Canteen · DevOps & Cloud

Observability Essentials for Microservices Ops

Log per layer without sensitive data, trace with OpenTelemetry across 50+ services via W3C headers and tail sampling, use RED/USE metrics tied to user SLOs, and build actionable alerts, dashboards, and runbooks to debug tail latency and simulate failures.

Level Up Coding · DevOps & Cloud

Scale Stateless Backends by Broadcasting Client Updates

Under horizontal scaling, a callback can land on a replica that does not hold the client's SSE/WebSocket connection, so the update is silently dropped. Broadcast updates via Redis Pub/Sub so that the replica owning the connection delivers them reliably.
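
The broadcast pattern can be illustrated without a Redis server. The sketch below uses an in-process `Bus` class as a stand-in for a Redis Pub/Sub channel, and a list per client as a stand-in for an open SSE/WebSocket connection. All class and method names are hypothetical.

```python
from typing import Callable, Dict, List


class Bus:
    """In-process stand-in for a pub/sub channel: every subscriber sees every message."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: dict) -> None:
        for handler in self._subscribers:
            handler(message)


class Replica:
    def __init__(self, name: str, bus: Bus) -> None:
        self.name = name
        self.bus = bus
        # client_id -> payloads delivered over that client's connection
        self.connections: Dict[str, List[str]] = {}
        bus.subscribe(self._on_broadcast)

    def connect(self, client_id: str) -> None:
        self.connections[client_id] = []

    def handle_callback(self, client_id: str, payload: str) -> None:
        # Do not assume this replica owns the client: publish to all replicas.
        self.bus.publish({"client_id": client_id, "payload": payload})

    def _on_broadcast(self, message: dict) -> None:
        inbox = self.connections.get(message["client_id"])
        if inbox is not None:  # only the owning replica delivers
            inbox.append(message["payload"])


bus = Bus()
a, b = Replica("a", bus), Replica("b", bus)
a.connect("client-1")                      # client is connected to replica a
b.handle_callback("client-1", "job done")  # callback lands on replica b
print(a.connections["client-1"])           # ['job done']
```

Every replica receives every message and discards those for clients it does not own, which trades some extra fan-out traffic for not needing sticky routing of callbacks.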

Level Up Coding · DevOps & Cloud

Reliable Scraping Pipelines: Playwright + Bright Data + Kubernetes

Deploy Playwright scrapers reliably in production using Bright Data's remote Browser API and Kubernetes Jobs/CronJobs to handle browser startup, proxies, retries, and scheduling overlaps.

IBM Technology · DevOps & Cloud

Claude Code Leak Reveals AI Supply Chain Perils

Leaked Claude Code source exposes npm vulnerabilities and AI agent risks in CI/CD, urging defenders to harden supply chains, rotate credentials rigorously, and test updates in labs, given how quickly threat actors now move.
