#open-source
Every summary, chronological.
AntAngelMed: 103B MoE Medical LLM Matches 40B Dense at 7x Speed
103B-param open-source medical LLM activates only 6.1B params via 1/32 MoE, rivals 40B dense models with 7x efficiency, tops HealthBench/MedBench, runs 200+ tps on H20.
TwELL Delivers 20% LLM Speedups via GPU-Optimized Sparsity
Use a ReLU gate activation plus an L1 penalty (coefficient 2e-5) on hidden activations to induce 99.5% sparsity in feedforward layers; TwELL CUDA kernels then yield 20.5% inference and 21.9% training speedups on H100s with no accuracy loss.
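To make the sparsity idea concrete, here is a minimal sketch in plain Python. It assumes a gated feedforward layer whose hidden activation is `ReLU(gate) * up`, and it treats 2e-5 as the coefficient of an L1 penalty on that activation. The function names (`gated_hidden`, `l1_penalty`, `sparsity`) and the toy values are hypothetical and are not taken from TwELL's code.

```python
# Illustrative only: not TwELL's API. Assumes hidden = ReLU(gate) * up.

def relu(x):
    """Standard ReLU: negative pre-activations become exactly zero."""
    return x if x > 0.0 else 0.0

def gated_hidden(gate_pre, up):
    """Elementwise ReLU(gate) * up for one token's hidden vector."""
    return [relu(g) * u for g, u in zip(gate_pre, up)]

def l1_penalty(hidden, coeff=2e-5):
    """L1 regularization term added to the training loss (coeff is assumed)."""
    return coeff * sum(abs(h) for h in hidden)

def sparsity(hidden):
    """Fraction of hidden units that are exactly zero."""
    return sum(1 for h in hidden if h == 0.0) / len(hidden)

# Toy example: three of four gate pre-activations are negative.
gate_pre = [-1.0, 0.5, -0.2, -3.0]
up = [2.0, 4.0, 1.0, 1.0]
hidden = gated_hidden(gate_pre, up)
penalty = l1_penalty(hidden)
frac_zero = sparsity(hidden)
```

Because ReLU yields exact zeros, a sparse kernel can skip those units outright, and the L1 term pushes training toward more of them.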
Replay Logs Fail Agents: Use VM Snapshots Instead
Replay-based durability constrains agent code and accumulates ever-growing logs; instead, split state into context logs (made durable in a DB) and execution snapshots (14MB Firecracker VMs, <1s save, 100ms restore) to support multi-day sessions.
AI Engineer
Hermes Desktop App Enables Easy Self-Evolving AI Agents
Hermes Agent runs 24/7 persistent, self-improving AI agents locally with long-term memory and closed learning loops; new Desktop App adds intuitive UI for setup, multi-agent management, and tools on Windows, macOS, Linux.
Rust CUDA Kernels via Direct PTX Compilation
cuda-oxide lets you write safe Rust SIMT GPU kernels that compile directly to PTX using a custom rustc backend, skipping C++ or DSLs—host/device in one .rs file, with cargo oxide build producing binary + .ptx.
Build Hermes AI Agent: VPS Setup to Scaled Automations
Follow this step-by-step guide to deploy Hermes Agent on a VPS, integrate Telegram, create skills/crons, backup to GitHub, and scale multiple agents for proactive AI assistance.
Trigger.dev: Async Infra Powers 90% AI Agents
Trigger.dev evolved from Zapier-for-devs background jobs into a reliable SDK for executing AI agents, hitting PMF with v3's hosted execution and checkpoint-resume primitives. The timing suited the agent era: agents now account for 90% of usage.
Y Combinator
Symphony: Agents Autonomously Claim and Complete Tasks
OpenAI's Symphony uses issue trackers like Linear to let coding agents claim tasks, spin up isolated workspaces, and only ping humans for reviews—solving the 3-5 session supervision bottleneck. Install by prompting an agent with a 2000+ line spec to build it.
Spec-Kit: Specs-First AI Coding for Reliable Production Code
GitHub's open-source Spec-Kit (90k+ stars) uses Spec-Driven Development to ground AI agents in structured specs, generating testable code that matches intent—fixing 'vibe-coding' failures in prototypes turned production.
Anthropic Open-Sources Wall St Analyst Agents
Anthropic released 10 end-to-end Claude agents mimicking Goldman Sachs analyst roles, with prompts, checklists, 11 licensed data connectors, and 7 vertical bundles—democratizing workflows once locked behind $25k terminals and bank secrecy.
AI Summaries (evaluation playlist)
Zig Rejects Bun's Fork Over LLM Policy and Flawed Speed Hack
Bun's Zig fork uses LLM for 4x faster debug builds via parallel analysis, but Zig rejects it for non-determinism risks and upstream incompatibility; Zig prioritizes careful engineering with LLVM bypass for true 40s-to-0.5s speedups.
TokenSpeed Beats TensorRT-LLM 9-11% on Agentic Coding Inference
TokenSpeed open-source engine optimizes agentic workloads with long contexts (>50K tokens) and multi-turn convos, delivering 9% lower latency and 11% higher throughput than TensorRT-LLM at 70-100 TPS/user on NVIDIA B200.
DeepSeek-TUI: Viral Open-Source Claude Code Rival
DeepSeek-TUI, a Rust-based terminal AI coding agent powered by DeepSeek V4's 1M-token context, hit 10k+ GitHub stars in days as a cheap, customizable alternative to Claude Code, built by a music/law student using AI-assisted coding.
OpenClaw's April Shift: Model-Swappable Agent Runtime
OpenClaw evolved from viral demo to durable agent runtime with task orchestration, mature memory, and channels—enabling workflows that swap models like Claude, Codex, or Gemma 4 to survive provider changes.
IBM Granite Speech 4.1: 3 ASR Models for Accuracy, Features, Speed
IBM's 2B Granite Speech 4.1 suite offers three trade-offs: base leads Open ASR Leaderboard (WER 5.33, RTF 231), Plus adds diarization/timestamps, NAR hits RTF 1820 on H100 via transcript editing.
637MB LLM Runs Offline on Base MacBook Air, Works Surprisingly Well
TinyLlama, a 637MB open-source LLM, runs instantly on a stock MacBook Air via Ollama—no internet, GPU, or API needed—handling Node.js servers and casual chats effectively, lowering the bar for useful local AI.
SIE: Dynamic Inference for Small Models on Shared GPUs
Open-source SIE engine from Superlinked enables hot-swapping small embedding models (e.g., Stella, ColBERT) on one GPU via LRU eviction, cutting costs and solving context rot in agents by preprocessing data.
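The hot-swapping described above can be sketched as a small LRU cache. This is a hypothetical illustration, not SIE's actual interface: `ModelCache`, `loader`, and the capacity counted in number of models (rather than GPU memory) are all assumptions made for brevity.

```python
from collections import OrderedDict

class ModelCache:
    """Keeps at most `capacity` models resident; evicts the least recently used."""

    def __init__(self, capacity, loader):
        self.capacity = capacity      # assumed: a model count, not bytes of VRAM
        self.loader = loader          # hypothetical callable: name -> loaded model
        self._models = OrderedDict()  # insertion order doubles as recency order
        self.evicted = []             # names evicted so far, for inspection

    def get(self, name):
        if name in self._models:
            # Cache hit: mark as most recently used.
            self._models.move_to_end(name)
            return self._models[name]
        if len(self._models) >= self.capacity:
            # Cache full: drop the least recently used model first.
            old_name, _ = self._models.popitem(last=False)
            self.evicted.append(old_name)
        model = self.loader(name)
        self._models[name] = model
        return model

    def resident(self):
        """Names of loaded models, least recently used first."""
        return list(self._models)

# Toy usage with a stand-in loader that just returns a string.
cache = ModelCache(capacity=2, loader=lambda n: f"<{n}>")
cache.get("stella")
cache.get("colbert")
cache.get("stella")      # refreshes stella, so colbert is now the oldest
cache.get("reranker")    # evicts colbert
```

A real engine would more likely budget by memory footprint and unload weights from the device on eviction; the recency bookkeeping stays the same.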
Open Design: Free Open-Source Claude Design Clone
Open Design replicates Claude Design's AI-powered UI generation locally for free, using any model or CLI agent, with 31 skills and 72 design systems for production-ready landing pages, decks, and prototypes.
Self-Host Vane + Ollama for Private AI Web Research
Install Vane in Docker on Windows 11 with local Ollama and Qwen3.5:9b to run citation-backed searches privately, bypassing cloud services like OpenAI.
North Korea Hit Axios NPM Maintainer, Exposing 100M Downloads
OpenAI detected North Korean hackers, but what they compromised was Axios (100M weekly downloads), via a fake job offer sent to maintainer Jason Saayman on Microsoft Teams, not OpenAI directly.
Symphony: Agents Autonomously Manage Tasks from Linear
OpenAI's Symphony spec lets Codex agents pull open tickets from Linear, work independently until completion, and self-file issues—boosting merged PRs 6x in 3 weeks by eliminating human micromanagement.
Tiny LLMs and On-Device Agents via LiteRT-LM on Edge Hardware
LiteRT-LM runs Gemma 2B/4B models at 1000+ tokens/sec on phones and delivers agent skills with function calling, while tiny 100-500M param models excel in fine-tuned in-app tasks like voice-to-action at 85-90% reliability.
AI Engineer
HyperFrames Wins for AI Agents: 7s Setup vs Remotion's 50s
HyperFrames delivers 7-second time-to-first-video with zero build step and Apache 2.0 license, beating Remotion's 50s React-heavy setup—ideal for AI agents generating videos from HTML prompts without coding skills.
Open-Source AI Auto-Tags PDFs for Accessibility
OpenDataLoader delivers production-ready, open-source PDF auto-tagging via heuristic or hybrid AI modes, reconstructing structure for screen readers and AI pipelines without proprietary tools.
10 New OSS Tools to Supercharge Claude Code
Recent open-source tools for Claude Code deliver wins like 5% token savings via caveman brevity, 71.5x fewer tokens with Graphify graphs, local design cloning, video processing, and self-healing browsers—check repos for immediate productivity boosts.
Chase AI
Open Design: GUI Claude Design Clone Without Usage Limits
Open Design replicates Claude Design's graphical interface for AI-generated prototypes and slide decks, built on Huashu Design, integrates with any LLM CLI like Claude Code to bypass Anthropic usage restrictions, and includes 31 skills plus 72 pre-built design systems.
Chase AI
Qwen-Scope SAEs Unlock Actionable LLM Internals
Qwen-Scope's open SAEs on 7 Qwen models decompose activations into interpretable features for steering outputs, proxy benchmark analysis (ρ=0.85 correlation), toxicity classification (F1>0.90), and training fixes like 50% code-switching reduction.
SiYuan: Refactor Notes Like Code Without Broken Links
SiYuan uses permanent block IDs for unbreakable references and built-in SQL databases, letting developers organize technical notes like structured codebases locally, outperforming Obsidian's file links and Notion's cloud lock-in.
Better Stack
Gemma Chat: Offline Vibe Coding with Gemma 4 on Mac
Gemma Chat runs Google's Gemma 4 locally on Apple Silicon Macs via MLX for private, offline app building with live previews, file editing, and agentic tools—no API keys or subscriptions needed.
Pi's Self-Modifying Agents: Power and Perils
Mario Zechner built Pi, a minimalist self-modifying AI coder powering OpenClaw. With Armin Ronacher, they praise its potential but warn against over-automation eroding code quality—human judgment remains key.
AI Summaries (evaluation playlist)