#machine-learning
AutoScientist Co-Optimizes Data and Models to Double Fine-Tuning Wins
Adaption's AutoScientist automates fine-tuning by jointly optimizing datasets and models for any capability, doubling win-rates and enabling frontier AI training outside big labs—free for 30 days.
Parameter Golf: Creativity in Tiny ML Models
OpenAI's 16MB/10-min ML challenge drew 1,000+ participants and 2,000+ submissions, showcasing optimizations, quantization, novel architectures, and AI agents' role in accelerating research while creating review challenges.
Interaction Models: Native Real-Time Multimodal AI
Replace turn-based AI harnesses with native interaction models using 200ms micro-turns for continuous audio/video/text processing, enabling proactive visuals and simultaneous speech—outperforming GPT/Gemini on interaction benchmarks.
AntAngelMed: 103B MoE Medical LLM Matches 40B Dense at 7x Speed
103B-param open-source medical LLM activates only 6.1B params via 1/32 MoE, rivals 40B dense models with 7x efficiency, tops HealthBench/MedBench, runs 200+ tps on H20.
RL Industrializes GenAI Production via Feedback Loops
95% of GenAI pilots fail to reach production because instruction tuning and prompting can't systematically integrate defects and metrics. RL can, enabling smaller, cheaper, faster models at Fortune 500s like AT&T, where token costs run to millions.
Aurora Fixes Muon's Neuron Death in Tall MLPs
Aurora optimizer eliminates >25% neuron death in Muon's tall matrices by jointly enforcing left semi-orthogonality and uniform row norms √(n/m), delivering SOTA on nanoGPT speedrun with 6% compute overhead.
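The geometry behind the fix: a left semi-orthogonal m×n matrix has Frobenius norm √n, so if its m rows are forced to equal length, each row norm must be √(n/m). A minimal numpy sketch of that constraint, using SVD for orthogonalization (Muon uses Newton-Schulz in practice) and a naive alternating projection in place of Aurora's joint enforcement, which is an assumption of this sketch:

```python
import numpy as np

def orthogonalize(G):
    """Polar factor U V^T: the nearest left semi-orthogonal matrix.
    (SVD reference; Muon approximates this with Newton-Schulz iterations.)"""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def aurora_like_update(G, iters=30):
    """Hedged sketch: alternate projections between semi-orthogonality and
    uniform row norms sqrt(n/m). Aurora enforces both jointly; this naive
    alternation only approximates that constraint set."""
    m, n = G.shape
    target = np.sqrt(n / m)
    X = orthogonalize(G)
    for _ in range(iters):
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        X = X * (target / np.clip(norms, 1e-12, None))   # uniform row norms
        X = orthogonalize(X)                             # back to semi-orthogonal
    return X

rng = np.random.default_rng(0)
G = rng.normal(size=(512, 128))        # tall matrix: 4x more rows than columns
X = aurora_like_update(G)
rows = np.linalg.norm(X, axis=1)
print(rows.min(), rows.max(), np.sqrt(128 / 512))
```

Each alternation is non-expansive toward the row-norm constraint, so the spread of row norms can only shrink; rows without the fix would include near-zero ("dead") directions.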
skfolio: Build & Tune Portfolio Optimizers in Python
skfolio's scikit-learn API lets you construct, validate, and compare 18+ portfolio strategies—from baselines to HRP, Black-Litterman, factors, and tuned models—on S&P 500 returns with walk-forward CV and GridSearchCV.
LLM Distillation: Soft-Label, Hard-Label, and Co-Distillation Explained
Distill large teacher LLMs into efficient students via soft-label (match probabilities for dark knowledge), hard-label (imitate outputs for cheap scalability), or co-distillation (joint training to minimize performance gaps).
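The soft-label objective is the classic temperature-scaled KL divergence. A minimal numpy sketch, where the logits, the temperature T=2, and the Hinton-style T² rescaling are illustrative choices, not the article's exact setup:

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_label_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions.
    Raising T spreads mass over wrong-but-plausible tokens, exposing
    the teacher's 'dark knowledge' to the student."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T ** 2)

teacher = np.array([[4.0, 2.0, 0.5, -1.0]])   # made-up logits over 4 tokens
aligned = np.array([[3.8, 2.1, 0.4, -0.9]])   # student close to the teacher
off     = np.array([[0.0, 4.0, 0.0, 0.0]])    # student favoring a wrong token
print(soft_label_loss(teacher, aligned), soft_label_loss(teacher, off))
```

Hard-label distillation replaces `p` with a one-hot over the teacher's sampled output, which is cheaper to store but discards the probability structure.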
BLT Cuts Inference Bandwidth 50-92% via Diffusion & Speculation
Meta/Stanford researchers accelerate Byte Latent Transformer (BLT) inference with BLT-D (diffusion decoding), BLT-S (self-speculation), and BLT-DV (diffusion+verification), reducing memory bandwidth 50-92% at 3B params while nearing baseline performance on translation/coding tasks.
MRC: Resilient Networking for 100K+ GPU AI Training
OpenAI's MRC protocol uses multi-plane topologies and packet spraying across hundreds of paths with SRv6 source routing to eliminate congestion, route around failures in microseconds, and connect 131k GPUs with just two switch tiers, enabling non-stop frontier model training.
TwELL Delivers 20% LLM Speedups via GPU-Optimized Sparsity
Use ReLU gate activation + L1=2e-5 on hidden activations to induce 99.5% sparsity in feedforward layers, then TwELL CUDA kernels yield 20.5% inference and 21.9% training speedups on H100s with no accuracy loss.
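Why that sparsity level pays off: once ~99% of hidden activations are exactly zero, the second feedforward matmul only needs the rows of W2 matching active units. A numpy sketch of the effect; the L1 penalty itself isn't simulated here, a negative-shifted pre-activation distribution stands in for what training with L1 pressure produces:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical pre-activations of one feedforward layer (batch x d_ff).
# Shifting the mean negative mimics the near-total sparsity that L1
# pressure induces under a ReLU gate.
pre = rng.normal(loc=-2.5, scale=1.0, size=(64, 4096))
post = np.maximum(pre, 0.0)              # ReLU zeroes ~99% of units

sparsity = float((post == 0).mean())
print(f"activation sparsity: {sparsity:.4f}")

# The second matmul only needs rows of W2 whose activations are nonzero,
# so work scales with the number of active units, not d_ff.
W2 = rng.normal(size=(4096, 1024))
active = np.nonzero(post[0])[0]
y_sparse = post[0, active] @ W2[active]
y_dense = post[0] @ W2
print(np.allclose(y_sparse, y_dense), active.size, "active of 4096")
```

TwELL's CUDA kernels exploit exactly this equivalence with GPU-friendly memory layouts; the numpy slice only shows why the arithmetic savings exist.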
2026 Vector DBs: Match Scale, Cost, Stack for RAG Success
Leverage existing Postgres/Mongo with pgvector (millions vectors, free) or Atlas ($30/mo max Flex) to avoid sprawl; self-host Qdrant ($30-50/mo for 50M vectors) for perf; Pinecone ($20/mo) or Milvus (100B+) for managed scale.
SFT + RL Recovers Sandbagged AI Capabilities Using Weak Supervisors
Combine Supervised Fine-Tuning (SFT) then Reinforcement Learning (RL) with weak supervisors like GPT-4o-mini or Llama 3.1-8B to recover 88-99% of sandbagged model performance across math, science, and coding tasks—but training and deployment must be indistinguishable.
Reproduce 2011 Sentiment Word Vectors in Python
Build sentiment-aware word embeddings from IMDb reviews via semantic learning with star ratings and linear SVM classification, reproducing Maas et al. (2011); the simple method rivals modern LLMs.
Star Elastic: Pack 30B/23B/12B Models in One Checkpoint
NVIDIA's Star Elastic embeds nested 30B (3.6B active), 23B (2.8B), and 12B (2.0B) reasoning models in a single checkpoint via importance-ranked weight-sharing, slashing training costs 360x and enabling phase-specific sizing for 16% accuracy gains at 1.9x lower latency.
NVIDIA Halves DSA Top-K Time via Decode Stability
NVIDIA exploits autoregressive decoding's temporal stability—similar queries and gradually evolving scores—to cut DeepSeek Sparse Attention's Top-K bottleneck by half using Guess-Verify-Refine.
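The Guess-Verify-Refine idea can be illustrated in a few lines: reuse last step's Top-K as a guess, verify it still dominates every other score, and fall back to a full Top-K only on failure. This is a hedged sketch of the concept on synthetic scores, not NVIDIA's actual kernel or verification criterion:

```python
import numpy as np

def topk_indices(scores, k):
    return np.argpartition(scores, -k)[-k:]

def guess_verify_refine(scores, prev_topk, k):
    """Reuse the previous decode step's Top-K as a guess; verify that it
    still dominates all other scores; refine with a full Top-K otherwise."""
    mask = np.zeros(scores.size, dtype=bool)
    mask[prev_topk] = True
    if scores[mask].min() >= scores[~mask].max():   # verify: guess holds
        return prev_topk                            # cheap path, no full pass
    return topk_indices(scores, k)                  # refine

n, k = 1024, 8
base = np.linspace(0.0, 1.0, n)                 # Top-K is indices 1016..1023
prev = topk_indices(base, k)

stable = base + 1e-4 * np.cos(np.arange(n))     # scores evolve only gradually
jump = stable.copy()
jump[0] += 2.0                                  # a genuinely new top score

sel_stable = guess_verify_refine(stable, prev, k)
sel_jump = guess_verify_refine(jump, prev, k)
print(sorted(sel_stable.tolist()), 0 in sel_jump.tolist())
```

Temporal stability is what makes the cheap path the common case: when scores drift only slightly between decode steps, the guess keeps verifying.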
Scanpy Pipeline for PBMC scRNA-seq Clustering & Trajectories
Process PBMC-3k data with Scanpy: filter cells (min 200 genes, <2500 genes, <5% mt), remove Scrublet doublets, select HVGs (min_mean=0.0125, max_mean=3, min_disp=0.5), Leiden cluster at res=0.5, annotate via markers, infer PAGA/DPT trajectories, score IFN response.
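The QC-filtering step above can be sketched without scanpy; this toy counts matrix stands in for an AnnData object, and the gene/mitochondrial thresholds are the ones from the pipeline (scanpy's `sc.pp.filter_cells` and QC metrics apply the same logic to real data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes = 1000, 2000
counts = rng.poisson(0.3, size=(n_cells, n_genes))   # toy cells-x-genes matrix
mt_mask = np.zeros(n_genes, dtype=bool)
mt_mask[:40] = True                      # pretend the first 40 genes are MT-*

genes_per_cell = (counts > 0).sum(axis=1)
mt_frac = counts[:, mt_mask].sum(axis=1) / np.maximum(counts.sum(axis=1), 1)

# QC gates from the pipeline: 200 <= detected genes < 2500, mito fraction < 5%.
keep = (genes_per_cell >= 200) & (genes_per_cell < 2500) & (mt_frac < 0.05)
print(f"kept {int(keep.sum())} / {n_cells} cells")
```

On real PBMC data the upper gene bound and mito cutoff remove doublet-like and dying cells; the downstream HVG selection, Leiden clustering, and trajectory steps then run on the filtered matrix.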
Sovereign AI Grounds Robotics in Physics for 1.1M States/Sec
Sovereign AI uses JEPA with physics anchors on JAX/TPU v6 to process 1.1M states/sec at 0.894ms latency, detecting failures 4.7x better via energy patterns, with Gemini 3.1 Pro generating auditable reports and recovery plans.
NMI Bias Favors Complex Clusters Over Insight
Normalized Mutual Information (NMI) rewards over-segmentation and complexity in clustering, inflating scores for intuitively poor algorithms and distorting AI evaluations.
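The bias is easy to reproduce from the definition. Below, a pure 20-way over-segmentation of a 2-cluster dataset outscores a reasonable 2-cluster solution with 20% label noise; sqrt-normalized NMI is used here (one common normalization; others shift the numbers), and the datasets are synthetic:

```python
from collections import Counter
from math import log, sqrt

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def mutual_info(a, b):
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(nij / n * log(n * nij / (ca[i] * cb[j]))
               for (i, j), nij in cab.items())

def nmi(a, b):
    return mutual_info(a, b) / sqrt(entropy(a) * entropy(b))

# Ground truth: two balanced clusters of 100 points each.
truth = [0] * 100 + [1] * 100
# A: the right 2-cluster structure, but 20% of points mislabeled.
noisy = [1 - t if i % 5 == 0 else t for i, t in enumerate(truth)]
# B: over-segmented into 20 pure subclusters (10 per true cluster).
oversplit = [t * 10 + i % 10 for i, t in enumerate(truth)]

print(f"noisy 2-cluster NMI: {nmi(truth, noisy):.3f}")
print(f"pure 20-cluster NMI: {nmi(truth, oversplit):.3f}")
```

Because refinement never loses mutual information with the truth, splitting clusters keeps the numerator high while the normalization penalty grows only logarithmically in cluster count.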
Balance Linear Simplicity and Nonlinear Flexibility to Avoid Fit Failures
Linear models underfit nonlinear data with rigid straight boundaries; nonlinear models overfit by memorizing noise with wiggly curves. Balance the bias-variance tradeoff (e.g., by tuning model complexity or regularization) for optimal generalization.
Time Series Fundamentals Before Modeling
Time series data depends on order—avoid shuffling or random splits. Decompose into trend, seasonality, cycles, noise; ensure stationarity (constant mean/variance/autocovariance) via differencing, logs, detrending; diagnose with ACF/PACF for AR/MA patterns.
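Differencing, the first tool mentioned, can be seen in a few lines of stdlib Python; the slope and noise level are made up for illustration:

```python
import random

random.seed(42)
n = 500
# Trending series: x_t = 2 + 0.5*t + noise -- its mean drifts over time.
series = [2.0 + 0.5 * t + random.gauss(0, 1) for t in range(n)]

# First differences: x_t - x_{t-1} removes the linear trend entirely.
diff = [series[t] - series[t - 1] for t in range(1, n)]

first_half = sum(series[: n // 2]) / (n // 2)
second_half = sum(series[n // 2 :]) / (n - n // 2)
print(f"raw half-means: {first_half:.1f} vs {second_half:.1f}")  # non-stationary

diff_mean = sum(diff) / len(diff)
print(f"differenced mean: {diff_mean:.2f}")  # stationary around the slope 0.5
```

The raw series fails the constant-mean requirement (its two halves have very different means), while the differenced series hovers around the trend's slope, which is why differencing is the standard first step before fitting AR/MA models.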
Teach AI Values' Why Before What for Stronger Alignment
Model Spec Midtraining (MSM)—exposing models to value explanations before behavior fine-tuning—slashes agentic misalignment from 54-68% to 5-7% using 10-60x less data than alternatives.
MRC: OpenAI's Protocol for Resilient AI Training Networks
OpenAI's MRC extends RoCE with multipath spraying, microsecond failure recovery via SRv6, and multi-plane designs to deliver predictable performance in 131k-GPU clusters, using 2/3 fewer optics and 3/5 fewer switches than traditional setups.
Neuro-Symbolic AI Pairs Neural Patterns with Logic for Explainability
Neural networks excel at patterns but lack reasoning; neuro-symbolic AI combines them with symbolic logic for auditable decisions, driven by 2026 regulations, Tufts' 95% robotics success (vs 34%), and production at JPMorgan/EY.
Triple YOLO Recall with Adaptive Post-Processing
In crowded scenes, set YOLO confidence to 0.05, then filter dynamically by frame score distribution, box size (lower threshold for <5% height boxes), and pose keypoints (nose + shoulders) to detect 3x more people without retraining.
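The post-processing recipe above can be sketched as a plain filter over detections; the median-based dynamic threshold, the exact multipliers, and the detection dicts are illustrative assumptions (COCO keypoint order assumed: 0 = nose, 5/6 = shoulders), not the article's exact parameters:

```python
def adaptive_filter(dets, frame_h=1.0):
    """Filter low-confidence YOLO detections using the frame's score
    distribution, box size, and pose-keypoint evidence (hedged sketch)."""
    if not dets:
        return []
    confs = sorted(d["conf"] for d in dets)
    median = confs[len(confs) // 2]
    base_thr = min(0.25, 0.5 * median)        # assumed dynamic heuristic
    kept = []
    for d in dets:
        thr = base_thr
        if d["h"] / frame_h < 0.05:           # tiny, distant boxes: be lenient
            thr *= 0.5
        kp = d.get("keypoints", [])
        if len(kp) > 6 and kp[0] > 0.3 and kp[5] > 0.3 and kp[6] > 0.3:
            thr *= 0.5                        # visible nose + shoulders rescue
        if d["conf"] >= thr:
            kept.append(d)
    return kept

dets = [
    {"conf": 0.60, "h": 0.30, "keypoints": []},
    {"conf": 0.08, "h": 0.03, "keypoints": []},                      # tiny box
    {"conf": 0.07, "h": 0.20, "keypoints": [0.9, 0, 0, 0, 0, 0.8, 0.7]},
    {"conf": 0.02, "h": 0.20, "keypoints": []},                      # dropped
]
print(len(adaptive_filter(dets)))
```

A fixed 0.25 threshold would keep only the first detection; the adaptive pass recovers the tiny box and the keypoint-supported one while still rejecting the weakest candidate.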
Build CLIP: 400M Images, Zero Labels via Contrastive Learning
CLIP trains vision models on 400 million scraped image-text pairs using a single contrastive objective—no manual labels needed—matching ResNet-101 zero-shot on ImageNet and powering DALL-E 2, Stable Diffusion, LLaVA.
MRC Enables 100k+ GPU Clusters with Resilient Multipath Networking
OpenAI's MRC protocol spreads packets across hundreds of paths for microsecond failure recovery, connecting 100,000+ GPUs via just 2 switch tiers—cutting power, cost, and downtime in AI training supercomputers.
Gemma 4 MTP Drafters: 3x Faster Inference, No Quality Loss
Pair Gemma 4 with lightweight MTP drafters using speculative decoding to generate up to 3x more tokens per pass by drafting sequences and verifying in parallel, sharing KV cache for efficiency without altering outputs.
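The draft-then-verify loop behind speculative decoding can be shown with a toy greedy word-chain; the vocab, chains, and k are made up, and a real system verifies the whole draft in one parallel target pass rather than this sequential check, but the guarantee is the same: output identical to target-only decoding:

```python
# Target and drafter as deterministic next-token tables; they agree until "on".
TARGET = {"the": "cat", "cat": "sat", "sat": "on", "on": "my", "my": "mat"}
DRAFT  = {"the": "cat", "cat": "sat", "sat": "on", "on": "a"}

def speculative_decode(prompt, steps=5, k=3):
    out = [prompt]
    while len(out) - 1 < steps:
        draft, cur = [], out[-1]
        for _ in range(k):                   # draft k tokens with the cheap model
            cur = DRAFT.get(cur, "<eos>")
            draft.append(cur)
        cur = out[-1]
        for tok in draft:                    # verify the draft against the target
            want = TARGET.get(cur, "<eos>")
            if tok == want:
                out.append(tok)              # accepted: one "free" token
                cur = tok
            else:
                out.append(want)             # rejected: keep target's token,
                break                        # discard the rest of the draft
    return out[1 : steps + 1]

print(speculative_decode("the"))
```

When the drafter agrees, several tokens land per target pass; when it diverges, the mismatch is replaced by the target's token, so quality never degrades.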
Generative AI: Prediction to Creation via Scale
Generative AI shifts machines from analyzing data (traditional AI's strength) to creating new content like text or images, powered by Markov chains, deep learning, and massive datasets/compute yielding $33.9B investment in 2024.
GPU Bandwidth Limits LLM Speed, Not FLOPS
Generating one token from a 70B model on H100 needs 140GB weight reads—one op per byte—making memory bandwidth the inference bottleneck, not compute throughput.
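The arithmetic is easy to check; assuming FP16 weights (2 bytes/param) and H100 SXM's roughly 3.35 TB/s HBM3 bandwidth:

```python
# Back-of-envelope: memory-bound token rate for a dense 70B model.
params = 70e9
bytes_per_param = 2                          # FP16/BF16 weights
bytes_per_token = params * bytes_per_param   # every weight read once per token
hbm_bandwidth = 3.35e12                      # H100 SXM HBM3, bytes/sec (assumed)

tokens_per_sec = hbm_bandwidth / bytes_per_token
print(f"weights read per token: {bytes_per_token / 1e9:.0f} GB")
print(f"bandwidth-bound ceiling: {tokens_per_sec:.1f} tokens/sec")
```

At ~24 tokens/sec before any compute is counted, the ceiling comes from streaming 140 GB of weights per token, which is why batching, quantization, and speculative decoding (all of which amortize or shrink weight reads) dominate inference optimization.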