№ 02 / SUMMARIES

#deep-learning

Every summary, chronological. Filter by category, tag, or source from the rail.

Tag · #deep-learning
DAY 01 · Yesterday · MAY 12, 2026 · 1 SUMMARY
MarkTechPost · AI & LLMs

Aurora Fixes Muon's Neuron Death in Tall MLPs

The Aurora optimizer eliminates the >25% neuron death Muon suffers in tall weight matrices by jointly enforcing left semi-orthogonality and uniform row norms of √(n/m), delivering SOTA on the nanoGPT speedrun at 6% compute overhead.
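
A minimal sketch of the constraint as summarized, reading "left semi-orthogonality" as orthonormal columns (WᵀW = I) and using an SVD projection; the actual Aurora update rule may enforce the two conditions quite differently:

```python
import numpy as np

def aurora_like_project(W):
    """Project a tall (m x n, m > n) weight matrix onto column-orthonormal
    matrices, then rescale every row to the uniform norm sqrt(n/m).
    Illustrative only; Aurora enforces both properties jointly."""
    m, n = W.shape
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    Q = U @ Vt                                  # nearest matrix with Q^T Q = I
    rows = np.linalg.norm(Q, axis=1, keepdims=True)
    return Q * (np.sqrt(n / m) / np.maximum(rows, 1e-12))
```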

DAY 02 · Saturday · MAY 9, 2026 · 1 SUMMARY
Towards AI

NVIDIA Halves DSA Top-K Time via Decode Stability

NVIDIA exploits autoregressive decoding's temporal stability—consecutive queries are similar and scores evolve gradually—to cut DeepSeek Sparse Attention's Top-K bottleneck in half with a Guess-Verify-Refine scheme.
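
A NumPy sketch of the three phases as the name suggests; the sampling-based verification test and all constants are assumptions, and the real method runs as a fused GPU kernel:

```python
import numpy as np

def guess_verify_refine_topk(scores, prev_topk, k, sample_size=256):
    """Reuse the previous decode step's Top-K indices as a guess; fall back
    to an exact Top-K only when the guess looks stale."""
    guessed_min = scores[prev_topk].min()
    # Verify: cheap probabilistic check against a random sample of scores.
    sample = np.random.choice(len(scores), min(sample_size, len(scores)),
                              replace=False)
    if scores[sample].max() <= guessed_min:
        return prev_topk                         # guess likely still valid
    return np.argpartition(scores, -k)[-k:]      # refine: exact Top-K
```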

DAY 03 · Thursday · MAY 7, 2026 · 2 SUMMARIES
Towards AI · Data Science & Visualization

Triple YOLO Recall with Adaptive Post-Processing

In crowded scenes, set YOLO's confidence threshold to 0.05, then filter detections dynamically by the frame's score distribution, by box size (a lower threshold for boxes under 5% of frame height), and by pose keypoints (nose + shoulders) to detect 3x more people without retraining.
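
A hedged sketch of the recipe; the box format (xyxy pixels), COCO keypoint indices, and every threshold beyond the stated 0.05 and 5% are assumptions, not the article's exact values:

```python
import numpy as np

def adaptive_filter(boxes, scores, frame_h, keypoints=None):
    # 1) Dynamic threshold from this frame's score distribution.
    thresh = max(0.05, 0.5 * np.median(scores))
    keep = scores >= thresh
    # 2) Relax the threshold for boxes under 5% of frame height.
    heights = boxes[:, 3] - boxes[:, 1]
    keep |= (heights < 0.05 * frame_h) & (scores >= 0.05)
    # 3) Pose check: require nose + both shoulders (COCO indices 0, 5, 6).
    if keypoints is not None:                    # shape (n, 17, 3)
        keep &= (keypoints[:, [0, 5, 6], 2] > 0.3).all(axis=1)
    return keep
```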

Towards AI

Build CLIP: 400M Images, Zero Labels via Contrastive Learning

CLIP trains vision models on 400 million scraped image-text pairs using a single contrastive objective—no manual labels needed—matching a ResNet-101 zero-shot on ImageNet and powering DALL-E 2, Stable Diffusion, and LLaVA.
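
That single objective is compact enough to show in full; a standard PyTorch sketch of CLIP's symmetric contrastive loss (the temperature here is a common default, not the paper's learned value):

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Matched image-text pairs sit on the diagonal of the similarity
    matrix; cross-entropy pulls them together from both directions."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature     # (batch, batch)
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```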

DAY 04 · MAY 6, 2026 · 2 SUMMARIES
Generative AI

Generative AI: Prediction to Creation via Scale

Generative AI shifts machines from analyzing data (traditional AI's strength) to creating new content such as text and images, evolving from Markov chains to deep learning, with massive datasets and compute attracting $33.9B in investment in 2024.

Towards AI · AI & LLMs

GPU Bandwidth Limits LLM Speed, Not FLOPS

Generating one token from a 70B model on an H100 requires reading 140GB of weights—roughly one operation per byte—making memory bandwidth, not compute throughput, the inference bottleneck.
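
The arithmetic behind the claim, assuming the H100 SXM's ≈3.35 TB/s HBM bandwidth:

```python
params = 70e9
bytes_per_weight = 2                       # fp16/bf16
weight_read = params * bytes_per_weight    # 140 GB touched per token
bandwidth = 3.35e12                        # bytes/s, H100 SXM HBM3 (assumed)
print(f"upper bound ≈ {bandwidth / weight_read:.0f} tokens/s")  # ≈ 24
```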

DAY 05 · APR 28, 2026 · 1 SUMMARY
Caleb Writes Code

Diffusion: Data-Efficient Framework Outshining Autoregressives on Scarce Data

Diffusion is a training framework—not an architecture—that creates extra training signal by gradually noising clean data over ~1,000 steps, outperforming autoregressive models on 25-100M tokens where data is limited but compute is abundant; it still lags in text due to slow inference and immature infrastructure.
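
A sketch of the standard DDPM forward-noising process the summary refers to; the linear beta schedule is the usual default, not necessarily the article's:

```python
import torch

betas = torch.linspace(1e-4, 0.02, 1000)       # 1,000-step noise schedule
alpha_bar = torch.cumprod(1 - betas, dim=0)

def forward_noise(x0, t):
    """q(x_t | x_0): every clean sample yields many noisy training views."""
    ab = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    return ab.sqrt() * x0 + (1 - ab).sqrt() * eps, eps
```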

DAY 06 · APR 21, 2026 · 2 SUMMARIES
Towards AI

PCL: Confidence RL for Dynamic LLM Environments

The PCL algorithm integrates predictive confidence scores into LLM RL rewards via ensembles and blended token- and sequence-level signals, enabling adaptation to nonstationary environments without retraining.
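
A speculative sketch of what a blended confidence reward could look like; every name and the blending rule below are assumptions, not PCL's actual formulation:

```python
import numpy as np

def blended_reward(task_reward, token_logprobs, ensemble_scores,
                   lam=0.1, beta=0.5):
    token_conf = np.exp(np.mean(token_logprobs))   # token-level confidence
    seq_conf = 1.0 - np.std(ensemble_scores)       # ensemble agreement
    confidence = beta * token_conf + (1 - beta) * seq_conf
    return task_reward + lam * confidence          # confidence-shaped reward
```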

Generative AI · AI & LLMs

Sentences Define Word Meanings via Self-Attention

Transformers ended 30 years of sequential-processing bottlenecks with self-attention, in which every word weighs its relevance against the entire sentence context—the mechanism powering GPT and all modern LLMs.
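
The mechanism in a few lines: standard single-head scaled dot-product self-attention:

```python
import torch
import torch.nn.functional as F

def self_attention(x, Wq, Wk, Wv):
    """Every token scores its relevance to every other token in the
    sentence, then mixes their value vectors by those weights."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.transpose(-2, -1) / K.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ V
```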

DAY 07 · APR 20, 2026 · 2 SUMMARIES
Caleb Writes Code · AI & LLMs

LLM Inference: mmap Loading & Quantization Deep Dive

Efficient LLM inference hinges on mmap for lazy memory loading (e.g., <10s startup in llama.cpp) and on quantization schemes like GGUF K-Quants or AWQ/EXL2 that shrink 15GB models while preserving quality via salient-weight protection and mixed precision.
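
A sketch of the mmap idea in NumPy terms; the file name and fp16 layout are hypothetical stand-ins, not llama.cpp's actual format:

```python
import numpy as np

weights = np.memmap("model.bin", dtype=np.float16, mode="r")
# Nothing has been read yet: the OS maps the file into virtual memory and
# pages tensors in only when touched, so "loading" 15 GB is near-instant.
first_layer = weights[:4096 * 4096].reshape(4096, 4096)
row = np.asarray(first_layer[0])    # first real disk read happens here
```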

Level Up Coding · Data Science & Visualization

Preprocessing Swings CNN Accuracy from 65% to 87% on CIFAR-10

Raw CIFAR-10 pixels yield 65% test accuracy; normalization/standardization lifts it to 69%; geometric augmentation holds ~67%; photometric brightness/contrast augmentation crashes it to 20%; a combined pipeline with a deeper CNN hits 87%.
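
A torchvision sketch of the pipelines being compared; the normalization statistics are the standard CIFAR-10 values, and the article's exact transforms may differ:

```python
import torchvision.transforms as T

MEAN, STD = (0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)

raw = T.ToTensor()                                   # ~65% test accuracy
normalized = T.Compose([T.ToTensor(),
                        T.Normalize(MEAN, STD)])     # ~69%
combined = T.Compose([                               # ~87% with a deeper CNN
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(MEAN, STD),
])
```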

DAY 08 · APR 17, 2026 · 1 SUMMARY
AI Simplified in Plain English · AI & LLMs

53x AI Efficiency via Model Distillation by 2025

Train small 'student' models on a large 'teacher' model's soft probability outputs—not just hard labels—to match its performance while slashing size, latency, and cost, driving a 53x efficiency gain by 2025.
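
A sketch of the classic soft-target distillation loss (Hinton et al.) the summary describes; the temperature and weighting are typical values, not the article's settings:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T   # T^2 rescales gradients
    hard = F.cross_entropy(student_logits, labels)   # small hard-label term
    return alpha * soft + (1 - alpha) * hard
```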

DAY 09 · APR 16, 2026 · 1 SUMMARY
MarkTechPost · AI & LLMs

Parcae Stabilizes Loops to Match 2x Transformer Quality

Parcae enforces looped-transformer stability via negative diagonal matrices in a dynamical-systems formulation, outperforming baselines and achieving 87.5% of the quality of a Transformer with twice the parameters.
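
A speculative illustration of the negative-diagonal idea in a looped block; this is one reading of the summary, not Parcae's actual construction:

```python
import torch
import torch.nn as nn

class DampedLoopBlock(nn.Module):
    """Treat the looped block as a discrete dynamical system and add a
    learned negative-diagonal term so each iteration contracts the hidden
    state instead of letting repeated application diverge."""
    def __init__(self, d_model, block):
        super().__init__()
        self.block = block                       # shared transformer block
        self.d_logit = nn.Parameter(torch.zeros(d_model))
    def forward(self, h, n_loops=4):
        D = torch.sigmoid(self.d_logit)          # diagonal entries in (0, 1)
        for _ in range(n_loops):
            h = h - D * h + self.block(h)        # negative-diagonal damping
        return h
```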

DAY 10 · APR 13, 2026 · 1 SUMMARY
MarkTechPost · Data Science & Visualization

Build FNO & PINN Surrogates for Darcy Flow with PhysicsNeMo

A step-by-step Colab guide: generate 2D Darcy-flow datasets via Gaussian random fields (GRF) and finite differences, implement and train FNO operators and PINNs, add CNN baselines, and benchmark inference speeds for fast physics surrogates.
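
A minimal sketch of the FNO building block such guides implement: FFT, keep the lowest modes, mix them with learned complex weights, inverse FFT. (Reduced for brevity; full FNO layers also mix the negative-frequency corner and add a pointwise path.)

```python
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, modes):
        super().__init__()
        self.modes = modes
        self.w = nn.Parameter(torch.randn(in_ch, out_ch, modes, modes,
                                          dtype=torch.cfloat) / (in_ch * out_ch))
    def forward(self, x):                        # x: (batch, in_ch, H, W)
        x_ft = torch.fft.rfft2(x)
        out = torch.zeros(x.shape[0], self.w.shape[1], *x_ft.shape[-2:],
                          dtype=torch.cfloat, device=x.device)
        m = self.modes
        out[:, :, :m, :m] = torch.einsum("bixy,ioxy->boxy",
                                         x_ft[:, :, :m, :m], self.w)
        return torch.fft.irfft2(out, s=x.shape[-2:])
```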

DAY 11 · APR 8, 2026 · 8 SUMMARIES
Towards AI

Word2Vec: Turning Word Neighborhoods into Embeddings

Word2Vec learns dense word vectors by predicting local contexts with CBOW or Skip-gram, clustering similar words like 'cat' and 'dog' via repeated gradient updates from shared neighborhoods.
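
How the training signal arises, in a short sketch of Skip-gram pair generation:

```python
def skipgram_pairs(tokens, window=2):
    """Each word predicts the words in its local neighborhood; words that
    share neighborhoods get repeatedly nudged toward similar vectors."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield center, tokens[j]

pairs = list(skipgram_pairs("the cat sat on the mat".split()))
```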

Andrej Karpathy Gists · Software Engineering

Batch GEMMs for Fast LSTM in Torch

Fuse the LSTM's four gate multiplications into a single batched GEMM inside an nngraph module, slashing overhead versus the standard nn.LSTM (with optimizations by @jcjohnson).
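
The same trick in modern PyTorch terms (the gist itself is Lua Torch): one fused GEMM produces all four gate pre-activations at once:

```python
import torch

def fused_lstm_step(x, h, c, W, U, b):
    """W: (input_size, 4H), U: (H, 4H); a single (batch, 4H) matmul per
    operand replaces four separate small GEMMs."""
    gates = x @ W + h @ U + b
    i, f, g, o = gates.chunk(4, dim=-1)
    c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
    return torch.sigmoid(o) * torch.tanh(c), c
```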

Andrej Karpathy Gists · Software Engineering

Batched L2 Norm Layer for Torch Neural Nets

A custom Torch nn.Module that normalizes each row of an n x d input tensor to unit L2 norm, with efficient batched forward/backward passes for training.
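
A NumPy sketch of the forward/backward math such a module implements:

```python
import numpy as np

def l2norm_forward(x, eps=1e-8):
    """Normalize each row of an (n, d) batch to unit L2 norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True) + eps
    return x / norms, norms

def l2norm_backward(grad_out, y, norms):
    """Project out the gradient component along y, then rescale: the
    Jacobian of x/||x|| kills radial directions."""
    dot = np.sum(grad_out * y, axis=1, keepdims=True)
    return (grad_out - y * dot) / norms
```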

Andrej Karpathy Gists · Data Science & Visualization

Minimal NumPy RNN for Char-Level Text Gen

Build a vanilla RNN language model from scratch in ~170 lines of NumPy: it processes text in 25-character chunks, trains with BPTT and Adagrad, and prints generated samples every 100 iterations.
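
The core recurrence, as in min-char-rnn:

```python
import numpy as np

def rnn_step(x, h, Wxh, Whh, Why, bh, by):
    """One step: new hidden state from input + previous state, then a
    softmax distribution over the next character."""
    h = np.tanh(Wxh @ x + Whh @ h + bh)
    logits = Why @ h + by
    probs = np.exp(logits - logits.max())
    return h, probs / probs.sum()
```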

Andrej Karpathy Gists · Data Science & Visualization

NumPy Batched LSTM Forward/Backward

An efficient pure-NumPy LSTM that processes batched sequences of shape (n, b, input_size); weights use Xavier init with forget-gate bias = 3; correctness is verified against a sequential implementation and numerical gradients.
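
A sketch of the described initialization, assuming the gist's [i, f, o, g] gate ordering with the bias stored as a stacked first row:

```python
import numpy as np

def init_lstm(input_size, hidden_size):
    n = input_size + hidden_size
    WLSTM = np.random.randn(n + 1, 4 * hidden_size) / np.sqrt(n)  # Xavier
    WLSTM[0, :] = 0                               # bias row starts at zero
    WLSTM[0, hidden_size:2 * hidden_size] = 3     # forget-gate bias = 3:
    return WLSTM                                  # remember by default
```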

Andrej Karpathy Gists · Software Engineering

Policy Gradients for Pong: 100-Line RL Agent

Train a 2-layer neural net to play Atari Pong from raw pixels with REINFORCE policy gradients: 80x80 binary difference frames, rewards discounted with gamma=0.99, standardized advantages, and RMSProp updates every 10 episodes. Converges on CPU in hours.
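
The reward-processing step from the post, sketched in NumPy (the reset at nonzero rewards is Pong-specific, since a point ends the rally):

```python
import numpy as np

def discount_rewards(r, gamma=0.99):
    out, running = np.zeros_like(r, dtype=float), 0.0
    for t in reversed(range(len(r))):
        if r[t] != 0:
            running = 0.0                # game boundary: reset the sum
        running = running * gamma + r[t]
        out[t] = running
    return out

episode_rewards = np.array([0, 0, 0, 1, 0, 0, -1], dtype=float)  # example
adv = discount_rewards(episode_rewards)
adv = (adv - adv.mean()) / (adv.std() + 1e-8)    # standardize advantages
```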

Andrej Karpathy Blog · AI & LLMs

Karpathy's Pure Python AI From Scratch

Andrej Karpathy distills neural nets, LLMs, RL, and Bitcoin into 200-500 line pure Python scripts—no deps needed—to teach core mechanics hands-on.

Learning Data

Pause Before Trust: AI Fooled My Instincts

AI generates undetectable fakes that exploit human trust shortcuts—train yourself to pause and question realistic audio, video, or text instead of believing it instantly.

DAY 12 · APR 7, 2026 · 1 SUMMARY
Reinike AI

TurboQuant: 6x KV Cache Compression Without Attention Loss

TurboQuant rotates KV-cache vectors before quantizing to 3.5 bits/channel (quality-neutral) or 2.5 bits (minor degradation), plus error repair, yielding 6x memory savings and up to 8x speedups for long-context LLMs.
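
An illustrative rotate-then-quantize sketch; TurboQuant's actual transform, fractional bit widths, and error-repair step all differ:

```python
import numpy as np

def rotate_and_quantize(kv, bits=4, seed=0):
    """A random orthogonal rotation spreads outlier energy across channels,
    after which coarse uniform quantization loses much less information."""
    d = kv.shape[-1]
    Q, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((d, d)))
    rotated = kv @ Q
    scale = np.abs(rotated).max() / (2 ** (bits - 1) - 1)
    codes = np.round(rotated / scale).astype(np.int8)
    return codes, scale, Q        # dequantize: (codes * scale) @ Q.T
```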

