№ 02 / SUMMARIES

#quantization


DAY 01 · April 20, 2026 · 1 SUMMARY
Caleb Writes Code · AI & LLMs

LLM Inference: mmap Loading & Quantization Deep Dive

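Efficient LLM inference hinges on two techniques. mmap loads model weights lazily, paging them in from disk only when they are accessed, which keeps startup fast (e.g., <10s on llama.cpp). Quantization formats such as GGUF K-Quants, AWQ, and EXL2 shrink a 15GB model while preserving quality by protecting salient weights and mixing precision levels.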
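
As a rough illustration of both ideas, the sketch below memory-maps a weight file with NumPy and applies a simple symmetric block-wise 4-bit quantization. It is a toy under stated assumptions, not the GGUF K-Quant, AWQ, or EXL2 algorithm: the file layout, the function names (write_dummy_weights, load_lazily, quantize_blockwise, dequantize_blockwise), and the block size of 32 are all hypothetical choices made for the example.

```python
import os
import tempfile

import numpy as np


def write_dummy_weights(path, shape, seed=0):
    """Write random float32 values to disk as a stand-in for a model file."""
    rng = np.random.default_rng(seed)
    rng.standard_normal(shape, dtype=np.float32).tofile(path)


def load_lazily(path, shape):
    """Map the file into memory; the OS reads pages only when they are touched."""
    return np.memmap(path, dtype=np.float32, mode="r", shape=shape)


def quantize_blockwise(weights, block_size=32, bits=4):
    """Symmetric quantization with one float32 scale per block (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit values
    blocks = np.asarray(weights, dtype=np.float32).reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0  # avoid dividing by zero on all-zero blocks
    q = np.clip(np.round(blocks / scales), -qmax, qmax).astype(np.int8)
    # Values are kept in int8 here for simplicity; a real format would pack
    # two 4-bit values into each byte.
    return q, scales


def dequantize_blockwise(q, scales, shape):
    """Recover approximate float32 weights from quantized blocks and scales."""
    return (q.astype(np.float32) * scales).reshape(shape)


if __name__ == "__main__":
    shape = (256, 128)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_dummy_weights(path, shape)

        weights = load_lazily(path, shape)  # returns without reading the whole file
        first_rows = np.array(weights[:8])  # only these rows are pulled from disk
        del weights                         # release the mapping before cleanup

    q, scales = quantize_blockwise(first_rows)
    restored = dequantize_blockwise(q, scales, first_rows.shape)
    print("max abs error:", float(np.max(np.abs(first_rows - restored))))
```

With one float32 scale per 32 weights, a packed version of this scheme would cost about 5 bits per weight (4 for the value plus 32/32 for the scale) instead of 32. Production formats go further, for example by varying precision across tensors or giving extra protection to the weights that matter most.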

