
#post-training


DAY 01 · Yesterday · JUN 29 · 2026 · 1 SUMMARY
arXiv cs.AI · Agents & Orchestration

ATOD: Hybrid Distillation for Autonomous Agent Training

ATOD combines on-policy distillation with reinforcement learning using an annealed schedule and turn-level reweighting to train small agent models that outperform their larger teacher models.

