#post-training
Every summary, chronological. Filter by category, tag, or source from the rail.
Tag · #post-training
ATOD: Hybrid Distillation for Autonomous Agent Training
ATOD combines on-policy distillation with reinforcement learning using an annealed schedule and turn-level reweighting to train small agent models that outperform their larger teacher models.
arXiv cs.AI
Showing 1 of 1