#reasoning
Tandem Reinforcement Learning: Aligning AI Reasoning with Humans
Tandem Reinforcement Learning (TRL) forces stronger models to co-generate reasoning with weaker models, resulting in more legible, robust, and human-compatible chains of thought without sacrificing performance.
arXiv cs.AI
Strategy-Guided Policy Optimization for LLM Reasoning
Strategy-Guided Policy Optimization (SGPO) improves LLM reasoning by distilling reusable problem-solving strategies rather than just imitating specific solution trajectories, leading to better generalization.
arXiv cs.AI
VibeThinker-3B: High-Performance Reasoning at 3B Parameters
VibeThinker-3B is a compact, open-source reasoning model that performs comparably to much larger models on math and coding tasks by using a specialized 'Spectrum-to-Signal' post-training pipeline.
MarkTechPost