ATHENA-R1: An AI Agent for Iterative Biomedical Treatment Reasoning

Reframing Treatment Reasoning as an Iterative Process

Treatment reasoning is inherently iterative: it requires integrating disease context, comorbidities, and evolving biomedical knowledge. Traditional LLMs often struggle with this because they cannot verify information against external, grounded sources. ATHENA-R1 addresses this by reframing treatment reasoning as a learnable process of iterative evidence gathering. The agent operates across a "universe" of 212 biomedical tools, which lets it identify missing information, execute the relevant tools, and synthesize the resulting evidence before reaching a conclusion.
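The loop described above (identify what is missing, call a tool, collect the evidence, then conclude) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: `ReasoningState`, `run_agent`, the two toy tools, and the selection policy are all hypothetical names invented here, and the real agent selects among 212 tools with a learned policy rather than a fixed rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool signature: a tool takes the question and returns evidence text.
Tool = Callable[[str], str]


@dataclass
class ReasoningState:
    question: str
    calls: List[str] = field(default_factory=list)      # names of tools called so far
    evidence: List[str] = field(default_factory=list)   # evidence returned by those tools


def run_agent(
    question: str,
    tools: Dict[str, Tool],
    choose_tool: Callable[[ReasoningState], Optional[str]],
    synthesize: Callable[[ReasoningState], str],
    max_steps: int = 5,
) -> Tuple[str, ReasoningState]:
    """Gather evidence until the policy returns None or the step budget runs out."""
    state = ReasoningState(question)
    for _ in range(max_steps):
        name = choose_tool(state)  # None means "no information is missing"
        if name is None:
            break
        state.calls.append(name)
        state.evidence.append(tools[name](question))
    return synthesize(state), state


# Toy stand-ins (assumptions for illustration only, not real biomedical tools).
TOY_TOOLS: Dict[str, Tool] = {
    "drug_label_lookup": lambda q: f"[label evidence for: {q}]",
    "interaction_check": lambda q: f"[interaction evidence for: {q}]",
}


def pick_unused_tool(state: ReasoningState) -> Optional[str]:
    """Stand-in policy: call each tool once, in order, then stop."""
    for name in TOY_TOOLS:
        if name not in state.calls:
            return name
    return None


def join_evidence(state: ReasoningState) -> str:
    """Stand-in synthesis step: concatenate the gathered evidence."""
    return " ".join(state.evidence)


answer, final_state = run_agent(
    "example question", TOY_TOOLS, pick_unused_tool, join_evidence
)
print(final_state.calls)
print(answer)
```

In the system the article describes, the selection and synthesis steps would be carried out by the trained model itself; the fixed rule here only makes the control flow visible.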

A Two-Level Self-Learning Framework

The authors trained ATHENA-R1 without human-annotated traces by employing a two-level self-learning framework:

Multi-Agent Construction: A system of agents generates the necessary tools, tasks, and reasoning trajectories to create a dataset for supervised fine-tuning.
Reinforcement Learning (RL): The model is further refined using RL with scientific feedback. The reward function specifically targets reasoning quality, including evidence gathering, grounded tool use, and logical non-redundancy.
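To make the idea of a reward that scores reasoning quality more concrete, here is one way the three named components could be combined. The article does not give the actual reward formula, so everything below is an assumption: the function name `reasoning_reward`, the way each component is measured, and the equal weighting are illustrative choices only.

```python
from typing import List, Set


def reasoning_reward(
    tool_calls: List[str],
    cited_evidence: Set[str],
    retrieved_evidence: Set[str],
) -> float:
    """Hypothetical composite reward in [0, 1]; not the paper's formula.

    tool_calls:         identifiers of the tool calls made, in order
    cited_evidence:     evidence IDs the final answer relies on
    retrieved_evidence: evidence IDs actually returned by tools
    """
    # Evidence gathering: did the trajectory retrieve anything at all?
    gathering = 1.0 if retrieved_evidence else 0.0

    # Grounded tool use: share of cited evidence that was really retrieved.
    if cited_evidence:
        grounding = len(cited_evidence & retrieved_evidence) / len(cited_evidence)
    else:
        grounding = 0.0

    # Non-redundancy: share of tool calls that are not repeats.
    if tool_calls:
        non_redundancy = len(set(tool_calls)) / len(tool_calls)
    else:
        non_redundancy = 0.0

    # Equal weights are an arbitrary choice made for this sketch.
    return (gathering + grounding + non_redundancy) / 3.0


example = reasoning_reward(
    tool_calls=["lookup_a", "lookup_b", "lookup_a"],  # one repeated call
    cited_evidence={"e1", "e2"},                      # e2 was never retrieved
    retrieved_evidence={"e1", "e3"},
)
print(round(example, 3))
```

A trajectory that repeats calls or cites evidence it never retrieved scores lower, which is the kind of behavior such a reward is meant to discourage.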

Performance and Clinical Validation

ATHENA-R1 is reported to improve on existing models in benchmark accuracy, expert evaluation, and a real-world validation study:

Benchmark Performance: It achieved 94.7% accuracy on open-ended drug reasoning and 82.9% on patient treatment cases, outperforming GPT-5 by 17.8 and 10.7 percentage points, respectively.
Expert Preference: In blinded evaluations by experts from 28 rare disease organizations, ATHENA-R1 was preferred over reference models across all criteria.
Real-World Impact: The agent generated adverse-event hypotheses that were tested against electronic health records from 5.4 million patients. The tested associations had adjusted odds ratios of 1.48–1.84, suggesting that the model can produce hypotheses that hold up in clinical data.
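For readers unfamiliar with odds ratios, the sketch below shows how an unadjusted odds ratio and an approximate 95% confidence interval (Woolf's log method) are computed from a 2x2 table. It is an illustration only: the counts are invented and are not taken from the study, the function name `odds_ratio_with_ci` is hypothetical, and the figures reported above are adjusted odds ratios, which would typically come from a regression model that controls for covariates rather than from a raw table.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a: exposed, event occurred      b: exposed, no event
    c: unexposed, event occurred    d: unexposed, no event
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts, for illustration only (not data from the article).
or_value, ci_low, ci_high = odds_ratio_with_ci(a=30, b=70, c=20, d=80)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An odds ratio above 1 means the event has higher odds in the exposed group; a confidence interval that excludes 1 is the usual sign that the association is unlikely to be due to chance alone.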
