#mechanistic-interpretability
Steering LLM Personality via Latent Feature Interventions
Researchers have developed a mechanistic method for steering LLM personality traits: sparse autoencoders are used to identify latent features in the model's residual stream, and modifying those features enables precise behavioral control without retraining.
arXiv cs.AI
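The summary gives no implementation details, so the following is only a minimal sketch of one common form of latent feature intervention: adding a scaled sparse-autoencoder decoder direction to residual-stream activations. It is not the paper's method. The function name `steer_residual`, the random `decoder` matrix, the feature index, and the strength value are all hypothetical placeholders chosen for illustration.

```python
import numpy as np


def steer_residual(resid, decoder, feature_idx, strength):
    """Shift residual-stream vectors along one SAE decoder direction.

    resid:       (positions, d_model) residual-stream activations
    decoder:     (n_features, d_model) SAE decoder weights (assumed layout)
    feature_idx: index of the feature to amplify (hypothetical choice)
    strength:    how far to move along the unit-normalized direction
    """
    direction = decoder[feature_idx]
    direction = direction / np.linalg.norm(direction)  # unit vector
    return resid + strength * direction  # broadcasts over positions


# Toy data standing in for a real model and a trained SAE (assumption).
rng = np.random.default_rng(0)
d_model, n_features = 16, 64
decoder = rng.normal(size=(n_features, d_model))
resid = rng.normal(size=(4, d_model))

steered = steer_residual(resid, decoder, feature_idx=3, strength=5.0)

# Each position moves by exactly `strength` along the chosen direction.
unit = decoder[3] / np.linalg.norm(decoder[3])
shift = (steered - resid) @ unit
```

In a real setting the decoder would come from a sparse autoencoder trained on the model's activations, and the steered vectors would be written back into the forward pass at the chosen layer; no weights are updated, which is consistent with the summary's "without retraining".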