Similar Items: Crafting Reversible SFT Behaviors in Large Language Models
- Safety and accuracy follow different scaling laws in clinical large language models
- UniSD: Towards a Unified Self-Distillation Framework for Large Language Models
- Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models
- Bolek: A Multimodal Language Model for Molecular Reasoning
- Tool Calling is Linearly Readable and Steerable in Language Models
- Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions