Similar Items:
- Characterizing the Expressivity of Local Attention in Transformers
- Long Context Pre-Training with Lighthouse Attention
- Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics
- Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals
- GazeVLM: Active Vision via Internal Attention Control for Multimodal Reasoning
- Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
- Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers