Similar Items: Long Context Pre-Training with Lighthouse Attention
- The Impossibility Triangle of Long-Context Modeling
- Efficient Pre-Training with Token Superposition
- Fuzzy Fingerprinting Encoder Pre-trained Language Models for Emotion Recognition in Conversations: Human Assessment and Validity Study
- Characterizing the Expressivity of Local Attention in Transformers
- Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics
- Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals