Similar Items:
- Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers
- Masked Generative Transformer Is What You Need for Image Editing
- Elastic Attention Cores for Scalable Vision Transformers
- Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer
- Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs
- Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent
- The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity