Similar Items:
- Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer
- Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent
- Beyond Gaussian Bottlenecks: Topologically Aligned Encoding of Vision-Transformer Feature Spaces
- The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity
- Complex Equation Learner: Rational Symbolic Regression with Gradient Descent in Complex Domain
- Spiking Sequence Machines and Transformers
- Fast Byte Latent Transformer