Similar Items: Spiking Sequence Machines and Transformers
- Globally Optimal Training of Spiking Neural Networks via Parameter Reconstruction
- Fast Byte Latent Transformer
- Transformers with Selective Access to Early Representations
- Taming Outlier Tokens in Diffusion Transformers
- Transformed Latent Variable Multi-Output Gaussian Processes
- DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures