Similar Items: DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
- UniPool: A Globally Shared Expert Pool for Mixture-of-Experts
- Do Sparse Autoencoders Capture Concept Manifolds?
- Proximal Projection for Doubly Sparse Regularized Models
- Fine-Grained Graph Generation through Latent Mixture Scheduling
- On Computing Total Variation Distance Between Mixtures of Product Distributions
- Transcoda: End-to-End Zero-Shot Optical Music Recognition via Data-Centric Synthetic Training