Similar Items: Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts
- DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
- UniPool: A Globally Shared Expert Pool for Mixture-of-Experts
- SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control
- Learning the Helmholtz equation operator with DeepONet for non-parametric 2D geometries
- Do Sparse Autoencoders Capture Concept Manifolds?
- Proximal Projection for Doubly Sparse Regularized Models