Similar Items: Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width
- Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection
- Reparameterized Complex-valued Neurons Can Efficiently Learn More than Real-valued Neurons via Gradient Descent
- A Symplectic Analysis of Alternating Mirror Descent
- Convergence and complexity of block majorization-minimization for constrained block-Riemannian optimization
- Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width
- Unsupervised Feature Selection via Nonnegative Orthogonal Constrained Regularized Minimization