Similar Items: Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection
- A Symplectic Analysis of Alternating Mirror Descent
- Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width
- Fast Rates in $α$-Potential Games via Regularized Mirror Descent
- Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization
- IdentityByDescentDispersal.jl: Inferring dispersal rates with identity-by-descent blocks
- An Asymptotically Optimal Coordinate Descent Algorithm for Learning Bayesian Networks from Gaussian Models