Similar Items:
- Revisiting Gradient Normalization and Clipping for Nonconvex SGD under Heavy-Tailed Noise: Necessity, Sufficiency, and Acceleration
- Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
- Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization
- Guaranteed Nonconvex Low-Rank Tensor Estimation via Scaled Gradient Descent