Similar Items: Revisiting Policy Gradients for Restricted Policy Classes: Escaping Myopic Local Optima with $k$-step Policy Gradients
- Randomized Subspace Nesterov Accelerated Gradient
- Decentralized Proximal Stochastic Gradient Langevin Dynamics
- On the Wasserstein Gradient Flow Interpretation of Drifting Models
- Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why
- Optimal Posterior Sampling for Policy Identification in Tabular Markov Decision Processes
- RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards