Similar Items: Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
- Steer Like the LLM: Activation Steering that Mimics Prompting
- Exploration Hacking: Can LLMs Learn to Resist RL Training?
- Enhancing RL Generalizability in Robotics through SHAP Analysis of Algorithms and Hyperparameters
- Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
- SLIM: Sparse Latent Steering for Interpretable and Property-Directed LLM-Based Molecular Editing