Similar Items: Exploration Hacking: Can LLMs Learn to Resist RL Training?
- Enhancing RL Generalizability in Robotics through SHAP Analysis of Algorithms and Hyperparameters
- On the Hardness of Junking LLMs
- RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
- Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents
- V4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy Prediction