Similar Items:
- Aligning Flow Map Policies with Optimal Q-Guidance
- Task-Adaptive Embedding Refinement via Test-time LLM Guidance
- Optimal Posterior Sampling for Policy Identification in Tabular Markov Decision Processes
- ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models
- Beyond Gaussian Bottlenecks: Topologically Aligned Encoding of Vision-Transformer Feature Spaces
- Revisiting Policy Gradients for Restricted Policy Classes: Escaping Myopic Local Optima with $k$-step Policy Gradients
- ELF: Embedded Language Flows