Similar Items: Verifier-Backed Hard Problem Generation for Mathematical Reasoning
- On the Hardness of Junking LLMs
- Trust, but Verify: Peeling Low-Bit Transformer Networks for Training Monitoring
- Bolek: A Multimodal Language Model for Molecular Reasoning
- U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning
- Visual Latents Know More Than They Say: Unsilencing Latent Reasoning in MLLMs
- TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering