Similar Items: Implicit Representations of Grammaticality in Language Models
- Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe
- Mitigating Misalignment Contagion by Steering with Implicit Traits
- Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients
- Geometry-Calibrated Conformal Abstention for Language Models
- The Frequency Confound in Language-Model Surprisal and Metaphor Novelty
- The Counterexample Game: Iterated Conceptual Analysis and Repair in Language Models