Channels - Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs :: FRELIP Discovery

Similar Items: Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs

Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs
Beyond "I cannot fulfill this request": Alleviating Rigid Rejection in LLMs via Label Enhancement
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
The First Token Knows: Single-Decode Confidence for Hallucination Detection
Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking
Misaligned by Reward: Socially Undesirable Preferences in LLMs
SERE: Structural Example Retrieval for Enhancing LLMs in Event Causality Identification
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training
Beyond Semantics: An Evidential Reasoning-Aware Multi-View Learning Framework for Trustworthy Mental Health Prediction
mdok-style at SemEval-2026 Task 10: Finetuning LLMs for Conspiracy Detection
mdok-style at SemEval-2026 Task 9: Finetuning LLMs for Multilingual Polarization Detection
When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models
Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs
Automated Clinical Report Generation for Remote Cognitive Remediation: Comparing Knowledge-Engineered Templates and LLMs in Low-Resource Settings
Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe
DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization
Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients
Beyond Semantics: Measuring Fine-Grained Emotion Preservation in Small Language Model-Based Machine Translation
Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics
SkillOS: Learning Skill Curation for Self-Evolving Agents
Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR
Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers
PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization