Channels - Beyond Red-Teaming: Formal Guarantees of LLM Guardrail Classifiers :: FRELIP Discovery

Similar Items: Beyond Red-Teaming: Formal Guarantees of LLM Guardrail Classifiers

Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders
Ecologically-Constrained Task Arithmetic for Multi-Taxa Bioacoustic Classifiers Without Shared Data
Computing Equilibrium beyond Unilateral Deviation
Generating Statistical Charts with Validation-Driven LLM Workflows
Steer Like the LLM: Activation Steering that Mimics Prompting
Building informative materials datasets beyond targeted objectives
Attractor-Vascular Coupling Theory: Formal Grounding and Empirical Validation for AAMI-Standard Cuffless Blood Pressure Estimation from Smartphone Photoplethysmography
Why Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised ML
U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning
Evaluating the Architectural Reasoning Capabilities of LLM Provers via the Obfuscated Natural Number Game
Continual Knowledge Updating in LLM Systems: Learning Through Multi-Timescale Memory Dynamics
Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction
How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation
SLIM: Sparse Latent Steering for Interpretable and Property-Directed LLM-Based Molecular Editing
When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels
Beyond Gaussian Bottlenecks: Topologically Aligned Encoding of Vision-Transformer Feature Spaces
Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback
Joint Treatment Effect Estimation from Incomplete Healthcare Data: Temporal Causal Normalizing Flows with LLM-driven Evolutionary MNAR Imputation
Early Detection of Water Stress by Plant Electrophysiology: Machine Learning for Irrigation Management
Exponential families from a single KL identity
TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering
A Unified Framework of Hyperbolic Graph Representation Learning Methods