Similar Items: Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing
- TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning
- ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming
- Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization
- Re-Triggering Safeguards within LLMs for Jailbreak Detection
- SoK: Robustness in Large Language Models against Jailbreak Attacks
- LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments