Similar Items: SoK: Robustness in Large Language Models against Jailbreak Attacks
- Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models
- TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning
- Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization
- Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing
- Re-Triggering Safeguards within LLMs for Jailbreak Detection
- ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming