Similar Items: Re-Triggering Safeguards within LLMs for Jailbreak Detection
- Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing
- ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming
- SoK: Robustness in Large Language Models against Jailbreak Attacks
- LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments
- Trident: Improving Malware Detection with LLMs and Behavioral Features
- GLiGuard: Schema-Conditioned Classification for LLM Safeguard