Similar Items: Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection
- Detecting Adversarial Data via Provable Adversarial Noise Amplification
- Autonomous Adversary: Red-Teaming in the age of LLM
- Backdoor Mitigation in Object Detection via Adversarial Fine-Tuning
- Low Rank Adaptation for Adversarial Perturbation
- LoopTrap: Termination Poisoning Attacks on LLM Agents
- When Alignment Isn't Enough: Response-Path Attacks on LLM Agents