The Importance of Adversarial Evaluations for AI Safety
Do not write that jailbreak paper
The Worst (But Only) Claude 3 Tokenizer
Universal Jailbreak Backdoors from Poisoned Human Feedback