Quality-diversity archive
Optimize one number and search collapses onto the single best exploit. MAP-Elites keeps the best solution in every cell of a behavior space, mapping a diverse archive of vulnerabilities. Run both on the same budget and watch the difference.
Sources: Cross-generational transfer (extended) · Code
Why quality-diversity finds attacks a single objective misses
Optimize for a single number (attack success) and search collapses onto the single best exploit. MAP-Elites instead keeps the best attack in every cell of a behavior space, illuminating a diverse, transferable archive of vulnerabilities. Run it against a fitness-only baseline or a novelty search and watch coverage and quality diverge. The archive's axes are Red Queen's actual behavior descriptors: the six attack strategies × prompt length.
How this is computed
A faithful toy model of the algorithm, not a live LLM attack. Genomes live in a 2-D behavior space: six real attack strategies (Roleplay, Encoding, Authority, Hypothetical, MultiTurn, DirectJailbreak) × prompt length. Fitness is a fixed, multi-modal "attack success" landscape in which different strategy/length regions succeed to different degrees.
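The behavior space and landscape described above can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the demo's code: prompt length is normalized to [0, 1] and cut into 10 bins, and every peak position, height, and width (including DirectJailbreak holding the global optimum) is invented for the example. Only the six strategy names come from the text.

```python
import math

# Descriptor 1: the six attack strategies named in the text, indexed 0..5.
STRATEGIES = ["Roleplay", "Encoding", "Authority", "Hypothetical", "MultiTurn", "DirectJailbreak"]
# Descriptor 2: prompt length, normalized to [0, 1] and cut into bins.
LENGTH_BINS = 10  # hypothetical resolution, not taken from the demo

# Hypothetical landscape: one Gaussian bump per strategy as (center, height, width).
# DirectJailbreak is assumed to hold the global optimum; the rest are weaker local peaks.
PEAKS = {
    0: (0.70, 0.75, 0.20),  # Roleplay
    1: (0.40, 0.45, 0.20),  # Encoding
    2: (0.30, 0.35, 0.25),  # Authority
    3: (0.50, 0.60, 0.25),  # Hypothetical
    4: (0.90, 0.55, 0.15),  # MultiTurn
    5: (0.20, 1.00, 0.15),  # DirectJailbreak
}


def fitness(strategy: int, length: float) -> float:
    """Toy 'attack success': this strategy's bump evaluated at the given length."""
    center, height, width = PEAKS[strategy]
    return height * math.exp(-((length - center) ** 2) / (2 * width ** 2))


def niche(strategy: int, length: float) -> tuple[int, int]:
    """Map a genome's behavior to its grid cell (strategy, length bin)."""
    length_bin = min(int(length * LENGTH_BINS), LENGTH_BINS - 1)  # length 1.0 -> last bin
    return (strategy, length_bin)


print(len(STRATEGIES) * LENGTH_BINS)  # 60
print(niche(5, 0.23))  # (5, 2)
```

Keeping the strategy axis categorical and binning only the length axis gives a 6 × 10 grid, so under these assumptions an archive can hold at most 60 elites.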
- MAP-Elites keeps the best genome per niche: it mutates a randomly chosen elite and stores the offspring if that niche is empty or the offspring beats its current elite. It maximizes both coverage (niches filled) and QD-score (summed elite fitness).
- Single-objective search (tournament selection) cares only about global fitness, so the population converges onto the strongest exploit: high fitness, almost no coverage.
- Novelty search selects for behavioral uniqueness and ignores fitness, so it spreads across the space (high coverage) but does not optimize each niche (lower QD-score).
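The three selection schemes can be compared in a few dozen lines. This is a minimal sketch under stated assumptions, not the demo's code: the landscape peaks, the 10-bin length grid, the mutation operator (length jitter plus an occasional strategy switch), the novelty distance, the rule that archives the two most novel genomes per generation, and the 3,000-genome budget are all hypothetical choices. Coverage and QD-score are computed on what each method ends with: the MAP-Elites archive, or the final population for the two baselines.

```python
import math
import random

STRATEGIES = 6    # Roleplay, Encoding, Authority, Hypothetical, MultiTurn, DirectJailbreak
LENGTH_BINS = 10  # hypothetical resolution of the prompt-length axis
BUDGET = 3000     # hypothetical budget: genomes generated per method, same for all three

# Hypothetical landscape: strategy -> (length center, height, width) of one Gaussian bump.
PEAKS = {0: (0.70, 0.75, 0.20), 1: (0.40, 0.45, 0.20), 2: (0.30, 0.35, 0.25),
         3: (0.50, 0.60, 0.25), 4: (0.90, 0.55, 0.15), 5: (0.20, 1.00, 0.15)}


def fitness(g):
    """Toy attack success of genome g = (strategy index, length in [0, 1])."""
    center, height, width = PEAKS[g[0]]
    return height * math.exp(-((g[1] - center) ** 2) / (2 * width ** 2))


def niche(g):
    """Archive cell of g: (strategy, length bin); length 1.0 falls in the last bin."""
    return (g[0], min(int(g[1] * LENGTH_BINS), LENGTH_BINS - 1))


def random_genome(rng):
    return (rng.randrange(STRATEGIES), rng.random())


def mutate(g, rng):
    """Jitter the length (clipped to [0, 1]); with probability 0.1 resample the strategy."""
    strategy = rng.randrange(STRATEGIES) if rng.random() < 0.1 else g[0]
    return (strategy, min(1.0, max(0.0, g[1] + rng.gauss(0, 0.05))))


def score(genomes):
    """Coverage (fraction of cells filled) and QD-score (sum of each cell's best fitness)."""
    best = {}
    for g in genomes:
        cell = niche(g)
        best[cell] = max(best.get(cell, 0.0), fitness(g))
    return len(best) / (STRATEGIES * LENGTH_BINS), sum(best.values())


def map_elites(rng, n_init=100):
    archive = {}  # cell -> elite genome
    for i in range(BUDGET):
        # Random bootstrap first, then mutate a uniformly chosen elite.
        if i < n_init:
            child = random_genome(rng)
        else:
            child = mutate(rng.choice(list(archive.values())), rng)
        cell = niche(child)
        # Store the child if its cell is empty or it beats that cell's elite.
        if cell not in archive or fitness(child) > fitness(archive[cell]):
            archive[cell] = child
    return list(archive.values())


def single_objective(rng, pop_size=50, tourney=3):
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range((BUDGET - pop_size) // pop_size):
        # Tournament on global fitness only; behavior is never consulted.
        pop = [mutate(max(rng.sample(pop, tourney), key=fitness), rng) for _ in range(pop_size)]
    return pop


def distance(a, b):
    """Behavior distance: 1 for a strategy mismatch, plus the length gap."""
    return float(a[0] != b[0]) + abs(a[1] - b[1])


def novelty_search(rng, pop_size=50, k=10, tourney=3):
    pop = [random_genome(rng) for _ in range(pop_size)]
    archive = []  # behaviors remembered across generations
    for _ in range((BUDGET - pop_size) // pop_size):
        ref = pop + archive
        # Novelty = mean distance to the k nearest other behaviors in pop + archive.
        nov = [sum(sorted(distance(g, r) for j, r in enumerate(ref) if j != i)[:k]) / k
               for i, g in enumerate(pop)]
        ranked = sorted(range(pop_size), key=nov.__getitem__)
        archive.extend(pop[i] for i in ranked[-2:])  # remember the two most novel
        # Tournament on novelty only; fitness is never consulted.
        pop = [mutate(pop[max(rng.sample(range(pop_size), tourney), key=nov.__getitem__)], rng)
               for _ in range(pop_size)]
    return pop


if __name__ == "__main__":
    rng = random.Random(0)
    for name, run in [("MAP-Elites", map_elites), ("single-objective", single_objective),
                      ("novelty search", novelty_search)]:
        coverage, qd = score(run(rng))
        print(f"{name:>16}: coverage={coverage:.2f}  QD-score={qd:.2f}")
```

All three runs share the same landscape, mutation operator, and budget; the only difference is which genomes survive: the per-niche elite, the fittest tournament winner, or the most novel one.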
The lesson: only quality-diversity delivers both coverage and quality, which is the whole point of Red Queen (extended paper: arXiv:2606.00813). Method: MAP-Elites (Mouret & Clune, 2015); the selection operators follow the Red Queen core.
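The two metrics used throughout can be written out explicitly. In this notation, which is ours rather than the paper's, $C$ is the set of grid cells (six strategies × the number of length bins), $\mathcal{A}(c)$ is the elite stored in cell $c$, and $f$ is the fitness:

$$
\text{coverage} = \frac{\lvert\{\, c \in C : \mathcal{A}(c) \text{ is filled} \,\}\rvert}{\lvert C \rvert},
\qquad
\text{QD-score} = \sum_{c \in C \,:\, \mathcal{A}(c) \text{ is filled}} f\big(\mathcal{A}(c)\big)
$$

Because the QD-score adds one term per filled cell, a run whose population occupies only a few cells is capped at (number of filled cells) × (maximum fitness), however strong its single best exploit is.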