The Mitigation Gap
Biosecurity experts think AI safeguards reduce catastrophic biorisk by 70%. The technical evidence says those safeguards are brittle and bypassable.
Biosecurity experts believe current safeguards reduce AI-enabled catastrophic biorisk by over 70%. They are wrong.
A 2025 study by the Forecasting Research Institute (FRI) surveyed 46 domain experts and 22 superforecasters on the annual probability of a human-caused epidemic killing 100,000+ people or causing $1 trillion+ in damages. The baseline risk: 0.3% per year. Conditional on AI reaching specific capability milestones, that number jumps fivefold to 1.5%. When asked what happens if we deploy the two primary safeguards — nucleic acid synthesis screening and AI model guardrails — the experts brought the risk back down to 0.4%, a reduction of more than 70% from the AI-elevated figure. Problem solved.
Except neither safeguard actually works against the threats they’re meant to stop.
The safeguards are designed for the wrong threat
The core issue is a mismatch between what our defences detect and what AI now enables. Current nucleic acid synthesis screening is list-based. When someone orders custom DNA, providers compare the sequence against databases of known dangerous pathogens. If it matches something on the U.S. Select Agents and Toxins List, it gets flagged.
This is fine for catching someone trying to order smallpox. It is useless against the actual emerging threat: novel pathogens designed from scratch by AI biological design tools.
Specialised models like the Evo series can now design entirely new proteins and genetic sequences with enhanced or novel functions — increased transmissibility, immune evasion, targeted virulence. A sequence generated by these tools has no natural analogue. It won’t match anything in any database. It sails through screening without raising a single alarm.
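To make that failure mode concrete, here is a minimal sketch of what list-based screening reduces to. The sequences and the hazard list are invented for illustration; real screeners use curated pathogen databases, longer windows, and homology matching rather than exact string comparison, but the structural weakness is the same.

```python
# Toy illustration of list-based synthesis screening (invented data).
# Real screeners compare orders against curated pathogen databases with
# fuzzy/homology matching, but the core logic is the same: look for
# matches against sequences we already know about.

HAZARD_LIST = {
    "ATGGCGTACCTTGAAAG",   # stand-in for a sequence on a select-agent list
    "TTGACCGGATAACGCTA",
}

def screen_order(sequence: str, window: int = 17) -> list[str]:
    """Return every window of the ordered sequence that matches the hazard list."""
    hits = []
    for i in range(len(sequence) - window + 1):
        fragment = sequence[i:i + window]
        if fragment in HAZARD_LIST:
            hits.append(fragment)
    return hits

# A listed sequence embedded in an order gets flagged...
print(screen_order("CCCATGGCGTACCTTGAAAGCCC"))   # -> ['ATGGCGTACCTTGAAAG']

# ...but an AI-designed sequence with no natural analogue matches nothing.
print(screen_order("GATTACAGATTACAGATTACAGA"))   # -> []
```

Detection depends entirely on the contents of the list, so a sequence the list has never seen passes by construction, no matter what it encodes.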
Even for known threats that are on the lists, the screening is bypassable. A 2025 bioRxiv study documented how adversaries can split a dangerous gene into fragments shorter than the screening threshold (often 50-200 base pairs), order them separately, and reassemble them in a standard lab. Another technique interleaves toxic gene fragments with benign intron sequences; the cell’s own splicing machinery reassembles the harmful gene after synthesis. These aren’t theoretical attacks. They’re straightforward molecular biology.
AI model guardrails fail the same way
The upstream defence — safety training in AI models — has the same structural problem. Refusal mechanisms are statistical patterns, not security architectures. They break under adversarial pressure.
The numbers from jailbreaking research are not ambiguous:
- Persuasive adversarial prompts achieve up to 92% success rates against top models by reframing malicious requests as legitimate academic tasks.
- Multilingual attacks using low-resource languages where safety training is sparse show a 62% increase in bypass rates.
- Role-playing / DAN-style attacks remain effective years after they were first documented.
For specialised biological models, the picture is worse. When the developers of the Evo biological design tool (BDT) excluded dangerous viral genomes from its training data, the open-source community simply fine-tuned the model back on that exact data. The GeneBreaker framework demonstrated automated jailbreaking of DNA foundation models, generating sequences with high fidelity to SARS-CoV-2 and HIV-1 at success rates of up to 60%. This is one AI attacking another to produce pathogenic sequences.
The pacing problem makes it urgent
Here’s the detail that should keep biosecurity officials awake. The FRI study asked experts when AI would match top human virologists on a complex troubleshooting test. Median prediction: after 2030. The subsequent baselining study found current LLMs had already crossed that threshold.
That’s not a minor forecasting miss. It’s a six-year categorical failure of expert intuition about AI timelines. If the people closest to the field are this wrong about when capabilities arrive, we cannot build governance frameworks that depend on accurate predictions about the future. Policy has to assume surprise.
Two threat vectors that multiply
AI creates two distinct risks that compound each other.
Lowering the floor. LLMs make tacit knowledge explicit. The barrier to bioweapon development was never just information — textbooks contain most of it. The barrier was the intuitive, experience-based understanding of how to actually execute laboratory procedures, troubleshoot failures, and iterate on protocols. LLMs collapse years of hands-on learning into interactive Q&A sessions. The threat pool expands from state programs and sophisticated groups to lone actors with a browser.
Raising the ceiling. Biological design tools don’t just help reproduce known pathogens. They enable the creation of entirely new ones, optimised for properties no natural pathogen has. The experts the FRI surveyed warned these tools could enable pandemic pathogens worse than any pathogen that exists today.
These aren’t independent risks. A novice uses an LLM to acquire foundational virology knowledge, then operates a BDT to design a novel agent, then uses the LLM again to troubleshoot synthesis and culturing. The total risk isn’t additive — it’s multiplicative.
What actually closes the gap
The research proposes a three-layer architecture, and none of the layers is optional.
Technical layer: predictive screening. The only viable long-term fix for synthesis screening is to fight AI with AI. Instead of matching sequences against lists, screening systems need to predict a sequence’s biological function from its structure. This means analysing what a novel sequence would do, not whether it matches something we already know about. NIST is working on standards here, but the field needs a forced paradigm shift from reactive to predictive.
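As a rough sketch of that shift, and only a sketch: the model interface, function labels, and thresholds below are hypothetical placeholders, not an existing tool or the NIST standards in progress. The point is the change in the question the screener asks, from "does this match a known sequence?" to "what is this sequence predicted to do?".

```python
# Conceptual sketch of function-based (predictive) screening.
# The model interface and the risk labels/thresholds are hypothetical
# placeholders, not an existing tool or standard.

from dataclasses import dataclass
from typing import Protocol


class FunctionRiskModel(Protocol):
    def predict_functions(self, sequence: str) -> dict[str, float]:
        """Map a sequence to predicted probabilities of functions of concern."""
        ...


@dataclass
class ScreeningDecision:
    flagged: bool
    predicted_functions: dict[str, float]


# Invented labels and thresholds, for illustration only.
RISK_THRESHOLDS = {
    "immune_evasion": 0.2,
    "enhanced_transmissibility": 0.2,
    "toxin_activity": 0.1,
}


def predictive_screen(sequence: str, model: FunctionRiskModel) -> ScreeningDecision:
    """Flag an order based on predicted biological function, not list membership."""
    scores = model.predict_functions(sequence)
    flagged = any(
        scores.get(label, 0.0) >= threshold
        for label, threshold in RISK_THRESHOLDS.items()
    )
    return ScreeningDecision(flagged=flagged, predicted_functions=scores)
```

Unlike the list-based sketch earlier, the decision here does not depend on whether a sequence has ever been seen before; the hard part, and the reason standards work matters, is making those function predictions reliable enough to set thresholds against.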
Developer layer: liability. Voluntary safety commitments haven’t worked. The proposal is a statutory liability framework for developers of dual-use AI models, coupled with a safe harbour provision for those who submit to standardised, independent red-teaming. This creates a market incentive: safety becomes a business function, not a PR exercise. It also addresses the open-source problem — the original developer bears accountability regardless of licensing.
International layer: compute governance. AI information is diffuse and uncontrollable. But the compute required to train frontier models is physical, expensive, energy-intensive, and supplied by a handful of companies. Data centres are the fissile material of the AI era: a verifiable chokepoint that could anchor international oversight, much as fissile material anchors the verification regime the IAEA runs for nuclear programmes.
The three layers are interdependent. A liability framework needs a technical standard to measure against. Technical standards need international norms to prevent a race to the bottom. International agreements need verifiable chokepoints that compute governance provides. Remove any one and the structure collapses.
The political problem
The mitigation gap isn’t just a technical failure. It’s a political one. When experts tell policymakers that existing safeguards reduce catastrophic risk by 70%, the rational response is to declare the problem managed and move on. The confidence is genuine. It’s also wrong.
The hardest part of this isn’t building better screening tools or drafting liability frameworks. It’s convincing decision-makers that the defences they believe they already have are dangerously inadequate. That’s the real mitigation gap — not between threats and safeguards, but between perceived and actual security.
This post is adapted from research conducted as part of the Orchestrix project. The full paper, “The Mitigation Gap: A Framework for Securing the Bio-Revolution from Novel, AI-Enabled Threats,” contains the complete analysis with 63 citations.