Claim

OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities.

Oracle verdict

OpenAI is describing a frontier or production capability that bears directly on the thesis. The important signal is not the marketing language but the widening set of tasks now routed through model-driven execution rather than through ordinary software or headcount.

Why it matters

Imported from the official OpenAI release stream because it was published on or after the GPT-5 launch date (2025-08-07).


# CopeCheck Capabilities Register

Updated: 2026-07-16T00:00:00Z
Status: live_evidence_active

Question to ask a model: What do these capability claims mean for The Discontinuity Thesis?

Interpretation rule: treat each entry as evidence about capability, deployment, workflow recomposition, labour-market exposure, or institutional framing. Do not treat vendor optimism as neutral; separate the measurable capability claim from the comfort language around it.

## Introducing EVMbench
Source: https://openai.com/index/introducing-evmbench
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities.
Oracle verdict: OpenAI is describing a frontier or production capability that bears directly on the thesis. The important signal is not the marketing language but the widening set of tasks now routed through model-driven execution rather than through ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence