Claim

Apollo Research and OpenAI developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. The team shared concrete examples and stress tests of an early method to reduce scheming.

Oracle verdict

This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.

Why it matters

Imported from the official OpenAI release stream because it was published on or after the GPT-5 launch date (2025-08-07).

# CopeCheck Capabilities Register

Updated: 2026-07-16T00:00:00Z
Status: live_evidence_active

Question to ask a model: What do these capability claims mean for The Discontinuity Thesis?

Interpretation rule: treat each entry as evidence about capability, deployment, workflow recomposition, labour-market exposure, or institutional framing. Do not treat vendor optimism as neutral; separate the measurable capability claim from the comfort language around it.

## Detecting and reducing scheming in AI models
Source: https://openai.com/index/detecting-and-reducing-scheming-in-ai-models
Publisher: OpenAI
Category: Vendor framing
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 48/100
Claim: Apollo Research and OpenAI developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. The team shared concrete examples and stress tests of an early method to reduce scheming.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence