A live capabilities register for press releases, case studies, research papers, benchmarks, and labour-market signals. Each item keeps the measurable claim, the source, and the Oracle verdict on what it means for the thesis.
Paste a source URL and the Oracle will decide whether it belongs in the capabilities register. Recommended items are saved to the Appendix III review queue rather than published blindly.
Machine brief
# CopeCheck Capabilities Register
Updated: 2026-06-02T20:47:39Z
Status: live_evidence_active
Question to ask a model: What do these capability claims mean for The Discontinuity Thesis?
Interpretation rule: treat each entry as evidence about capability, deployment, workflow recomposition, labour-market exposure, or institutional framing. Do not treat vendor optimism as neutral; separate the measurable capability claim from the comfort language around it.
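Because every entry below follows the same layout (a `## ` title line followed by `Field: value` lines), the register can be read mechanically. The following is a minimal sketch, assuming that layout holds throughout; the names `parse_register`, `score_of`, and `average_score` are hypothetical and are not part of any CopeCheck tooling.

```python
# Field names as they appear in the register entries.
FIELDS = {
    "Source", "Publisher", "Category", "Sector", "Capability",
    "Score", "Claim", "Oracle verdict", "Thesis relevance",
}

def parse_register(text):
    """Split a machine brief into one dict per '## ' entry."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = {"Title": line[3:]}
            entries.append(current)
            continue
        # Split on the first ': ' only, so URLs and claims keep their colons.
        key, sep, value = line.partition(": ")
        if current is not None and sep and key in FIELDS:
            current[key] = value
    return entries

def score_of(entry):
    """Return the numeric part of a 'Score: 72/100' field, or None."""
    number = entry.get("Score", "").partition("/")[0]
    return int(number) if number.isdigit() else None

def average_score(entries):
    """Mean of all parseable scores, or None if there are none."""
    scores = [s for s in map(score_of, entries) if s is not None]
    return sum(scores) / len(scores) if scores else None

# Two abbreviated entries copied from the register, for illustration.
sample = """\
## Introducing Claude Opus 4
Source: https://www.anthropic.com/news/claude-opus-4-8
Publisher: Anthropic
Category: Benchmarks
Score: 72/100
## Law Professors Prefer AI Over Peer Answers
Publisher: Salinas, Frieders, Guha, Ma, Nyarko et al. / Stanford Law liftlab
Category: Benchmarks
Score: 88/100
"""

entries = parse_register(sample)
print(len(entries), average_score(entries))
```

Keeping the claim and the Oracle verdict as separate fields lets a reader filter on the measurable claim without the framing around it, in line with the interpretation rule.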
## Introducing Claude Opus 4
Source: https://www.anthropic.com/news/claude-opus-4-8
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Autonomous multi-agent operation
Score: 72/100
Claim: Claude Opus 4 (claude-opus-4-8) introduces extended thinking, interleaved reasoning, and the ability to run hundreds of parallel subagents unattended in fully autonomous agentic workflows. Anthropic highlights use cases where the model replaces attorneys and engineers, writes full codebases autonomously, and handles open-ended multi-step tasks without human supervision.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Law Professors Prefer AI Over Peer Answers
Source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6849678
Publisher: Salinas, Frieders, Guha, Ma, Nyarko et al. / Stanford Law liftlab
Category: Benchmarks
Sector: Legal education
Capability: Expert-level legal tutoring surpassing human instructors
Score: 88/100
Claim: In blinded evaluation, LLMs achieved a 75.33% win rate over expert law professors; Claude Opus 4.7 ranked first; all AI models outperformed every human instructor; and the LLM harmful-response rate was 3.53%, against 12.06% for professors.
Oracle verdict: This paper is a tombstone written by the people whose graves it is marking. The authors conducted one of the most methodologically careful studies of professional AI displacement published in legal academia, documented the results with statistical precision, and filed it under benchmark evaluation. The cope is institutional: the authors work at institutions whose value proposition depends on the human expertise they just measured as inferior. The omission of labour-market implications is not an oversight; it is load-bearing architecture.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Accenture Ireland: Generating Impact — Turning Frontier AI Capabilities into Frontline Productivity and Growth in Ireland
Source: https://www.accenture.com/content/dam/accenture/final/accenture-com/document-fy26/q3/Generating-Impact-Ireland.pdf
Publisher: Accenture Ireland
Category: Deployments
Sector: Cross-sector Irish economy / consulting
Capability: AI-enabled workflow recomposition at economy scale
Score: 78/100
Claim: Accenture reports that 82% of Irish working hours are now ‘AI-reinventable’ (up from 42% in 2024), that AI is already being used for tasks accounting for 20% of working hours, and that 39% of Irish employees expect their job to be unrecognisable or to disappear completely by the end of the decade. Expectations for entry-level hiring demand have deteriorated sharply: the share of executives expecting increased entry-level demand fell from 49% to 33%, while the share expecting reduced demand rose from 21% to 37%. Writing and editing declined across 51 Irish occupations between 2023 and 2025.
Oracle verdict: This is the Discontinuity Thesis rendered as a consulting pitch deck. The data Accenture presents — 82% of hours in scope, 39% expecting disappearance, entry-level pipeline contracting — is exactly the frontier evidence Appendix III exists to capture. The framing (‘generating impact’, ‘reinvention’, ‘opportunity’) is the cope layer the thesis predicts. When the firms selling the transition also control the vocabulary of the transition, displacement becomes ‘reinvention’ and mass job anxiety becomes workers being ‘prepared to engage positively.’
Thesis relevance: Appendix III, section six: consultancy cope framing as evidence signal — the firms selling AI transformation are now publishing displacement data inside productivity narratives
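The entry-level figures in the claim can be condensed into a single net-balance number: the share of executives expecting more demand minus the share expecting less. This is a minimal sketch, assuming the four percentages are directly comparable across the two survey points; the function name `net_balance` is illustrative and does not come from the Accenture report.

```python
def net_balance(pct_increase, pct_decrease):
    """Share expecting more demand minus share expecting less, in percentage points."""
    return pct_increase - pct_decrease

# Percentages as quoted in the claim above.
earlier = net_balance(49, 21)
later = net_balance(33, 37)
swing = later - earlier

print(earlier, later, swing)  # prints: 28 -4 -32
```

On these figures the balance moves from +28 to -4 points, a 32-point swing from net optimism to net pessimism about entry-level demand.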
## Singular Bank automates banking workflows with OpenAI
Source: https://openai.com/index/singular-bank/
Publisher: OpenAI
Category: Deployments
Sector: Financial services / retail banking
Capability: End-to-end banking workflow automation: document processing, client onboarding, compliance
Score: 81/100
Claim: Singular Bank integrated OpenAI models to automate core banking workflows including document processing, client onboarding, and routine compliance tasks. Processing time for previously manual workflows was reduced by over 80%. Staff were redeployed from execution roles to oversight roles. The bank now treats AI as operational infrastructure.
Oracle verdict: A bank replaced the work of banking with OpenAI. Not augmented, not assisted — automated. The Singular Bank case is the financial-sector equivalent of AutoScout24 in software: the workflow was recomposed around AI execution and human oversight is the residual. When banks describe this as ‘redeployment,’ the thesis reads: the jobs that existed to process documents and onboard clients no longer exist in their prior form. The infrastructure framing (‘AI is now core infrastructure’) is the tell — infrastructure does not get rolled back.
Thesis relevance: Appendix III, section four: financial-sector deployment evidence — banking workflows automated end-to-end, not assisted
## Databricks brings GPT-5.5 to enterprise agent workflows
Source: https://openai.com/index/databricks
Publisher: OpenAI
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 83/100
Claim: Databricks uses GPT-5.5 for enterprise agent workflows after the model set a new state of the art on the OfficeQA Pro benchmark.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## A new personal finance experience in ChatGPT
Source: https://openai.com/index/personal-finance-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: Financial services
Capability: Financial workflow automation
Score: 64/100
Claim: Preview a new personal finance experience in ChatGPT for Pro users in the U.S. Securely connect your financial accounts and get AI-powered insights and guidance grounded in your financial context, goals, and priorities.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Chatham Financial trade validation compressed from 30 minutes to under 4
Source: https://www.linkedin.com/company/openai/
Publisher: OpenAI for Business / Chatham Financial
Category: Deployments
Sector: Financial risk management
Capability: Trade validation and compliance monitoring
Score: 68/100
Claim: OpenAI for Business and Chatham Financial described a GPT-5.5-Codex workflow that reduced trade validation from roughly 30 minutes to under 4 minutes, with real-time compliance monitoring for 160+ registered employees and audit-ready workflow outputs.
Oracle verdict: Thirty minutes of trade validation became less than four. The important part is not the time saving by itself; it is that verification, compliance, and audit-output generation are being pulled into a machine-readable workflow.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
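The time saving can be restated as a percentage. This is a quick check, assuming "roughly 30 minutes" and "under 4 minutes" are taken at face value as 30 and 4; since the source says "under 4", the result is a lower bound on the reduction. The helper name `percent_reduction` is illustrative.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100

# Assumed point values for the quoted "roughly 30" and "under 4" minutes.
reduction = percent_reduction(30, 4)
print(f"{reduction:.1f}%")  # prints: 86.7%
```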
## Agents, robots, and us: how AI reshapes work and skills in Europe
Source: https://www.mckinsey.com/mgi/our-research/agents-robots-and-us-how-ai-reshapes-work-and-skills-in-europe
Publisher: McKinsey Global Institute
Category: Labour market
Sector: European labour markets
Capability: Task automation and skill recomposition
Score: 79/100
Claim: McKinsey Global Institute estimates that 58% of current work hours across ten European countries are technically automatable with existing technologies, including 44% by agents and 14% by robots.
Oracle verdict: The report uses cautious productivity language, but the measurement frame is already discontinuity-shaped: hours, tasks, agents, robots, and skill substitution. That is the thesis's operating layer in consultant language.
Thesis relevance: Appendix III, sections five to seven: labour-market evidence and deployment continuation
## Working with AI: measuring the occupational implications of generative AI
Source: https://www.microsoft.com/en-us/research/publication/working-with-ai-measuring-the-occupational-implications-of-generative-ai/
Publisher: Microsoft Research
Category: Labour market
Sector: Occupational exposure research
Capability: Generative AI task overlap across occupations
Score: 74/100
Claim: Microsoft Research analysed 200,000 anonymised Bing Copilot conversations and mapped generative AI applicability across occupations, with high exposure concentrated in communication, analysis, writing, sales, and knowledge-work roles.
Oracle verdict: The caveat is doing institutional work: task overlap does not prove full occupation replacement. But the thesis does not require full occupation replacement; it requires workflow-level recomposition that reduces the human production layer.
Thesis relevance: Appendix III, sections five to seven: labour-market evidence and provider framing
## Sea's View on the Future of Agentic Software Development with Codex
Source: https://openai.com/index/sea-david-chen
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: Sea Limited's CPO explains why the company is deploying Codex across engineering teams to accelerate AI-native software development in Asia.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Work with Codex from anywhere
Source: https://openai.com/index/work-with-codex-from-anywhere
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: Use Codex anywhere with the ChatGPT mobile app. Monitor, steer, and approve coding tasks in real time across devices and remote environments.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Helping ChatGPT better recognize context in sensitive conversations
Source: https://openai.com/index/chatgpt-recognize-context-in-sensitive-conversations
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 52/100
Claim: Learn how new ChatGPT safety updates improve context awareness in sensitive conversations, helping detect risk over time and respond more safely.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## PwC is deploying Claude to build technology, execute deals, and reinvent enterprise functions for clients
Source: https://www.anthropic.com/news/pwc-expanded-partnership
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: Anthropic and PwC today announced an expansion of their strategic alliance, deepening how PwC uses Claude to build technology, execute deals, and reinvent enterprise functions for clients across every industry it serves. Most enterprises are still running on systems and processes built for a pre-AI world—a drag that is estimated to be more than $2.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Anthropic forms $200 million partnership with the Gates Foundation
Source: https://www.anthropic.com/news/gates-foundation-partnership
Publisher: Anthropic
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 65/100
Claim: We’re partnering with the Gates Foundation to commit $200 million in grant funding, Claude usage credits, and technical support for programs in global health, life sciences, education, and economic mobility over the next four years. These programs will be implemented with partners in the US and around the world. This commitment is central to Anthropic’s.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Building a safe, effective sandbox to enable Codex on Windows
Source: https://openai.com/index/building-codex-windows-sandbox
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: Learn how OpenAI built a secure sandbox for Codex on Windows, enabling safe, efficient coding agents with controlled file access and network restrictions.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Our response to the TanStack npm supply chain attack
Source: https://openai.com/index/our-response-to-the-tanstack-npm-supply-chain-attack
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: OpenAI details its response to the TanStack “Mini Shai-Hulud” supply chain attack, outlines protections taken to secure systems and signing certificates, and explains why macOS users must update OpenAI apps by June 12, 2026. Learn what happened, what was affected, and how OpenAI is strengthening defenses against evolving software supply chain threats.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing Claude for Small Business
Source: https://www.anthropic.com/news/claude-for-small-business
Publisher: Anthropic
Category: Labour market
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 82/100
Claim: We're launching Claude for Small Business —a package of connectors and ready-to-run workflows that put Claude inside the tools small businesses depend on—to help small business owners take full advantage of AI and cross off items on the to-do list. Small businesses account for 44% of U.S. GDP and employ nearly half the private-sector workforce, but their.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## AutoScout24 scales engineering with AI-powered workflows
Source: https://openai.com/index/autoscout24/
Publisher: OpenAI
Category: Deployments
Sector: Marketplace software
Capability: Software delivery workflow automation
Score: 84/100
Claim: OpenAI reports that AutoScout24 rolled out ChatGPT to roughly 2,000 employees and Codex to roughly 1,000 builder employees, with selected projects compressed from 2-3 weeks to 2-3 days.
Oracle verdict: This is the thesis moving from benchmark to operating model. The claim is not that engineers got a better autocomplete; it is that planning, review, refactoring, documentation, and incident analysis are being reorganised around agentic work continuation.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## What Parameter Golf taught us about AI-assisted research
Source: https://openai.com/index/what-parameter-golf-taught-us
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 86/100
Claim: Parameter Golf brought together 1,000+ participants and 2,000+ submissions to explore AI-assisted machine learning research, coding agents, quantization, and novel model design under strict constraints.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## How NVIDIA engineers and researchers build with Codex
Source: https://openai.com/index/nvidia
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Teams use Codex with GPT-5.5 to ship production systems and turn research ideas into runnable experiments.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI Campus Network: Student club interest form
Source: https://openai.com/index/openai-campus-network-student-club-interest-form
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 64/100
Claim: Join the OpenAI Campus Network—connect student clubs worldwide, access AI tools, host events, and build an AI-powered campus community.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## OpenAI launches DeployCo to help businesses build around intelligence
Source: https://openai.com/index/openai-launches-the-deployment-company
Publisher: OpenAI
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 83/100
Claim: OpenAI launches DeployCo, a new enterprise deployment company built to help organizations bring frontier AI into production and turn it into measurable business impact.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Running Codex safely at OpenAI
Source: https://openai.com/index/running-codex-safely
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: How OpenAI runs Codex securely with sandboxing, approvals, network policies, and agent-native telemetry to support safe and compliant coding agent adoption.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Scaling Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber
Source: https://openai.com/index/gpt-5-5-with-trusted-access-for-cyber
Publisher: OpenAI
Category: Benchmarks
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: OpenAI expands Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber, helping verified defenders accelerate vulnerability research and protect critical infrastructure.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Parloa builds service agents customers want to talk to
Source: https://openai.com/index/parloa
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 95/100
Claim: Parloa leverages OpenAI models to power scalable, voice-driven AI customer service agents, enabling enterprises to design, simulate, and deploy reliable, real-time interactions.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Advancing voice intelligence with new models in the API
Source: https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 64/100
Claim: Explore new realtime voice models in the OpenAI API that can reason, translate, and transcribe speech, enabling more natural and intelligent voice experiences.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Testing ads in ChatGPT
Source: https://openai.com/index/testing-ads-in-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI begins testing ads in ChatGPT to support free access, with clear labeling, answer independence, strong privacy protections, and user control.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing Trusted Contact in ChatGPT
Source: https://openai.com/index/introducing-trusted-contact-in-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 52/100
Claim: Introducing Trusted Contact in ChatGPT, an optional safety feature that notifies someone you trust if serious self-harm concerns are detected.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Simplex rethinks software development with Codex
Source: https://openai.com/index/simplex
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: Simplex boosts software development with ChatGPT Enterprise and Codex, reducing design, build, and testing time while scaling AI-driven workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## How ChatGPT learns about the world while protecting privacy
Source: https://openai.com/index/how-chatgpt-protects-privacy
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 52/100
Claim: Learn how ChatGPT safeguards your privacy, reduces personal data in training, and gives you control over whether your conversations improve AI models.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing ChatGPT Futures: Class of 2026
Source: https://openai.com/index/introducing-chatgpt-futures-class-of-2026
Publisher: OpenAI
Category: Benchmarks
Sector: Education
Capability: Education and workforce adoption
Score: 76/100
Claim: Meet the ChatGPT Futures Class of 2026—26 student innovators using AI to build, research, and drive real-world impact. Discover how this generation is redefining learning, creativity, and opportunity with ChatGPT.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Uber uses OpenAI to help people earn smarter and book faster
Source: https://openai.com/index/uber
Publisher: OpenAI
Category: Deployments
Sector: Commerce and marketplace
Capability: Multimodal content generation and media workflows
Score: 82/100
Claim: Uber uses OpenAI to power AI assistants and voice features that help drivers earn smarter and riders book faster across a global real-time marketplace.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## How frontier firms are pulling ahead
Source: https://openai.com/index/introducing-b2b-signals
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 93/100
Claim: OpenAI’s B2B Signals research shows how frontier enterprises deepen AI adoption, scale Codex-powered agentic workflows, and build durable competitive advantage.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Higher usage limits for Claude and a compute deal with SpaceX
Source: https://www.anthropic.com/news/higher-limits-spacex
Publisher: Anthropic
Category: Deployments
Sector: AI infrastructure
Capability: Agent platform and API infrastructure
Score: 85/100
Claim: We’ve agreed to a partnership with SpaceX that will substantially increase our compute capacity. This, along with our other recent compute deals, means that we’ve been able to increase our usage limits for Claude Code and the Claude API. Below, we describe these changes and the progress we’re making on compute.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Unlocking large scale AI training networks with MRC (Multipath Reliable Connection)
Source: https://openai.com/index/mrc-supercomputer-networking
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI introduces MRC (Multipath Reliable Connection), a new supercomputer networking protocol released via OCP to improve resilience and performance in large-scale AI training clusters.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## GPT-5.5 Instant System Card
Source: https://openai.com/index/gpt-5-5-instant-system-card
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Official OpenAI release: GPT-5.5 Instant System Card.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## GPT-5.5 Instant: smarter, clearer, and more personalized
Source: https://openai.com/index/gpt-5-5-instant
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5.5 Instant updates ChatGPT’s default model with smarter, more accurate answers, reduced hallucinations, and improved personalization controls.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## New ways to buy ChatGPT ads
Source: https://openai.com/index/new-ways-to-buy-chatgpt-ads
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI expands ChatGPT ads with a beta self-serve Ads Manager, CPC bidding, and enhanced measurement tools—built to protect privacy and keep conversations separate from ads.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Advancing youth safety and wellbeing in EMEA
Source: https://openai.com/index/advancing-youth-safety-in-emea
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Vendor platform capability signal
Score: 42/100
Claim: Explore OpenAI’s European Youth Safety Blueprint and EMEA Youth & Wellbeing Grants, advancing safe, responsible AI for teens, families, and educators.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Agents for financial services
Source: https://www.anthropic.com/news/finance-agents
Publisher: Anthropic
Category: Vendor framing
Sector: Financial services
Capability: Financial workflow automation
Score: 74/100
Claim: We’re releasing ten ready-to-run agent templates for the most time-consuming work in financial services: building pitchbooks, screening KYC files, and closing the books at month-end. Each one ships as a plugin in Claude Cowork and Claude Code, and as a cookbook for Claude Managed Agents, so a team can put Claude on real financial work in days rather than.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## OpenAI and PwC collaborate to reimagine the office of the CFO
Source: https://openai.com/index/openai-pwc-finance-collaboration
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Enterprise workflow automation
Score: 73/100
Claim: OpenAI and PwC are partnering to help enterprises use AI agents to automate finance workflows, improve forecasting, strengthen controls, and modernize the CFO function.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## How OpenAI delivers low-latency voice AI at scale
Source: https://openai.com/index/delivering-low-latency-voice-ai-at-scale
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 64/100
Claim: How OpenAI rebuilt its WebRTC stack to power real-time Voice AI with low latency, global scale, and seamless conversational turn-taking.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Building a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs
Source: https://www.anthropic.com/news/enterprise-ai-services-company
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: Anthropic, Blackstone, Hellman & Friedman, and Goldman Sachs announced the formation of a new AI services company. The organization will work with mid-sized companies across sectors to bring Claude into their most important operations. Applied AI engineers from Anthropic will work alongside the firm’s engineering team to identify where Claude can have the.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing Advanced Account Security
Source: https://openai.com/index/advanced-account-security
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 52/100
Claim: Introducing Advanced Account Security: phishing-resistant login, stronger recovery, and enhanced protections to safeguard sensitive data and prevent account takeover.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Trinity College Dublin and Microsoft Ireland Research Shows a Widening AI Maturity Gap Between SMEs and Large Organisations
Source: https://news.microsoft.com/source/emea/features/trinity-college-dublin-and-microsoft-ireland-research-shows-a-widening-ai-maturity-gap-between-smes-and-large-organisations/
Publisher: Microsoft Source EMEA / Trinity College Dublin
Category: Labour market
Sector: Irish business productivity and AI adoption
Capability: AI-enabled organisational time savings and maturity gap
Score: 72/100
Claim: The AI Economy Ireland 2026 report says 92% of Irish organisations use or plan to use AI, but only 10% describe deployment as advanced or frontier-level; large organisations are more than twice as likely as SMEs to save 2+ hours per week per employee, while formal AI policy is associated with 10x higher rates of major productivity gains.
Oracle verdict: The release frames this as a readiness and productivity story. The thesis reads it as uneven discontinuity: AI gains compound first where organisations can redesign work, leaving SMEs and lower-confidence workers exposed to a widening capability gap.
Thesis relevance: Appendix III, sections five to seven: labour-market evidence, organisational readiness, and deployment continuation
## Where the goblins came from
Source: https://openai.com/index/where-the-goblins-came-from
Publisher: OpenAI
Category: Vendor framing
Sector: AI infrastructure
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: How goblin outputs spread in AI models: timeline, root cause, and fixes behind personality-driven quirks in GPT-5 behavior.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Building the compute infrastructure for the Intelligence Age
Source: https://openai.com/index/building-the-compute-infrastructure-for-the-intelligence-age
Publisher: OpenAI
Category: Vendor framing
Sector: AI infrastructure
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI scales Stargate to build the compute infrastructure powering AGI, adding new data center capacity to meet growing AI demand.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Cybersecurity in the Intelligence Age
Source: https://openai.com/index/cybersecurity-in-the-intelligence-age
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: OpenAI outlines a five-part action plan for strengthening cybersecurity in the Intelligence Age, focused on democratizing AI-powered cyber defense and protecting critical systems.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Our commitment to community safety
Source: https://openai.com/index/our-commitment-to-community-safety
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 52/100
Claim: Learn how OpenAI protects community safety in ChatGPT through model safeguards, misuse detection, policy enforcement, and collaboration with safety experts.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## OpenAI models, Codex, and Managed Agents come to AWS
Source: https://openai.com/index/openai-on-aws
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: OpenAI GPT models, Codex, and Managed Agents are now available on AWS, enabling enterprises to build secure AI in their AWS environments.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Claude for Creative Work
Source: https://www.anthropic.com/news/claude-for-creative-work
Publisher: Anthropic
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 68/100
Claim: Creative professionals look to technology to expand what's possible in their work. Claude can't replace taste or imagination, but it can open up new ways of working—faster and more ambitious ideation, a more expansive skill set, and the ability for creatives to take on larger-scale projects. AI can also help shoulder the parts of the creative process that.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## OpenAI available at FedRAMP Moderate
Source: https://openai.com/index/openai-available-at-fedramp-moderate
Publisher: OpenAI
Category: Deployments
Sector: Public sector
Capability: Enterprise workflow automation
Score: 85/100
Claim: OpenAI is available at FedRAMP Moderate authorization for ChatGPT Enterprise and the OpenAI API, enabling secure AI adoption for U.S. federal agencies.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## The next phase of the Microsoft OpenAI partnership
Source: https://openai.com/index/next-phase-of-microsoft-partnership
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Production AI deployment signal
Score: 85/100
Claim: OpenAI and Microsoft announce an amended agreement that simplifies the partnership, adds long-term clarity, and supports continued AI innovation at scale.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## An open-source spec for orchestration: Symphony
Source: https://openai.com/index/open-source-codex-orchestration-symphony
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: Learn how Symphony, an open-source spec for Codex orchestration, turns issue trackers into always-on agent systems—boosting engineering output and reducing context switching.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Choco automates food distribution with AI agents
Source: https://openai.com/index/choco
Publisher: OpenAI
Category: Deployments
Sector: Commerce and marketplace
Capability: Enterprise workflow automation
Score: 96/100
Claim: How Choco used OpenAI APIs to streamline food distribution, boost productivity, and unlock growth—an in-depth customer story on real-world AI impact.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Anthropic names Theo Hourmouzis General Manager of Australia & New Zealand and officially opens Sydney office
Source: https://www.anthropic.com/news/theo-hourmouzis-general-manager-australia-new-zealand
Publisher: Anthropic
Category: Deployments
Sector: General AI capability
Capability: Enterprise workflow automation
Score: 63/100
Claim: Theo Hourmouzis is joining Anthropic as General Manager of Australia and New Zealand, marking the next step in our investment in the region. Hourmouzis will meet with customers and partners this week alongside executives from our global team, as we officially open our Sydney office. Hourmouzis brings more than 20 years of leadership experience in the.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Our principles
Source: https://openai.com/index/our-principles
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 54/100
Claim: Our mission is to ensure that AGI benefits all of humanity. Sam Altman shares five principles that guide our work.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## An update on our election safeguards
Source: https://www.anthropic.com/news/election-safeguards-update
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 52/100
Claim: People around the world turn to Claude for information about political parties, candidates, and the issues at stake during election time—as well as to answer simpler questions like when, where, and how to vote. In our view, if AI models can answer these questions well (that is, accurately and impartially), they can be a positive force for the democratic.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Anthropic and NEC collaborate to build Japan’s largest AI engineering workforce
Source: https://www.anthropic.com/news/anthropic-nec
Publisher: Anthropic
Category: Labour market
Sector: Enterprise operations
Capability: Education and workforce adoption
Score: 72/100
Claim: NEC Corporation will use Claude as it builds one of Japan’s largest AI-native engineering organizations, making it available to approximately 30,000 NEC Group employees worldwide. As part of this strategic collaboration, NEC will become Anthropic’s first Japan-based global partner. Together, we will develop secure, industry-specific AI products for the.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## GPT-5.5 System Card
Source: https://openai.com/index/gpt-5-5-system-card
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Official OpenAI release: GPT-5.5 System Card.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing GPT-5.5
Source: https://openai.com/index/introducing-gpt-5-5
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Introducing GPT-5.5, our smartest model yet—faster, more capable, and built for complex tasks like coding, research, and data analysis across tools.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## GPT-5.5 Bio Bug Bounty
Source: https://openai.com/index/gpt-5-5-bio-bug-bounty
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 84/100
Claim: Explore the GPT-5.5 Bio Bug Bounty: a red-teaming challenge to find universal jailbreaks for bio safety risks, with rewards up to $25,000.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Making ChatGPT better for clinicians
Source: https://openai.com/index/making-chatgpt-better-for-clinicians
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 76/100
Claim: OpenAI makes ChatGPT for Clinicians free for verified U.S. physicians, nurse practitioners, and pharmacists, supporting clinical care, documentation, and research.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing workspace agents in ChatGPT
Source: https://openai.com/index/introducing-workspace-agents-in-chatgpt
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: Workspace agents in ChatGPT are Codex-powered agents that automate complex workflows, run in the cloud, and help teams scale work across tools securely.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Speeding up agentic workflows with WebSockets in the Responses API
Source: https://openai.com/index/speeding-up-agentic-workflows-with-websockets
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 92/100
Claim: A deep dive into the Codex agent loop, showing how WebSockets and connection-scoped caching reduced API overhead and improved model latency.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing OpenAI Privacy Filter
Source: https://openai.com/index/introducing-openai-privacy-filter
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI Privacy Filter is an open-weight model for detecting and redacting personally identifiable information (PII) in text with state-of-the-art accuracy.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing ChatGPT Images 2.0
Source: https://openai.com/index/introducing-chatgpt-images-2-0
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Multimodal content generation and media workflows
Score: 64/100
Claim: ChatGPT Images 2.0 introduces a state-of-the-art image generation model with improved text rendering, multilingual support, and advanced visual reasoning.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Scaling Codex to enterprises worldwide
Source: https://openai.com/index/scaling-codex-to-enterprises-worldwide
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: OpenAI launches Codex Labs, partners with Accenture, PwC, Infosys, and others to help enterprises deploy and scale Codex across the software development lifecycle, and hits 4M Codex WAU.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## OpenAI helps Hyatt advance AI among colleagues
Source: https://openai.com/index/hyatt-advances-ai-with-chatgpt-enterprise
Publisher: OpenAI
Category: Labour market
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Hyatt deploys ChatGPT Enterprise across its global workforce, using GPT-5.4 and Codex to improve productivity, operations, and guest experiences.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## Anthropic and Amazon expand collaboration for up to 5 gigawatts of new compute
Source: https://www.anthropic.com/news/anthropic-amazon-compute
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Production AI deployment signal
Score: 85/100
Claim: We have signed a new agreement with Amazon that will deepen our existing partnership and secure up to 5 gigawatts (GW) of capacity for training and deploying Claude, including new Trainium2 capacity coming online in the first half of this year and nearly 1GW total of Trainium2 and Trainium3 capacity coming online by the end of 2026. We have worked closely.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing Claude Design by Anthropic Labs
Source: https://www.anthropic.com/news/claude-design-anthropic-labs
Publisher: Anthropic
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 95/100
Claim: Today, we’re launching Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create polished visual work like designs, prototypes, slides, one-pagers, and more. Claude Design is powered by our most capable vision model, Claude Opus 4.7, and is available in research preview for Claude Pro, Max, Team, and Enterprise.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Codex for (almost) everything
Source: https://openai.com/index/codex-for-almost-everything
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: The updated Codex app for macOS and Windows adds computer use, in-app browsing, image generation, memory, and plugins to accelerate developer workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing GPT-Rosalind for life sciences research
Source: https://openai.com/index/introducing-gpt-rosalind
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: OpenAI introduces GPT-Rosalind, a frontier reasoning model built to accelerate drug discovery, genomics analysis, protein reasoning, and scientific research workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Accelerating the cyber defense ecosystem that protects us all
Source: https://openai.com/index/accelerating-cyber-defense-ecosystem
Publisher: OpenAI
Category: Deployments
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 87/100
Claim: Leading security firms and enterprises join OpenAI’s Trusted Access for Cyber, using GPT-5.4-Cyber and $10M in API grants to strengthen global cyber defense.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing Claude Opus 4.7
Source: https://www.anthropic.com/news/claude-opus-4-7
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work—the kind that previously needed close supervision—to Opus 4.7 with confidence. Opus 4.7 handles.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## The next evolution of the Agents SDK
Source: https://openai.com/index/the-next-evolution-of-the-agents-sdk
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Agent platform and API infrastructure
Score: 74/100
Claim: OpenAI updates the Agents SDK with native sandbox execution and a model-native harness, helping developers build secure, long-running agents across files and tools.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Trusted access for the next era of cyber defense
Source: https://openai.com/index/scaling-trusted-access-for-cyber-defense
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 64/100
Claim: OpenAI expands its Trusted Access for Cyber program, introducing GPT-5.4-Cyber to vetted defenders and strengthening safeguards as AI cybersecurity capabilities advance.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Anthropic’s Long-Term Benefit Trust appoints Vas Narasimhan to Board of Directors
Source: https://www.anthropic.com/news/narasimhan-board
Publisher: Anthropic
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Enterprise workflow automation
Score: 54/100
Claim: Vas Narasimhan has been appointed to Anthropic’s Board of Directors by the Anthropic Long-Term Benefit Trust. He is a physician-scientist and the Chief Executive Officer of Novartis, one of the world’s leading innovative medicines companies, and shares Anthropic’s conviction that healthcare and life sciences are among the areas where AI has the greatest …
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI
Source: https://openai.com/index/cloudflare-openai-agent-cloud
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Cloudflare brings OpenAI’s GPT-5.4 and Codex to Agent Cloud, enabling enterprises to build, deploy, and scale AI agents for real-world tasks with speed and security.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Our response to the Axios developer tool compromise
Source: https://openai.com/index/axios-developer-tool-compromise
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: OpenAI responds to the Axios supply chain attack by rotating macOS code signing certificates, updating apps, and confirming no user data was compromised.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## CyberAgent moves faster with ChatGPT Enterprise and Codex
Source: https://openai.com/index/cyberagent
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 96/100
Claim: CyberAgent uses ChatGPT Enterprise and Codex to securely scale AI adoption, improve quality, and accelerate decisions across advertising, media, and gaming.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## OpenAI Full Fan Mode Contest: Terms & Conditions
Source: https://openai.com/index/full-fan-mode-contest-terms-conditions
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 42/100
Claim: Explore the official terms and conditions for the OpenAI Full Fan Mode Contest, including eligibility, entry steps, judging criteria, and prize details. Learn how to participate, submit your entry on Instagram, and win IPL match tickets.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## The next phase of enterprise AI
Source: https://openai.com/index/next-phase-of-enterprise-ai
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 93/100
Claim: OpenAI outlines the next phase of enterprise AI, as adoption accelerates across industries with Frontier, ChatGPT Enterprise, Codex, and company-wide AI agents.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing the Child Safety Blueprint
Source: https://openai.com/index/introducing-child-safety-blueprint
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 52/100
Claim: Discover OpenAI’s Child Safety Blueprint—a roadmap for building AI responsibly with safeguards, age-appropriate design, and collaboration to protect and empower young people online.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Announcing the OpenAI Safety Fellowship
Source: https://openai.com/index/introducing-openai-safety-fellowship
Publisher: OpenAI
Category: Benchmarks
Sector: Customer operations
Capability: Model and benchmark capability movement
Score: 54/100
Claim: A pilot program to support independent safety and alignment research and develop the next generation of talent.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Industrial policy for the Intelligence Age
Source: https://openai.com/index/industrial-policy-for-the-intelligence-age
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 52/100
Claim: Explore our ambitious, people-first industrial policy ideas for the AI era—focused on expanding opportunity, sharing prosperity, and building resilient institutions as advanced intelligence evolves.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Anthropic expands partnership with Google and Broadcom for multiple gigawatts of next-generation compute
Source: https://www.anthropic.com/news/google-broadcom-partnership-compute
Publisher: Anthropic
Category: Benchmarks
Sector: AI infrastructure
Capability: Frontier model release and benchmark movement
Score: 83/100
Claim: We have signed a new agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity that we expect to come online starting in 2027. This significant expansion of our compute infrastructure will power our frontier Claude models and help us serve extraordinary demand from customers worldwide.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI acquires TBPN
Source: https://openai.com/index/openai-acquires-tbpn
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 64/100
Claim: OpenAI acquires TBPN to accelerate global conversations around AI and support independent media, expanding dialogue with builders, businesses, and the broader tech community.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Codex now offers more flexible pricing for teams
Source: https://openai.com/index/codex-flexible-pricing-for-teams
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: Codex now includes pay-as-you-go pricing for ChatGPT Business and Enterprise, providing teams a more flexible option to start and scale adoption.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Gradient Labs gives every bank customer an AI account manager
Source: https://openai.com/index/gradient-labs
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Gradient Labs uses GPT-4.1 and GPT-5.4 mini and nano to power AI agents that automate banking support workflows with low latency and high reliability.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Accelerating the next phase of AI
Source: https://openai.com/index/accelerating-the-next-phase-ai
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 85/100
Claim: OpenAI raises $122 billion in new funding to expand frontier AI globally, invest in next-generation compute, and meet growing demand for ChatGPT, Codex, and enterprise AI.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Australian government and Anthropic sign MOU for AI safety and research
Source: https://www.anthropic.com/news/australia-MOU
Publisher: Anthropic
Category: Benchmarks
Sector: Public sector
Capability: Model and benchmark capability movement
Score: 75/100
Claim: Today, Anthropic signed a Memorandum of Understanding with the Australian government to cooperate on AI safety research and support the goals of Australia’s National AI Plan. Our CEO, Dario Amodei, met with Prime Minister Anthony Albanese to formalize the agreement during a visit to Canberra, Australia. We also announced AUD$3 million in partnerships with …
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Helping disaster response teams turn AI into action across Asia
Source: https://openai.com/index/helping-disaster-response-teams-asia
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 54/100
Claim: AI for Disaster Response in Asia: OpenAI Workshop with Gates Foundation.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## STADLER reshapes knowledge work at a 230-year-old company
Source: https://openai.com/index/stadler
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 82/100
Claim: Learn how STADLER uses ChatGPT to transform knowledge work, saving time and accelerating productivity across 650 employees.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Inside our approach to the Model Spec
Source: https://openai.com/index/our-approach-to-the-model-spec
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 52/100
Claim: Learn how OpenAI’s Model Spec serves as a public framework for model behavior, balancing safety, user freedom, and accountability as AI systems advance.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing the OpenAI Safety Bug Bounty program
Source: https://openai.com/index/safety-bug-bounty
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 62/100
Claim: OpenAI launches a Safety Bug Bounty program to identify AI abuse and safety risks, including agentic vulnerabilities, prompt injection, and data exfiltration.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Helping developers build safer AI experiences for teens
Source: https://openai.com/index/teen-safety-policies-gpt-oss-safeguard
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Vendor platform capability signal
Score: 52/100
Claim: OpenAI releases prompt-based teen safety policies for developers using gpt-oss-safeguard, helping moderate age-specific risks in AI systems.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Powering product discovery in ChatGPT
Source: https://openai.com/index/powering-product-discovery-in-chatgpt
Publisher: OpenAI
Category: Deployments
Sector: Commerce and marketplace
Capability: Production AI deployment signal
Score: 88/100
Claim: ChatGPT introduces richer, visually immersive shopping powered by the Agentic Commerce Protocol, enabling product discovery, side-by-side comparisons, and merchant integration.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Update on the OpenAI Foundation
Source: https://openai.com/index/update-on-the-openai-foundation
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Education and workforce adoption
Score: 58/100
Claim: The OpenAI Foundation announces plans to invest at least $1 billion in curing diseases, economic opportunity, AI resilience, and community programs.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Creating with Sora Safely
Source: https://openai.com/index/creating-with-sora-safely
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 42/100
Claim: To address the novel safety challenges posed by a state-of-the-art video model as well as a new social creation platform, we’ve built Sora 2 and the Sora app with safety at the foundation. Our approach is anchored in concrete protections.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## How we monitor internal coding agents for misalignment
Source: https://openai.com/index/how-we-monitor-internal-coding-agents-misalignment
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 83/100
Claim: How OpenAI uses chain-of-thought monitoring to study misalignment in internal coding agents—analyzing real-world deployments to detect risks and strengthen AI safety safeguards.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## OpenAI to acquire Astral
Source: https://openai.com/index/openai-to-acquire-astral
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: Accelerates Codex growth to power the next generation of Python developer tools.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing GPT-5.4 mini and nano
Source: https://openai.com/index/introducing-gpt-5-4-mini-and-nano
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI Japan announces Japan Teen Safety Blueprint to put teen safety first
Source: https://openai.com/index/japan-teen-safety-blueprint
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 52/100
Claim: OpenAI Japan announces the Japan Teen Safety Blueprint, introducing stronger age protections, parental controls, and well-being safeguards for teens using generative AI.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Equipping workers with insights about compensation
Source: https://openai.com/index/equipping-workers-with-insights-about-compensation
Publisher: OpenAI
Category: Labour market
Sector: Scientific research
Capability: Education and workforce adoption
Score: 76/100
Claim: New research shows Americans send nearly 3 million daily messages to ChatGPT asking about compensation and earnings, helping close the wage information gap.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## Why Codex Security Doesn’t Include a SAST Report
Source: https://openai.com/index/why-codex-security-doesnt-include-sast
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: A deep dive into why Codex Security doesn’t rely on traditional SAST, instead using AI-driven constraint reasoning and validation to find real vulnerabilities with fewer false positives.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Anthropic invests $100 million into the Claude Partner Network
Source: https://www.anthropic.com/news/claude-partner-network
Publisher: Anthropic
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 89/100
Claim: We’re launching the Claude Partner Network, a program for partner organizations helping enterprises adopt Claude. We’re committing an initial $100 million to support our partners with training courses, dedicated technical support, and joint market development. Partners who join from today will get immediate access to a new technical certification.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Designing AI agents to resist prompt injection
Source: https://openai.com/index/designing-agents-to-resist-prompt-injection
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 88/100
Claim: How ChatGPT defends against prompt injection and social engineering by constraining risky actions and protecting sensitive data in agent workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## From model to agent: Equipping the Responses API with a computer environment
Source: https://openai.com/index/equip-responses-api-computer-environment
Publisher: OpenAI
Category: Vendor framing
Sector: AI infrastructure
Capability: Agent platform and API infrastructure
Score: 74/100
Claim: How OpenAI built an agent runtime using the Responses API, shell tool, and hosted containers to run secure, scalable agents with files, tools, and state.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Rakuten fixes issues twice as fast with Codex
Source: https://openai.com/index/rakuten
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: Official OpenAI release: Rakuten fixes issues twice as fast with Codex.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Wayfair boosts catalog accuracy and support speed with OpenAI
Source: https://openai.com/index/wayfair
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Production AI deployment signal
Score: 82/100
Claim: Wayfair uses OpenAI models to improve ecommerce support and product catalog accuracy, automating ticket triage and enhancing millions of product attributes at scale.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing The Anthropic Institute
Source: https://www.anthropic.com/news/the-anthropic-institute
Publisher: Anthropic
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: We’re launching The Anthropic Institute, a new effort to confront the most significant challenges that powerful AI will pose to our societies. The Anthropic Institute will draw on research from across Anthropic to provide information that other researchers and the public can use during our transition to a world containing much more powerful AI systems.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Improving instruction hierarchy in frontier LLMs
Source: https://openai.com/index/instruction-hierarchy-challenge
Publisher: OpenAI
Category: Benchmarks
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 64/100
Claim: IH-Challenge trains models to prioritize trusted instructions, improving instruction hierarchy, safety steerability, and resistance to prompt injection attacks.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## New ways to learn math and science in ChatGPT
Source: https://openai.com/index/new-ways-to-learn-math-and-science-in-chatgpt
Publisher: OpenAI
Category: Benchmarks
Sector: Education
Capability: Education and workforce adoption
Score: 76/100
Claim: ChatGPT introduces interactive visual explanations for math and science, helping students explore formulas, variables, and concepts in real time.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Sydney will become Anthropic’s fourth office in Asia-Pacific
Source: https://www.anthropic.com/news/sydney-fourth-office-asia-pacific
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 42/100
Claim: Anthropic is expanding to Australia and New Zealand. In the coming weeks, we will open an office in Sydney, our fourth office in Asia-Pacific, alongside Tokyo, Bengaluru, and Seoul. The expansion reflects strong demand from businesses in Australia and New Zealand and will help us better serve the countries’ unique AI ecosystems.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## OpenAI to acquire Promptfoo
Source: https://openai.com/index/openai-to-acquire-promptfoo
Publisher: OpenAI
Category: Deployments
Sector: Cybersecurity
Capability: Enterprise workflow automation
Score: 85/100
Claim: OpenAI is acquiring Promptfoo, an AI security platform that helps enterprises identify and remediate vulnerabilities in AI systems during development.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Codex Security: now in research preview
Source: https://openai.com/index/codex-security-now-in-research-preview
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 86/100
Claim: Codex Security is an AI application security agent that analyzes project context to detect, validate, and patch complex vulnerabilities with higher confidence and less noise.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## How Balyasny Asset Management built an AI research engine
Source: https://openai.com/index/balyasny-asset-management
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Enterprise workflow automation
Score: 86/100
Claim: By combining rigorous model evaluation, full-platform use of OpenAI, and agent workflows, Balyasny is reinventing investment research.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## How Descript engineers multilingual video dubbing at scale
Source: https://openai.com/index/descript
Publisher: OpenAI
Category: Benchmarks
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 76/100
Claim: Using OpenAI reasoning models, Descript unlocked automatic localization of large content libraries without losing timing or meaning.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Partnering with Mozilla to improve Firefox’s security
Source: https://www.anthropic.com/news/mozilla-firefox-security
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Cyber defence and misuse monitoring
Score: 76/100
Claim: AI models can now independently identify high-severity vulnerabilities in complex software. As we recently documented, Claude found more than 500 zero-day vulnerabilities (security flaws that are unknown to the software’s maintainers) in well-tested open-source software. In this post, we share details of a collaboration with researchers at Mozilla in which.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing GPT-5.4
Source: https://openai.com/index/introducing-gpt-5-4
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Introducing GPT-5.4, OpenAI’s most capable and efficient frontier model for professional work, with state-of-the-art coding, computer use, tool search, and 1M-token context.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## GPT-5.4 Thinking System Card
Source: https://openai.com/index/gpt-5-4-thinking-system-card
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Official OpenAI release: GPT-5.4 Thinking System Card.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Reasoning models struggle to control their chains of thought, and that’s good
Source: https://openai.com/index/reasoning-models-chain-of-thought-controllability
Publisher: OpenAI
Category: Benchmarks
Sector: Cybersecurity
Capability: Model and benchmark capability movement
Score: 64/100
Claim: OpenAI introduces CoT-Control and finds reasoning models struggle to control their chains of thought, reinforcing monitorability as an AI safety safeguard.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Ensuring AI use in education leads to opportunity
Source: https://openai.com/index/ai-education-opportunity
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 64/100
Claim: OpenAI shares new tools, certifications, and measurement resources to help schools and universities close AI capability gaps and expand opportunity.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## The five AI value models driving business reinvention
Source: https://openai.com/index/the-five-ai-value-models-driving-business-reinvention
Publisher: OpenAI
Category: Labour market
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 72/100
Claim: Five AI value models show how leaders can sequence AI from workforce fluency to process reinvention and build durable business advantage.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## VfL Wolfsburg turns ChatGPT into a club-wide capability
Source: https://openai.com/index/vfl-wolfsburg
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: By focusing on people, not pilots, the Bundesliga club is scaling efficiency, creativity, and knowledge—without losing its football identity.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing the Adoption news channel
Source: https://openai.com/index/introducing-the-adoption-news-channel
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Enterprise workflow automation
Score: 64/100
Claim: Practical insights and frameworks to turn AI progress into business advantage.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing ChatGPT for Excel and new financial data integrations
Source: https://openai.com/index/chatgpt-for-excel
Publisher: OpenAI
Category: Benchmarks
Sector: Financial services
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: OpenAI introduces ChatGPT for Excel and new financial app integrations, powered by GPT-5.4 to accelerate modeling, research, and analysis in regulated environments.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Where things stand with the Department of War
Source: https://www.anthropic.com/news/where-stand-department-war
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: Yesterday (March 4) Anthropic received a letter from the Department of War confirming that we have been designated as a supply chain risk to America’s national security. As we wrote on Friday, we do not believe this action is legally sound, and we see no choice but to challenge it in court.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Extending single-minus amplitudes to gravitons
Source: https://openai.com/index/extending-single-minus-amplitudes-to-gravitons
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: A new preprint extends single-minus amplitudes to gravitons, with GPT-5.2 Pro helping derive and verify nonzero graviton tree amplitudes in quantum gravity.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## How Axios uses AI to help deliver high-impact local journalism
Source: https://openai.com/index/axios-allison-murphy
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 88/100
Claim: Axios COO Allison Murphy explains how the company uses AI to support local reporters, streamline newsroom workflows, and deliver high-impact local journalism at scale.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Understanding AI and learning outcomes
Source: https://openai.com/index/understanding-ai-and-learning-outcomes
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 64/100
Claim: OpenAI introduces the Learning Outcomes Measurement Suite to assess AI’s impact on student learning across diverse educational environments over time.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## GPT-5.3 Instant: Smoother, more useful everyday conversations
Source: https://openai.com/index/gpt-5-3-instant
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Official OpenAI release: GPT-5.3 Instant: Smoother, more useful everyday conversations.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## GPT-5.3 Instant System Card
Source: https://openai.com/index/gpt-5-3-instant-system-card
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Official OpenAI release: GPT-5.3 Instant System Card.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Our agreement with the Department of War
Source: https://openai.com/index/our-agreement-with-the-department-of-war
Publisher: OpenAI
Category: Deployments
Sector: Public sector
Capability: Production AI deployment signal
Score: 73/100
Claim: Details on OpenAI’s contract with the Department of War, outlining safety red lines, legal protections, and how AI systems will be deployed in classified environments.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Joint Statement from OpenAI and Microsoft
Source: https://openai.com/index/continuing-microsoft-partnership
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: Microsoft and OpenAI continue to work closely across research, engineering, and product development, building on years of deep collaboration and shared success.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI and Amazon announce strategic partnership
Source: https://openai.com/index/amazon-partnership
Publisher: OpenAI
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 93/100
Claim: OpenAI and Amazon announce a strategic partnership bringing OpenAI’s Frontier platform to AWS, expanding AI infrastructure, custom models, and enterprise AI agents.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing the Stateful Runtime Environment for Agents in Amazon Bedrock
Source: https://openai.com/index/introducing-the-stateful-runtime-environment-for-agents-in-amazon-bedrock
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 88/100
Claim: Stateful Runtime for Agents in Amazon Bedrock brings persistent orchestration, memory, and secure execution to multi-step AI workflows powered by OpenAI.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Scaling AI for everyone
Source: https://openai.com/index/scaling-ai-for-everyone
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Financial workflow automation
Score: 78/100
Claim: Today we’re announcing $110B in new investment at a $730B pre money valuation. This includes $30B from SoftBank, $30B from NVIDIA, and $50B from Amazon.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## An update on our mental health-related work
Source: https://openai.com/index/update-on-mental-health-related-work
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 52/100
Claim: OpenAI shares updates on its mental health safety work, including parental controls, trusted contacts, improved distress detection, and recent litigation developments.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Statement on the comments from Secretary of War Pete Hegseth
Source: https://www.anthropic.com/news/statement-comments-secretary-war
Publisher: Anthropic
Category: Vendor framing
Sector: Public sector
Capability: Vendor platform capability signal
Score: 64/100
Claim: Earlier today, Secretary of War Pete Hegseth shared on X that he is directing the Department of War to designate Anthropic a supply chain risk. This action follows months of negotiations that reached an impasse over two exceptions we requested to the lawful use of our AI model, Claude: the mass domestic surveillance of Americans and fully autonomous.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Pacific Northwest National Laboratory and OpenAI partner to accelerate federal permitting
Source: https://openai.com/index/pacific-northwest-national-laboratory
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: OpenAI and Pacific Northwest National Laboratory introduce DraftNEPABench, a new benchmark evaluating how AI coding agents can accelerate federal permitting—showing potential to reduce NEPA drafting time by up to 15% and modernize infrastructure reviews.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI Codex and Figma launch seamless code-to-design experience
Source: https://openai.com/index/figma-partnership
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 78/100
Claim: OpenAI and Figma launch a new Codex integration that connects code and design, enabling teams to move between implementation and the Figma canvas to iterate and ship faster.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Statement from Dario Amodei on our discussions with the Department of War
Source: https://www.anthropic.com/news/statement-department-of-war
Publisher: Anthropic
Category: Benchmarks
Sector: Public sector
Capability: Frontier model release and benchmark movement
Score: 83/100
Claim: I believe deeply in the existential importance of using AI to defend the United States and other democracies, and to defeat our autocratic adversaries. Anthropic has therefore worked proactively to deploy our models to the Department of War and the intelligence community. We were the first frontier AI company to deploy our models in the US government’s.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Disrupting malicious uses of AI | February 2026
Source: https://openai.com/index/disrupting-malicious-ai-uses
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Production AI deployment signal
Score: 78/100
Claim: Our latest threat report examines how malicious actors combine AI models with websites and social platforms—and what it means for detection and defense.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Anthropic acquires Vercept to advance Claude's computer use capabilities
Source: https://www.anthropic.com/news/acquires-vercept
Publisher: Anthropic
Category: Benchmarks
Sector: Media and content
Capability: Autonomous software engineering and computer-use agents
Score: 64/100
Claim: People are using Claude for increasingly complex work—writing and running code across entire repositories, synthesizing research from dozens of sources, and managing workflows that span multiple tools and teams. Computer use enables Claude to do all of that inside live applications, the way a person at a keyboard would. That means Claude can take on.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Arvind KC appointed Chief People Officer
Source: https://openai.com/index/arvind-kc-chief-people-officer
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 42/100
Claim: OpenAI appoints Arvind KC as Chief People Officer to help scale the company, strengthen its culture, and lead how work evolves in the age of AI.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Anthropic’s Responsible Scaling Policy: Version 3.0
Source: https://www.anthropic.com/news/responsible-scaling-policy-v3
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 52/100
Claim: We’re releasing the third version of our Responsible Scaling Policy (RSP), the voluntary framework we use to mitigate catastrophic risks from AI systems. Anthropic has now had an RSP for more than two years, and we’ve learned a great deal about its benefits and its shortcomings. We’re therefore updating the policy to reinforce what has worked well to date.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Why we no longer evaluate SWE-bench Verified
Source: https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI announces Frontier Alliance Partners
Source: https://openai.com/index/frontier-alliance-partners
Publisher: OpenAI
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 93/100
Claim: OpenAI announces Frontier Alliance Partners to help enterprises move from AI pilots to production with secure, scalable agent deployments.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Detecting and preventing distillation attacks
Source: https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 46/100
Claim: We have identified industrial-scale campaigns by three AI laboratories—DeepSeek, Moonshot, and MiniMax—to illicitly extract Claude’s capabilities to improve their own models. These labs generated over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts, in violation of our terms of service and regional access restrictions.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Our First Proof submissions
Source: https://openai.com/index/first-proof-submissions
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: We share our AI model’s proof attempts for the First Proof math challenge, testing research-grade reasoning on expert-level problems.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Making frontier cybersecurity capabilities available to defenders
Source: https://www.anthropic.com/news/claude-code-security
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Claude Code Security, a new capability built into Claude Code on the web, is now available in a limited research preview. It scans codebases for security vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix security issues that traditional methods often miss. Security teams face a common challenge: too.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Advancing independent research on AI alignment
Source: https://openai.com/index/advancing-independent-research-ai-alignment
Publisher: OpenAI
Category: Benchmarks
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: OpenAI commits $7.5M to The Alignment Project to fund independent AI alignment research, strengthening global efforts to address AGI safety and security risks.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing OpenAI for India
Source: https://openai.com/index/openai-for-india
Publisher: OpenAI
Category: Labour market
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 79/100
Claim: OpenAI for India expands AI access across the country—building local infrastructure, powering enterprises, and advancing workforce skills.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## Introducing EVMbench
Source: https://openai.com/index/introducing-evmbench
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing Claude Sonnet 4.6
Source: https://www.anthropic.com/news/claude-sonnet-4-6
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta. For those on our Free and Pro plans, Claude Sonnet 4.6 is now the default model in claude.ai and.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Anthropic and the Government of Rwanda sign MOU for AI in health and education
Source: https://www.anthropic.com/news/anthropic-rwanda-mou
Publisher: Anthropic
Category: Deployments
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 85/100
Claim: The Government of Rwanda and Anthropic have signed a three-year Memorandum of Understanding to formalize and expand our partnership, bringing AI to Rwanda’s education, health, and public sector systems. This agreement builds on the ALX education partnership we announced in November 2025 and marks the first time Anthropic has formalized a multi-sector.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Anthropic and Infosys collaborate to build AI agents for telecommunications and other regulated industries
Source: https://www.anthropic.com/news/anthropic-infosys
Publisher: Anthropic
Category: Deployments
Sector: Software engineering
Capability: Enterprise workflow automation
Score: 95/100
Claim: Anthropic and Infosys, a global leader in next-generation digital services and consulting founded and headquartered in Bengaluru, today announced a collaboration to develop and deliver enterprise AI solutions across telecommunications, financial services, manufacturing, and software development. The collaboration integrates Anthropic’s Claude models and.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Anthropic opens Bengaluru office and announces new partnerships across India
Source: https://www.anthropic.com/news/bengaluru-office-partnerships-across-india
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Enterprise workflow automation
Score: 61/100
Claim: India is the second-largest market for Claude.ai, home to a developer community doing some of the most technically intense AI work we see anywhere. Nearly half of Claude usage in India comprises computer and mathematical tasks: building applications, modernizing systems, and shipping production software. Today, as we officially open our Bengaluru office.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## GPT-5.2 derives a new result in theoretical physics
Source: https://openai.com/index/new-result-theoretical-physics
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: A new preprint shows GPT-5.2 proposing a new formula for a gluon amplitude, later formally proved and verified by OpenAI and academic collaborators.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing Lockdown Mode and Elevated Risk labels in ChatGPT
Source: https://openai.com/index/introducing-lockdown-mode-and-elevated-risk-labels-in-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: Introducing Lockdown Mode and Elevated Risk labels in ChatGPT to help organizations defend against prompt injection and AI-driven data exfiltration.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Beyond rate limits: scaling access to Codex and Sora
Source: https://openai.com/index/beyond-rate-limits
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: How OpenAI built a real-time access system combining rate limits, usage tracking, and credits to power continuous access to Sora and Codex.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Scaling social science research
Source: https://openai.com/index/scaling-social-science-research
Publisher: OpenAI
Category: Benchmarks
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 76/100
Claim: GABRIEL is a new open-source toolkit from OpenAI that uses GPT to turn qualitative text and images into quantitative data, helping social scientists analyze research at scale.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Chris Liddell appointed to Anthropic’s board of directors
Source: https://www.anthropic.com/news/chris-liddell-appointed-anthropic-board
Publisher: Anthropic
Category: Vendor framing
Sector: Financial services
Capability: Enterprise workflow automation
Score: 42/100
Claim: Chris Liddell has been appointed to Anthropic’s Board of Directors. He brings over 30 years of senior leadership experience across some of the world's largest and most complex organizations to the role. He previously served as Chief Financial Officer of Microsoft, General Motors, and International Paper, as well as the Deputy White House Chief of Staff.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Anthropic partners with CodePath to bring Claude to the US’s largest collegiate computer science program
Source: https://www.anthropic.com/news/anthropic-codepath-partnership
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 76/100
Claim: Anthropic is partnering with CodePath, the nation’s largest provider of collegiate computer science education, to redesign its coding curriculum as AI reshapes the field of software development. CodePath will put Claude and Claude Code at the center of its courses and career programs, giving more than 20,000 students at community colleges, state schools.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing GPT-5.3-Codex-Spark
Source: https://openai.com/index/introducing-gpt-5-3-codex-spark
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Introducing GPT-5.3-Codex-Spark—our first real-time coding model. 15x faster generation, 128k context, now in research preview for ChatGPT Pro users.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Anthropic is donating $20 million to Public First Action
Source: https://www.anthropic.com/news/donate-public-first-action
Publisher: Anthropic
Category: Benchmarks
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 90/100
Claim: AI will bring enormous benefits—for science, technology, medicine, economic growth, and much more. But a technology this powerful also comes with considerable risks. Those risks might come from the misuse of the models: AI is already being exploited to automate cyberattacks; in the future it might assist in the production of dangerous weapons. Risks.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation
Source: https://www.anthropic.com/news/anthropic-raises-30-billion-series-g-funding-380-billion-post-money-valuation
Publisher: Anthropic
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 68/100
Claim: We have raised $30 billion in Series G funding led by GIC and Coatue, valuing Anthropic at $380 billion post-money. The round was co-led by D. E. Shaw Ventures, Dragoneer, Founders Fund, ICONIQ, and MGX. The investment will fuel the frontier research, product development, and infrastructure expansions that have made Anthropic the market leader in.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Harness engineering: leveraging Codex in an agent-first world
Source: https://openai.com/index/harness-engineering
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: By Ryan Lopopolo, Member of the Technical Staff.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Covering electricity price increases from our data centers
Source: https://www.anthropic.com/news/covering-electricity-price-increases
Publisher: Anthropic
Category: Benchmarks
Sector: AI infrastructure
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: As we continue to invest in American AI infrastructure, Anthropic will cover electricity price increases that consumers face from our data centers. Training a single frontier AI model will soon require gigawatts of power, and the US AI sector will need at least 50 gigawatts of capacity over the next several years. The country needs to build new data.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Bringing ChatGPT to GenAI.mil
Source: https://openai.com/index/bringing-chatgpt-to-genaimil
Publisher: OpenAI
Category: Deployments
Sector: Public sector
Capability: Production AI deployment signal
Score: 73/100
Claim: OpenAI for Government announces the deployment of a custom ChatGPT on GenAI.mil, bringing secure, safety-forward AI to U.S. defense teams.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Making AI work for everyone, everywhere: our approach to localization
Source: https://openai.com/index/our-approach-to-localization
Publisher: OpenAI
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 64/100
Claim: OpenAI shares its approach to AI localization, showing how globally shared frontier models can be adapted to local languages, laws, and cultures without compromising safety.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## GPT-5 lowers the cost of cell-free protein synthesis
Source: https://openai.com/index/gpt-5-lowers-protein-synthesis-cost
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: An autonomous lab combining OpenAI’s GPT-5 with Ginkgo Bioworks’ cloud automation cut cell-free protein synthesis costs by 40% through closed-loop experimentation.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing Trusted Access for Cyber
Source: https://openai.com/index/trusted-access-for-cyber
Publisher: OpenAI
Category: Benchmarks
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 64/100
Claim: OpenAI introduces Trusted Access for Cyber, a trust-based framework that expands access to frontier cyber capabilities while strengthening safeguards against misuse.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing OpenAI Frontier
Source: https://openai.com/index/introducing-openai-frontier
Publisher: OpenAI
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 71/100
Claim: OpenAI Frontier is an enterprise platform for building, deploying, and managing AI agents with shared context, onboarding, permissions, and governance.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing GPT-5.3-Codex
Source: https://openai.com/index/introducing-gpt-5-3-codex
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5.3-Codex is a Codex-native agent that pairs frontier coding performance with general reasoning to support long-horizon, real-world technical work.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## GPT-5.3-Codex System Card
Source: https://openai.com/index/gpt-5-3-codex-system-card
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: GPT‑5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT‑5.2-Codex with the reasoning and professional knowledge capabilities of GPT‑5.2.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing Claude Opus 4.6
Source: https://www.anthropic.com/news/claude-opus-4-6
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: We’re upgrading our smartest model. The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, can operate more reliably in larger codebases, and has better code review and debugging skills to catch its own mistakes. And, in a first for our Opus-class models, Opus 4.6 features a 1M token.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Unlocking the Codex harness: how we built the App Server
Source: https://openai.com/index/unlocking-the-codex-harness
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: Learn how to embed the Codex agent using the Codex App Server, a bidirectional JSON-RPC API powering streaming progress, tool use, approvals, and diffs.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Claude is a space to think
Source: https://www.anthropic.com/news/claude-is-a-space-to-think
Publisher: Anthropic
Category: Deployments
Sector: Media and content
Capability: Production AI deployment signal
Score: 85/100
Claim: There are many good places for advertising. A conversation with Claude is not one of them. Advertising drives competition, helps people discover new products, and allows services like email and social media to be offered for free. We’ve run our own ad campaigns, and our AI models have, in turn, helped many of our customers in the advertising industry.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## The Sora feed philosophy
Source: https://openai.com/index/sora-feed-philosophy
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 64/100
Claim: Discover the Sora feed philosophy—built to spark creativity, foster connections, and keep experiences safe with personalized recommendations, parental controls, and strong guardrails.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Apple’s Xcode now supports the Claude Agent SDK
Source: https://www.anthropic.com/news/apple-xcode-claude-agent-sdk
Publisher: Anthropic
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: Apple's Xcode is where developers build, test, and distribute apps for Apple platforms, including iPhone, iPad, Mac, Apple Watch, Apple Vision Pro, and Apple TV. In September, we announced that developers would have access to Claude Sonnet 4 in Xcode 26. Claude could be used to write code, debug, and generate documentation—but it was limited to helping.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Snowflake and OpenAI partner to bring frontier intelligence to enterprise data
Source: https://openai.com/index/snowflake-partnership
Publisher: OpenAI
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 93/100
Claim: OpenAI and Snowflake partner in a $200M agreement to bring frontier intelligence into enterprise data, enabling AI agents and insights directly in Snowflake.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing the Codex app
Source: https://openai.com/index/introducing-the-codex-app
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: Introducing the Codex app for macOS—a command center for AI coding and software development with multiple agents, parallel workflows, and long-running tasks.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Anthropic partners with Allen Institute and Howard Hughes Medical Institute to accelerate scientific discovery
Source: https://www.anthropic.com/news/anthropic-partners-with-allen-institute-and-howard-hughes-medical-institute
Publisher: Anthropic
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 76/100
Claim: Modern biological research generates data at unprecedented scale—from single-cell sequencing to whole-brain connectomics—yet transforming that data into validated biological insights remains a fundamental bottleneck. Knowledge synthesis, hypothesis generation, and experimental interpretation still depend on manual processes that can't keep pace with the.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Inside OpenAI’s in-house data agent
Source: https://openai.com/index/inside-our-in-house-data-agent
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: How OpenAI built an in-house AI data agent that uses GPT-5, Codex, and memory to reason over massive datasets and deliver reliable insights in minutes.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT
Source: https://openai.com/index/retiring-gpt-4o-and-older-models
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: On February 13, 2026, alongside the previously announced retirement of GPT‑5 (Instant, Thinking, and Pro), we will retire GPT‑4o, GPT‑4.1, GPT‑4.1 mini, and OpenAI o4-mini from ChatGPT. In the API, there are no changes at this time.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Taisei Corporation shapes the next generation of talent with AI
Source: https://openai.com/index/taisei
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: Taisei Corporation’s HR team is leading the rollout of ChatGPT Enterprise to drive AI-powered talent development across the organization.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## EMEA Youth & Wellbeing Grant
Source: https://openai.com/index/emea-youth-and-wellbeing-grant
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 54/100
Claim: Apply for the EMEA Youth & Wellbeing Grant, a €500,000 program funding NGOs and researchers advancing youth safety and wellbeing in the age of AI.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## The next chapter for AI in the EU
Source: https://openai.com/index/the-next-chapter-for-ai-in-the-eu
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Education and workforce adoption
Score: 85/100
Claim: OpenAI launches the EU Economic Blueprint 2.0 with new data, partnerships, and initiatives to accelerate AI adoption, skills, and growth across Europe.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Keeping your data safe when an AI agent clicks a link
Source: https://openai.com/index/ai-agent-link-safety
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Agent platform and API infrastructure
Score: 62/100
Claim: Learn how OpenAI protects user data when AI agents open links, preventing URL-based data exfiltration and prompt injection with built-in safeguards.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## ServiceNow chooses Claude to power customer apps and increase internal productivity
Source: https://www.anthropic.com/news/servicenow-anthropic-claude
Publisher: Anthropic
Category: Deployments
Sector: Cybersecurity
Capability: Enterprise workflow automation
Score: 96/100
Claim: As enterprises move beyond experimenting with AI and start putting it into production across their core business operations, scale and security matter just as much as capabilities. With this in mind, ServiceNow, which helps large companies manage and automate everything from IT support to HR to customer service on a single platform, has chosen Claude as.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## PVH reimagines the future of fashion with OpenAI
Source: https://openai.com/index/pvh-future-of-fashion
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: PVH Corp., parent company of Calvin Klein and Tommy Hilfiger, is adopting ChatGPT Enterprise to bring AI into fashion design, supply chain, and consumer engagement.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing Prism
Source: https://openai.com/index/introducing-prism
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: Prism is a free LaTeX-native workspace with GPT-5.2 built in, helping researchers write, collaborate, and reason in one place.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## TRUSTBANK uses AI agents to personalize Furusato Nozei gifts
Source: https://openai.com/index/trustbank
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Financial workflow automation
Score: 88/100
Claim: TRUSTBANK partnered with Recursive to build Choice AI using OpenAI models, enabling personalized conversational recommendations that simplify Furusato Nozei gift discovery.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Anthropic partners with the UK Government to bring AI assistance to GOV.UK services
Source: https://www.anthropic.com/news/gov-UK-partnership
Publisher: Anthropic
Category: Labour market
Sector: Public sector
Capability: Labour-market adoption signal
Score: 72/100
Claim: Anthropic has been selected by the UK's Department for Science, Innovation and Technology (DSIT) to help build and pilot a dedicated AI-powered assistant for GOV.UK. The AI assistant will help people navigate government services and give tailored advice. The initial use case is employment: helping people find work, access training, understand the support.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## How Indeed uses AI to help evolve the job search
Source: https://openai.com/index/indeed-maggie-hulce
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Production AI deployment signal
Score: 78/100
Claim: Indeed’s CRO Maggie Hulce shares how AI is transforming job search, recruiting, and talent acquisition for employers and job seekers.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Unrolling the Codex agent loop
Source: https://openai.com/index/unrolling-the-codex-agent-loop
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: A technical deep dive into the Codex agent loop, explaining how Codex CLI orchestrates models, tools, prompts, and performance using the Responses API.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Scaling PostgreSQL to power 800 million ChatGPT users
Source: https://openai.com/index/scaling-postgresql
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 68/100
Claim: An inside look at how OpenAI scaled PostgreSQL to millions of queries per second using replicas, caching, rate limiting, and workload isolation.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Inside Praktika's conversational approach to language learning
Source: https://openai.com/index/praktika
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 90/100
Claim: How Praktika uses GPT-4.1 and GPT-5.2 to build adaptive AI tutors that personalize lessons, track progress, and help learners achieve real-world language fluency.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Claude's new constitution
Source: https://www.anthropic.com/news/claude-new-constitution
Publisher: Anthropic
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: We’re publishing a new constitution for our AI model, Claude. It’s a detailed description of Anthropic’s vision for Claude’s values and behavior; a holistic document that explains the context in which Claude operates and the kind of entity we would like Claude to be. The constitution is a crucial part of our model training process, and its content directly.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## How Higgsfield turns simple ideas into cinematic social videos
Source: https://openai.com/index/higgsfield
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Discover how Higgsfield gives creators cinematic, social-first video output from simple inputs using OpenAI GPT-4.1, GPT-5, and Sora 2.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing Edu for Countries
Source: https://openai.com/index/edu-for-countries
Publisher: OpenAI
Category: Labour market
Sector: Education
Capability: Education and workforce adoption
Score: 72/100
Claim: Edu for Countries is a new OpenAI initiative helping governments use AI to modernize education systems and build future-ready workforces.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## How countries can end the capability overhang
Source: https://openai.com/index/how-countries-can-end-the-capability-overhang
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 68/100
Claim: Our latest report reveals stark differences in advanced AI adoption across countries and outlines new initiatives to help nations capture productivity gains from AI.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Mariano-Florentino Cuéllar appointed to Anthropic’s Long-Term Benefit Trust
Source: https://www.anthropic.com/news/mariano-florentino-long-term-benefit-trust
Publisher: Anthropic
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 42/100
Claim: Anthropic’s Long-Term Benefit Trust announced the appointment of Mariano-Florentino (Tino) Cuéllar as a new member of the Trust. The Long-Term Benefit Trust is an independent body designed to help Anthropic achieve its public benefit mission. Cuéllar brings extensive experience in law, governance, and international affairs, including service as a Justice.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Anthropic and Teach For All launch global AI training initiative for educators
Source: https://www.anthropic.com/news/anthropic-teach-for-all
Publisher: Anthropic
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 68/100
Claim: Anthropic is partnering with Teach For All to bring AI tools and training to educators in 63 countries. Through the AI Literacy & Creator Collective (LCC), more than 100,000 teachers and alumni across Teach For All's network—which serves more than 1.5 million students—will have the opportunity to develop AI fluency and adapt Claude to serve real classroom.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Horizon 1000: Advancing AI for primary healthcare
Source: https://openai.com/index/horizon-1000
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 54/100
Claim: OpenAI and the Gates Foundation launch Horizon 1000, a $50M pilot advancing AI capabilities for healthcare in Africa. The initiative aims to reach 1,000 clinics by 2028.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Stargate Community
Source: https://openai.com/index/stargate-community
Publisher: OpenAI
Category: Labour market
Sector: Enterprise operations
Capability: Education and workforce adoption
Score: 72/100
Claim: Stargate Community plans detail a community-first approach to AI infrastructure, using locally tailored plans shaped by community input, energy needs, and workforce priorities.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## Cisco and OpenAI redefine enterprise engineering with AI agents
Source: https://openai.com/index/cisco
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: Cisco and OpenAI redefine enterprise engineering with Codex, an AI software agent embedded in workflows to speed builds, automate defect fixes, and enable AI-native development.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## ServiceNow powers actionable enterprise AI with OpenAI
Source: https://openai.com/index/servicenow-powers-actionable-enterprise-ai-with-openai
Publisher: OpenAI
Category: Benchmarks
Sector: Media and content
Capability: Frontier model release and benchmark movement
Score: 93/100
Claim: ServiceNow expands access to OpenAI frontier models to power AI-driven enterprise workflows, summarization, search, and voice across the ServiceNow Platform.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Our approach to age prediction
Source: https://openai.com/index/our-approach-to-age-prediction
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 52/100
Claim: ChatGPT is rolling out age prediction to estimate if accounts are under or over 18, applying safeguards for teens and refining accuracy over time.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## A business that scales with the value of intelligence
Source: https://openai.com/index/a-business-that-scales-with-the-value-of-intelligence
Publisher: OpenAI
Category: Vendor framing
Sector: Commerce and marketplace
Capability: Enterprise workflow automation
Score: 64/100
Claim: OpenAI’s business model scales with intelligence—spanning subscriptions, API, ads, commerce, and compute—driven by deepening ChatGPT adoption.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Our approach to advertising and expanding access to ChatGPT
Source: https://openai.com/index/our-approach-to-advertising-and-expanding-access
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI plans to test advertising in the U.S. for ChatGPT’s free and Go tiers to expand affordable access to AI worldwide, while protecting privacy, trust, and answer quality.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing ChatGPT Go, now available worldwide
Source: https://openai.com/index/introducing-chatgpt-go
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: ChatGPT Go is now available worldwide, offering expanded access to GPT-5.2 Instant, higher usage limits, and longer memory—making advanced AI more affordable globally.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Anthropic appoints Irina Ghose as Managing Director of India ahead of Bengaluru office opening
Source: https://www.anthropic.com/news/anthropic-appoints-irina-ghose-as-managing-director-of-india
Publisher: Anthropic
Category: Deployments
Sector: Financial services
Capability: Enterprise workflow automation
Score: 63/100
Claim: Irina Ghose is joining Anthropic as Managing Director of India as we prepare to open our first office in the country. Irina brings more than three decades of experience in scaling technology businesses. She most recently served as Managing Director, Microsoft India, where she led enterprise AI adoption across major Indian industries including banking and.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Investing in Merge Labs
Source: https://openai.com/index/investing-in-merge-labs
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI is investing in Merge Labs to support new brain computer interfaces that bridge biological and artificial intelligence to maximize human ability, agency, and experience.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Strengthening the U.S. AI supply chain through domestic manufacturing
Source: https://openai.com/index/strengthening-the-us-ai-supply-chain
Publisher: OpenAI
Category: Labour market
Sector: AI infrastructure
Capability: Education and workforce adoption
Score: 72/100
Claim: OpenAI launches a new RFP to strengthen the U.S. AI supply chain by accelerating domestic manufacturing, creating jobs, and scaling AI infrastructure.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## How scientists are using Claude to accelerate research and discovery
Source: https://www.anthropic.com/news/accelerating-scientific-research
Publisher: Anthropic
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 76/100
Claim: Last October we launched Claude for Life Sciences, a suite of connectors and skills that made Claude a better scientific collaborator. Since then, we've invested heavily in making Claude the most capable model for scientific work, with Opus 4.5 showing significant improvements in figure interpretation, computational biology, and protein understanding.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI partners with Cerebras
Source: https://openai.com/index/cerebras-partnership
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 68/100
Claim: OpenAI partners with Cerebras to add 750MW of high-speed AI compute, reducing inference latency and making ChatGPT faster for real-time AI workloads.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Zenken boosts a lean sales team with ChatGPT Enterprise
Source: https://openai.com/index/zenken
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 95/100
Claim: By rolling out ChatGPT Enterprise company-wide, Zenken has boosted sales performance, cut preparation time, and increased proposal success rates. AI-supported workflows are helping a lean team deliver more personalized, effective customer engagement.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing Labs
Source: https://www.anthropic.com/news/introducing-anthropic-labs
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Agent platform and API infrastructure
Score: 85/100
Claim: Our models are evolving at a rapid clip, and each new release brings another leap in capabilities. Building product experiences around these emerging capabilities requires different motions working in partnership: tinkering and experimenting at the edge of what Claude can do, testing unpolished versions with early users to find what works, and taking what.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Advancing Claude in healthcare and the life sciences
Source: https://www.anthropic.com/news/healthcare-life-sciences
Publisher: Anthropic
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 76/100
Claim: In October, we announced Claude for Life Sciences, our latest step in making Claude a productive research partner for scientists and clinicians, and in helping Claude to support those in industry bringing new scientific advancements to the public. Now, we’re expanding that feature set in two ways. First, we’re introducing Claude for Healthcare, a.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI and SoftBank Group partner with SB Energy
Source: https://openai.com/index/stargate-sb-energy-partnership
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Financial workflow automation
Score: 78/100
Claim: OpenAI and SoftBank Group partner with SB Energy to develop multi-gigawatt AI data center campuses, including a 1.2 GW Texas facility supporting the Stargate initiative.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Datadog uses Codex for system-level code review
Source: https://openai.com/index/datadog
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: Datadog uses Codex for system-level code review.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## OpenAI for Healthcare
Source: https://openai.com/index/openai-for-healthcare
Publisher: OpenAI
Category: Deployments
Sector: Healthcare and life sciences
Capability: Enterprise workflow automation
Score: 95/100
Claim: OpenAI for Healthcare enables secure, enterprise-grade AI that supports HIPAA compliance—reducing administrative burden and supporting clinical workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Netomi’s lessons for scaling agentic systems into the enterprise
Source: https://openai.com/index/netomi
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: How Netomi scales enterprise AI agents using GPT-4.1 and GPT-5.2—combining concurrency, governance, and multi-step reasoning for reliable production workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## How Tolan builds voice-first AI with GPT-5.1
Source: https://openai.com/index/tolan
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Tolan built a voice-first AI companion with GPT-5.1, combining low-latency responses, real-time context reconstruction, and memory-driven personalities for natural conversations.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing ChatGPT Health
Source: https://openai.com/index/introducing-chatgpt-health
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 64/100
Claim: ChatGPT Health is a dedicated experience that securely connects your health data and apps, with privacy protections and a physician-informed design.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Announcing OpenAI Grove Cohort 2
Source: https://openai.com/index/openai-grove
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: Applications are now open for OpenAI Grove Cohort 2, a 5-week founder program designed for individuals at any stage, from pre-idea to product. Participants receive $50K in API credits, early access to AI tools, and hands-on mentorship from the OpenAI team.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Continuously hardening ChatGPT Atlas against prompt injection
Source: https://openai.com/index/hardening-atlas-against-prompt-injection
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 74/100
Claim: OpenAI is strengthening ChatGPT Atlas against prompt injection attacks using automated red teaming trained with reinforcement learning. This proactive discover-and-patch loop helps identify novel exploits early and harden the browser agent’s defenses as AI becomes more agentic.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## One in a million: celebrating the customers shaping AI’s future
Source: https://openai.com/index/one-in-a-million-customers
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Agent platform and API infrastructure
Score: 89/100
Claim: More than one million customers around the world now use OpenAI to empower their teams and unlock new opportunities. This post highlights how companies like PayPal, Virgin Atlantic, BBVA, Cisco, Moderna, and Canva are transforming the way work gets done with AI.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Sharing our compliance framework for California's Transparency in Frontier AI Act
Source: https://www.anthropic.com/news/compliance-framework-SB53
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 64/100
Claim: On January 1, California's Transparency in Frontier AI Act (SB 53) will go into effect. It establishes the nation’s first frontier AI safety and transparency requirements for catastrophic risks. While we have long advocated for a federal framework, Anthropic endorsed SB 53 because we believe frontier AI developers like ourselves should be transparent.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Evaluating chain-of-thought monitorability
Source: https://openai.com/index/evaluating-chain-of-thought-monitorability
Publisher: OpenAI
Category: Benchmarks
Sector: Enterprise operations
Capability: Model and benchmark capability movement
Score: 76/100
Claim: OpenAI introduces a new framework and evaluation suite for chain-of-thought monitorability, covering 13 evaluations across 24 environments. Our findings show that monitoring a model’s internal reasoning is far more effective than monitoring outputs alone, offering a promising path toward scalable control as AI systems grow more capable.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Deepening our collaboration with the U.S. Department of Energy
Source: https://openai.com/index/us-department-of-energy-collaboration
Publisher: OpenAI
Category: Benchmarks
Sector: Public sector
Capability: Model and benchmark capability movement
Score: 76/100
Claim: OpenAI and the U.S. Department of Energy have signed a memorandum of understanding to deepen collaboration on AI and advanced computing in support of scientific discovery. The agreement builds on ongoing work with national laboratories and helps establish a framework for applying AI to high-impact research across the DOE ecosystem.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Updating our Model Spec with teen protections
Source: https://openai.com/index/updating-model-spec-with-teen-protections
Publisher: OpenAI
Category: Benchmarks
Sector: Customer operations
Capability: Model and benchmark capability movement
Score: 54/100
Claim: OpenAI is updating its Model Spec with new Under-18 Principles that define how ChatGPT should support teens with safe, age-appropriate guidance grounded in developmental science. The update strengthens guardrails, clarifies expected model behavior in higher-risk situations, and builds on our broader work to improve teen safety across ChatGPT.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## AI literacy resources for teens and parents
Source: https://openai.com/index/ai-literacy-resources-for-teens-and-parents
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 64/100
Claim: OpenAI shares new AI literacy resources to help teens and parents use ChatGPT thoughtfully, safely, and with confidence. The guides include expert-vetted tips for responsible use, critical thinking, healthy boundaries, and supporting teens through emotional or sensitive topics.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Addendum to GPT-5.2 System Card: GPT-5.2-Codex
Source: https://openai.com/index/gpt-5-2-codex-system-card
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: Official OpenAI release: Addendum to GPT-5.2 System Card: GPT-5.2-Codex.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing GPT-5.2-Codex
Source: https://openai.com/index/introducing-gpt-5-2-codex
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5.2-Codex is OpenAI’s most advanced coding model, offering long-horizon reasoning, large-scale code transformations, and enhanced cybersecurity capabilities.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Protecting the wellbeing of our users
Source: https://www.anthropic.com/news/protecting-well-being-of-users
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 52/100
Claim: People use AI for a wide variety of reasons, and for some that may include emotional support. Our Safeguards team leads our efforts to ensure that Claude handles these conversations appropriately: responding with empathy, being honest about its limitations as an AI, and being considerate of our users' wellbeing. When chatbots handle these questions without...
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Working with the US Department of Energy to unlock the next era of scientific discovery
Source: https://www.anthropic.com/news/genesis-mission-partnership
Publisher: Anthropic
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Enterprise workflow automation
Score: 87/100
Claim: Anthropic and the US Department of Energy (DOE) are announcing a multi-year partnership as part of the Genesis Mission, the Department’s initiative to use AI to cement America’s leadership in science. Our partnership focuses on three domains (American energy dominance, the biological and life sciences, and scientific productivity) and has the potential to...
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing OpenAI Academy for News Organizations
Source: https://openai.com/index/openai-academy-for-news-organizations
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI is launching the OpenAI Academy for News Organizations, a new learning hub built with the American Journalism Project and The Lenfest Institute to help newsrooms use AI effectively. The Academy offers training, practical use cases, and responsible-use guidance to support journalists, editors, and publishers as they adopt AI in their reporting and...
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Developers can now submit apps to ChatGPT
Source: https://openai.com/index/developers-can-now-submit-apps-to-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: Developers can now submit apps for review and publication in ChatGPT, with approved apps appearing in a new in-product directory for easy discovery. Updated tools, guidelines, and the Apps SDK help developers build powerful chat-native experiences that bring real-world actions into ChatGPT.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Evaluating AI’s ability to perform scientific research tasks
Source: https://openai.com/index/frontierscience
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: OpenAI introduces FrontierScience, a benchmark testing AI reasoning in physics, chemistry, and biology to measure progress toward real scientific research.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Measuring AI’s capability to accelerate biological research
Source: https://openai.com/index/accelerating-biological-research-in-the-wet-lab
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: OpenAI introduces a real-world evaluation framework to measure how AI can accelerate biological research in the wet lab. Using GPT-5 to optimize a molecular cloning protocol, the work explores both the promise and risks of AI-assisted experimentation.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## The new ChatGPT Images is here
Source: https://openai.com/index/new-chatgpt-images-is-here
Publisher: OpenAI
Category: Deployments
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 82/100
Claim: The new ChatGPT Images is powered by our flagship image generation model, delivering more precise edits, consistent details, and image generation up to 4× faster. The upgraded model is rolling out to all ChatGPT users today and is also available in the API as GPT-Image-1.5.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## BBVA and OpenAI collaborate to transform global banking
Source: https://openai.com/index/bbva-collaboration-expansion
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Enterprise workflow automation
Score: 85/100
Claim: BBVA is expanding its work with OpenAI through a multi-year AI transformation program, rolling out ChatGPT Enterprise to all 120,000 employees. Together, the companies will develop AI solutions that enhance customer interactions, streamline operations, and help build an AI-native banking experience.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## BNY builds “AI for everyone, everywhere” with OpenAI
Source: https://openai.com/index/bny
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 95/100
Claim: BNY uses OpenAI to expand AI adoption enterprise-wide through Eliza, where 20,000+ employees build AI agents that improve efficiency and client outcomes.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## How We Used Codex to Ship Sora for Android in 28 Days
Source: https://openai.com/index/shipping-sora-for-android-with-codex
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: OpenAI shipped Sora for Android in 28 days using Codex. AI-assisted planning, translation, and parallel coding workflows helped a nimble team deliver rapid, reliable development.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Advancing science and math with GPT-5.2
Source: https://openai.com/index/gpt-5-2-for-science-and-math
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: GPT-5.2 is OpenAI’s strongest model yet for math and science, setting new state-of-the-art results on benchmarks like GPQA Diamond and FrontierMath. This post shows how those gains translate into real research progress, including solving an open theoretical problem and generating reliable mathematical proofs.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## How Podium is arming 10,000+ SMBs with AI agents
Source: https://openai.com/index/podium
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Discover how Podium used OpenAI’s GPT-5 to build “Jerry,” an AI teammate driving 300% growth and transforming how Main Street businesses serve customers.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## The Walt Disney Company and OpenAI reach landmark agreement to bring beloved characters to Sora
Source: https://openai.com/index/disney-sora-agreement
Publisher: OpenAI
Category: Deployments
Sector: Media and content
Capability: Enterprise workflow automation
Score: 85/100
Claim: Disney and OpenAI have reached an agreement to bring more than 200 Disney, Marvel, Pixar and Star Wars characters to Sora for fan-inspired short videos. The agreement emphasizes responsible AI in entertainment and includes Disney’s company-wide use of ChatGPT Enterprise and the OpenAI API.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing GPT-5.2
Source: https://openai.com/index/introducing-gpt-5-2
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5.2 is our most advanced frontier model for everyday professional work, with state-of-the-art reasoning, long-context understanding, coding, and vision. Use it in ChatGPT and the OpenAI API to power faster, more reliable agentic workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Update to GPT-5 System Card: GPT-5.2
Source: https://openai.com/index/gpt-5-system-card-update-gpt-5-2
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: GPT-5.2 is the latest model family in the GPT-5 series. The comprehensive safety mitigation approach for these models is largely the same as that described in the GPT-5 System Card and GPT-5.1 System Card. Like OpenAI’s other models, the GPT-5.2 models were trained on diverse datasets, including information that is publicly available on the internet.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Ten years
Source: https://openai.com/index/ten-years
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: OpenAI reflects on ten years of progress, from early research breakthroughs to widely used AI systems that reshaped what’s possible. We share lessons from the past decade and why we remain optimistic about building AGI that benefits all of humanity.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Strengthening cyber resilience as AI capabilities advance
Source: https://openai.com/index/strengthening-cyber-resilience
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 52/100
Claim: OpenAI is investing in stronger safeguards and defensive capabilities as AI models become more powerful in cybersecurity. We explain how we assess risk, limit misuse, and work with the security community to strengthen cyber resilience.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## How Scout24 is building the next generation of real-estate search with AI
Source: https://openai.com/index/scout24
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Scout24 has created a GPT-5 powered conversational assistant that reimagines real-estate search, guiding users with clarifying questions, summaries, and tailored listing recommendations.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## OpenAI co-founds Agentic AI Foundation, donates AGENTS.md
Source: https://openai.com/index/agentic-ai-foundation
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: OpenAI co-founds the Agentic AI Foundation under the Linux Foundation and donates AGENTS.md to support open, interoperable standards for safe agentic AI.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Launching our first OpenAI Certifications courses
Source: https://openai.com/index/openai-certificate-courses
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 54/100
Claim: Learn how OpenAI’s new certifications and AI Foundations courses help people build real-world AI skills, boost career opportunities, and prepare for the future of work.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Bringing powerful AI to millions across Europe with Deutsche Telekom
Source: https://openai.com/index/deutsche-telekom-collaboration
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 96/100
Claim: OpenAI is collaborating with Deutsche Telekom to bring advanced, multilingual AI experiences to millions of people across Europe. ChatGPT Enterprise will also be deployed to help employees at Deutsche Telekom improve workflows and accelerate innovation.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Commonwealth Bank of Australia builds AI fluency at scale
Source: https://openai.com/index/commonwealth-bank-of-australia
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Enterprise workflow automation
Score: 85/100
Claim: Commonwealth Bank of Australia partners with OpenAI to roll out ChatGPT Enterprise to 50,000 employees, building AI fluency at scale to improve customer service and fraud response.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## OpenAI appoints Denise Dresser as Chief Revenue Officer
Source: https://openai.com/index/openai-appoints-denise-dresser
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 63/100
Claim: Denise Dresser is joining as Chief Revenue Officer, overseeing OpenAI’s global revenue strategy across enterprise and customer success. She will help more businesses put AI to work in their day-to-day operations as OpenAI continues to scale.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Donating the Model Context Protocol and establishing the Agentic AI Foundation
Source: https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation
Publisher: Anthropic
Category: Vendor framing
Sector: Customer operations
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: Today, we’re donating the Model Context Protocol (MCP) to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation, co-founded by Anthropic, Block and OpenAI, with support from Google, Microsoft, Amazon Web Services (AWS), Cloudflare, and Bloomberg. One year ago, we introduced MCP as a universal, open standard for connecting AI...
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Accenture and Anthropic launch multi-year partnership to move enterprises from AI pilots to production
Source: https://www.anthropic.com/news/anthropic-accenture-partnership
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: Anthropic and Accenture today announced a major expansion of their partnership to help enterprises move from AI pilots to full-scale deployment. The announcement comes as Anthropic's enterprise market share has grown from 24% to 40%.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Instacart and OpenAI partner on AI shopping experiences
Source: https://openai.com/index/instacart-partnership
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Financial workflow automation
Score: 85/100
Claim: OpenAI and Instacart are deepening their longstanding partnership by bringing the first fully integrated grocery shopping and Instant Checkout payment app to ChatGPT.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## The state of enterprise AI
Source: https://openai.com/index/the-state-of-enterprise-ai-2025-report
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 89/100
Claim: Key findings from OpenAI’s enterprise data show accelerating AI adoption, deeper integration, and measurable productivity gains across industries in 2025.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## How Virgin Atlantic uses AI to enhance every step of travel
Source: https://openai.com/index/virgin-atlantic-oliver-byers
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Financial workflow automation
Score: 85/100
Claim: Virgin Atlantic CFO Oliver Byers shares how the airline is using AI to speed up development, improve decision-making, and elevate customer experience.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## OpenAI to acquire Neptune
Source: https://openai.com/index/openai-to-acquire-neptune
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: OpenAI is acquiring Neptune to deepen visibility into model behavior and strengthen the tools researchers use to track experiments and monitor training.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## How confessions can keep language models honest
Source: https://openai.com/index/how-confessions-can-keep-language-models-honest
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: OpenAI researchers are testing “confessions,” a method that trains models to admit when they make mistakes or act undesirably, helping improve AI honesty, transparency, and trust in model outputs.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Announcing the initial People-First AI Fund grantees
Source: https://openai.com/index/people-first-ai-fund-grantees
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 54/100
Claim: The OpenAI Foundation announces the initial recipients of the People-First AI Fund, awarding $40.5M in unrestricted grants to 208 nonprofits supporting community innovation and opportunity.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Snowflake and Anthropic announce $200 million partnership to bring agentic AI to global enterprises
Source: https://www.anthropic.com/news/snowflake-anthropic-expanded-partnership
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 96/100
Claim: Today, we announce a significant expansion of our strategic partnership with Snowflake. The multi-year, $200 million agreement will not only make Anthropic’s Claude models available in the Snowflake platform to more than 12,600 global customers across Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure, but also establishes a joint go-to-market.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Anthropic acquires Bun as Claude Code reaches $1B milestone
Source: https://www.anthropic.com/news/anthropic-acquires-bun-as-claude-code-reaches-usd1b-milestone
Publisher: Anthropic
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 96/100
Claim: Claude is the world’s smartest and most capable AI model for developers, startups, and enterprises. Claude Code represents a new era of agentic coding, fundamentally changing how teams build software. In November, Claude Code achieved a significant milestone: just six months after becoming available to the public, it reached $1 billion in run-rate revenue.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Claude for Nonprofits
Source: https://www.anthropic.com/news/claude-for-nonprofits
Publisher: Anthropic
Category: Deployments
Sector: General AI capability
Capability: Production AI deployment signal
Score: 75/100
Claim: Nonprofits tackle some of society’s most difficult problems, often with limited resources. In partnership with the global generosity movement GivingTuesday, we’re launching Claude for Nonprofits to help organizations across the world maximize their impact. Many nonprofits already use Claude to meet their goals. The Epilepsy Foundation is providing 24/7.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Inside Mirakl's agentic commerce vision
Source: https://openai.com/index/mirakl
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 96/100
Claim: Mirakl is redefining commerce through AI agents and ChatGPT Enterprise—achieving faster documentation, smarter customer support, and building toward agent-native commerce with Mirakl Nexus.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Funding grants for new research into AI and mental health
Source: https://openai.com/index/ai-mental-health-research-grants
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 58/100
Claim: OpenAI is awarding up to $2 million in grants for research at the intersection of AI and mental health. The program supports projects that study real-world risks, benefits, and applications to improve safety and well-being.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI and NORAD team up to bring new magic to “NORAD Tracks Santa”
Source: https://openai.com/index/norad-holiday-collaboration
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI and NORAD are bringing new magic to “NORAD Tracks Santa” with three ChatGPT holiday tools that let families create festive elves, toy coloring pages, and custom Christmas stories.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## OpenAI takes an ownership stake in Thrive Holdings to accelerate enterprise AI adoption
Source: https://openai.com/index/thrive-holdings
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 83/100
Claim: OpenAI takes an ownership stake in Thrive Holdings to accelerate enterprise AI adoption, embedding frontier research and engineering directly into accounting and IT services to boost speed, accuracy, and efficiency while creating a scalable model for industry-wide transformation.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Accenture and OpenAI accelerate enterprise AI success
Source: https://openai.com/index/accenture-partnership
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 95/100
Claim: Accenture and OpenAI are collaborating to help enterprises bring agentic AI capabilities into the core of their business and unlock new levels of growth.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Mixpanel security incident: what OpenAI users need to know
Source: https://openai.com/index/mixpanel-incident
Publisher: OpenAI
Category: Vendor framing
Sector: Financial services
Capability: Financial workflow automation
Score: 64/100
Claim: OpenAI shares details about a Mixpanel security incident involving limited API analytics data. No API content, credentials, or payment details were exposed. Learn what happened and how we’re protecting users.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Expanding data residency access to business customers worldwide
Source: https://openai.com/index/expanding-data-residency-access-to-business-customers-worldwide
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: OpenAI expands data residency for ChatGPT Enterprise, ChatGPT Edu, and the API Platform, enabling eligible customers to store data at rest in-region.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Our approach to mental health-related litigation
Source: https://openai.com/index/mental-health-litigation-approach
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 52/100
Claim: We’re sharing our approach to mental health-related litigation. We aim to handle sensitive cases with care, transparency, and respect while continuing to strengthen safety and support in ChatGPT.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Inside JetBrains—the company reshaping how the world writes code
Source: https://openai.com/index/jetbrains-2025
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 80/100
Claim: JetBrains is integrating GPT-5 across its coding tools, helping millions of developers design, reason, and build software faster.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing shopping research in ChatGPT
Source: https://openai.com/index/chatgpt-shopping-research
Publisher: OpenAI
Category: Benchmarks
Sector: Commerce and marketplace
Capability: Model and benchmark capability movement
Score: 76/100
Claim: Shopping research in ChatGPT helps you explore, compare, and discover products with personalized buyer’s guides that simplify decision-making.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## GPT-5 and the future of mathematical discovery
Source: https://openai.com/index/gpt-5-mathematical-discovery
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: UCLA Professor Ernest Ryu and GPT-5 solved a key question in optimization theory, showcasing AI’s role in accelerating mathematical discovery.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing Claude Opus 4.5
Source: https://www.anthropic.com/news/claude-opus-4-5
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Our newest model, Claude Opus 4.5, is available today. It’s intelligent, efficient, and the best model in the world for coding, agents, and computer use. It’s also meaningfully better at everyday tasks like deep research and working with slides and spreadsheets. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI and Foxconn collaborate to strengthen U.S. manufacturing across the AI supply chain
Source: https://openai.com/index/openai-and-foxconn-collaborate
Publisher: OpenAI
Category: Deployments
Sector: AI infrastructure
Capability: Production AI deployment signal
Score: 85/100
Claim: OpenAI and Foxconn are collaborating to design and manufacture next-generation AI infrastructure hardware in the U.S. The partnership will develop multiple generations of data-center systems, strengthen U.S. supply chains, and build key components domestically to accelerate advanced AI infrastructure.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Helping 1,000 small businesses build with AI
Source: https://openai.com/index/small-business-ai-jam
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 64/100
Claim: OpenAI is partnering with DoorDash, SCORE, and local organizations to help 1,000 small businesses build with AI. The Small Business AI Jam gives Main Street business owners hands-on tools and training to compete and grow.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Early experiments in accelerating science with GPT-5
Source: https://openai.com/index/accelerating-science-gpt-5
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: OpenAI introduces the first research cases showing how GPT-5 accelerates scientific progress across math, physics, biology, and computer science. Explore how AI and researchers collaborate to generate proofs, uncover new insights, and reshape the pace of discovery.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Strengthening our safety ecosystem with external testing
Source: https://openai.com/index/strengthening-safety-with-external-testing
Publisher: OpenAI
Category: Benchmarks
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 64/100
Claim: OpenAI works with independent experts to evaluate frontier AI systems. Third-party testing strengthens safety, validates safeguards, and increases transparency in how we assess model capabilities and risks.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## How evals drive the next chapter in AI for businesses
Source: https://openai.com/index/evals-drive-next-chapter-of-ai
Publisher: OpenAI
Category: Benchmarks
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 80/100
Claim: Learn how evals help businesses define, measure, and improve AI performance—reducing risk, boosting productivity, and driving strategic advantage.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI and Target team up on new AI-powered experiences
Source: https://openai.com/index/target-partnership
Publisher: OpenAI
Category: Deployments
Sector: Commerce and marketplace
Capability: Enterprise workflow automation
Score: 89/100
Claim: OpenAI and Target are partnering to bring a new Target app to ChatGPT, offering personalized shopping and faster checkout. Target will also expand its use of ChatGPT Enterprise to boost productivity and guest experiences.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## How Scania accelerates work with AI across its global workforce
Source: https://openai.com/index/scania
Publisher: OpenAI
Category: Labour market
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 61/100
Claim: Global manufacturer Scania is scaling AI with ChatGPT Enterprise. With team-based onboarding and strong guardrails, AI is boosting productivity, quality, and innovation.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## Building more with GPT-5.1-Codex-Max
Source: https://openai.com/index/gpt-5-1-codex-max
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 90/100
Claim: Introducing GPT-5.1-Codex-Max, a faster, more intelligent agentic coding model for Codex. The model is designed for long-running, project-scale work with enhanced reasoning and token efficiency.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## GPT-5.1-Codex-Max System Card
Source: https://openai.com/index/gpt-5-1-codex-max-system-card
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: This system card outlines the comprehensive safety measures implemented for GPT-5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## A free version of ChatGPT built for teachers
Source: https://openai.com/index/chatgpt-for-teachers
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 64/100
Claim: ChatGPT for Teachers is a secure workspace with education‑grade privacy and admin controls. Free for verified U.S. K–12 educators through June 2027.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Intuit and OpenAI join forces on new AI-powered experiences
Source: https://openai.com/index/intuit-partnership
Publisher: OpenAI
Category: Benchmarks
Sector: Financial services
Capability: Frontier model release and benchmark movement
Score: 83/100
Claim: OpenAI and Intuit have entered a $100M+ multi-year partnership to launch Intuit app experiences in ChatGPT and expand Intuit’s use of OpenAI’s frontier models to power personalized financial tools.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Anthropic partners with Rwandan Government and ALX to bring AI education to hundreds of thousands of learners across Africa
Source: https://www.anthropic.com/news/rwandan-government-partnership-ai-education
Publisher: Anthropic
Category: Deployments
Sector: Education
Capability: Education and workforce adoption
Score: 85/100
Claim: Anthropic is announcing a new partnership with the Government of Rwanda and African tech training provider ALX to bring Chidi—a learning companion built on Claude—to hundreds of thousands of learners across Africa. Rwanda's ICT & Innovation and Education ministries are deploying Chidi within their national education system, while ALX will bring the tool to.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Microsoft, NVIDIA, and Anthropic announce strategic partnerships
Source: https://www.anthropic.com/news/microsoft-nvidia-anthropic-announce-strategic-partnerships
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 89/100
Claim: Today Microsoft, NVIDIA, and Anthropic announced new strategic partnerships. Anthropic is scaling its rapidly-growing Claude AI model on Microsoft Azure, powered by NVIDIA, which will broaden access to Claude and provide Azure enterprise customers with expanded model choice and new capabilities. Anthropic has committed to purchase $30 billion of Azure.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Claude now available in Microsoft Foundry and Microsoft 365 Copilot
Source: https://www.anthropic.com/news/claude-in-microsoft-foundry
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Today we announced that Microsoft and Anthropic are expanding our partnership. As part of the partnership, Claude Sonnet 4.5, Haiku 4.5, and Opus 4.1 models are now available in public preview in Microsoft Foundry, where Azure customers can build production applications and enterprise agents. This enables companies to build with Claude, the world's best.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## OpenAI named Emerging Leader in Generative AI
Source: https://openai.com/index/gartner-2025-emerging-leader
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 89/100
Claim: OpenAI has been named an Emerging Leader in Gartner’s 2025 Innovation Guide for Generative AI Model Providers. The recognition reflects our enterprise momentum, with over 1 million companies building with ChatGPT.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing OpenAI for Ireland
Source: https://openai.com/index/openai-for-ireland
Publisher: OpenAI
Category: Vendor framing
Sector: Public sector
Capability: Enterprise workflow automation
Score: 68/100
Claim: OpenAI launches OpenAI for Ireland, partnering with the Irish Government, Dogpatch Labs and Patch to help SMEs, founders and young builders use AI to innovate, boost productivity and build the next generation of Irish tech startups.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Understanding neural networks through sparse circuits
Source: https://openai.com/index/understanding-neural-networks-through-sparse-circuits
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI is exploring mechanistic interpretability to understand how neural networks reason. Our new sparse model approach could make AI systems more transparent and support safer, more reliable behavior.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing GPT-5.1 for developers
Source: https://openai.com/index/gpt-5-1-for-developers
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5.1 is now available in the API, bringing faster adaptive reasoning, extended prompt caching, improved coding performance, and new apply_patch and shell tools.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## How Philips is scaling AI literacy across 70,000 employees
Source: https://openai.com/index/philips
Publisher: OpenAI
Category: Deployments
Sector: Healthcare and life sciences
Capability: Enterprise workflow automation
Score: 85/100
Claim: Philips is scaling AI literacy with ChatGPT Enterprise, training 70,000 employees to use AI responsibly and improve healthcare outcomes worldwide.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing group chats in ChatGPT
Source: https://openai.com/index/group-chats-in-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: Collaborate with others, and ChatGPT, in the same conversation.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Measuring political bias in Claude
Source: https://www.anthropic.com/news/political-even-handedness
Publisher: Anthropic
Category: Benchmarks
Sector: General AI capability
Capability: Model and benchmark capability movement
Score: 86/100
Claim: We want Claude to be seen as fair and trustworthy by people across the political spectrum, and to be unbiased and even-handed in its approach to political topics. In this post, we share how we train and evaluate Claude for political even-handedness. We also report the results of a new, automated, open-source evaluation for political neutrality that we’ve.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## The state of Maryland partners with Anthropic to better serve residents
Source: https://www.anthropic.com/news/maryland-partnership
Publisher: Anthropic
Category: Deployments
Sector: Public sector
Capability: Enterprise workflow automation
Score: 89/100
Claim: The state of Maryland has announced it will use Anthropic's advanced AI models to improve government operations and better serve its more than six million residents. Under the new partnership, the state will deploy Claude across multiple state agencies to address several priorities. The partnership builds on Maryland’s existing use of Claude.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Disrupting the first reported AI-orchestrated cyber espionage campaign
Source: https://www.anthropic.com/news/disrupting-AI-espionage
Publisher: Anthropic
Category: Benchmarks
Sector: Cybersecurity
Capability: Enterprise workflow automation
Score: 76/100
Claim: We recently argued that an inflection point had been reached in cybersecurity: a point at which AI models had become genuinely useful for cybersecurity operations, both for good and for ill. This was based on systematic evaluations showing cyber capabilities doubling in six months; we’d also been tracking real-world cyberattacks.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Neuro drives national retail wins with ChatGPT Business
Source: https://openai.com/index/neurogum
Publisher: OpenAI
Category: Deployments
Sector: Commerce and marketplace
Capability: Enterprise workflow automation
Score: 82/100
Claim: Neuro uses ChatGPT Business to scale nationwide with fewer than 70 employees, saving time, reducing costs, and turning faster execution across sales and operations into growth.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Fighting the New York Times’ invasion of user privacy
Source: https://openai.com/index/fighting-nyt-user-privacy-invasion
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 68/100
Claim: OpenAI is fighting the New York Times’ demand for 20 million private ChatGPT conversations and accelerating new security and privacy protections to protect your data.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## GPT-5.1: A smarter, more conversational ChatGPT
Source: https://openai.com/index/gpt-5-1
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: We’re upgrading the GPT-5 series with warmer, more capable models and new ways to customize ChatGPT’s tone and style. GPT-5.1 starts rolling out today to paid users.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum
Source: https://openai.com/index/gpt-5-system-card-addendum-gpt-5-1
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: This GPT-5 system card addendum provides updated safety metrics for GPT-5.1 Instant and Thinking, including new evaluations for mental health and emotional reliance.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Anthropic invests $50 billion in American AI infrastructure
Source: https://www.anthropic.com/news/anthropic-invests-50-billion-in-american-ai-infrastructure
Publisher: Anthropic
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 80/100
Claim: Today, we are announcing a $50 billion investment in American computing infrastructure, building data centers with Fluidstack in Texas and New York, with more sites to come. These facilities are custom built for Anthropic with a focus on maximizing efficiency for our workloads, enabling continued research and development at the frontier.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Free ChatGPT for transitioning U.S. servicemembers and veterans
Source: https://openai.com/index/chatgpt-for-veterans
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 64/100
Claim: OpenAI is offering U.S. servicemembers and veterans within 12 months of retirement or separation a free year of ChatGPT Plus to support their transition to civilian life. The tools can help with resumes, interviews, education, and planning for what’s next.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Understanding prompt injections: a frontier security challenge
Source: https://openai.com/index/prompt-injections
Publisher: OpenAI
Category: Benchmarks
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 64/100
Claim: Prompt injections are a frontier security challenge for AI systems. Learn how these attacks work and how OpenAI is advancing research, training models, and building safeguards for users.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Notion’s GPT‑5 rebuild unlocks autonomous AI workflows
Source: https://openai.com/index/notion
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Notion rebuilt its AI architecture with GPT-5 to create agents that reason, act, and adapt across workflows, unlocking faster and more flexible productivity in Notion 3.0.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## New offices in Paris and Munich expand Anthropic’s European presence
Source: https://www.anthropic.com/news/new-offices-in-paris-and-munich-expand-european-presence
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 42/100
Claim: Today, we're announcing plans to open offices in Paris and Munich as our global operations expand across Europe. These new hubs follow recent office openings in Tokyo, Seoul, and Bengaluru and will further grow our European footprint alongside our offices in London, Dublin, and Zurich. They’re the latest example of Anthropic’s extraordinary momentum.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## AI progress and recommendations
Source: https://openai.com/index/ai-progress-and-recommendations
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 52/100
Claim: AI is advancing fast. We have the chance to shape its progress—toward discovery, safety, and a better future for everyone.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing the Teen Safety Blueprint
Source: https://openai.com/index/introducing-the-teen-safety-blueprint
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 52/100
Claim: Discover OpenAI’s Teen Safety Blueprint—a roadmap for building AI responsibly with safeguards, age-appropriate design, and collaboration to protect and empower young people online.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## How CRED is tapping AI to deliver premium customer experiences
Source: https://openai.com/index/cred-swamy-seetharaman
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Production AI deployment signal
Score: 85/100
Claim: CRED is improving premium customer experiences in India with OpenAI, using GPT-powered tools to boost support accuracy, cut response times, and raise customer satisfaction.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## How Chime is redefining marketing through AI
Source: https://openai.com/index/chime-vineet-mehra
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Agent platform and API infrastructure
Score: 74/100
Claim: Chime CMO Vineet Mehra shares how AI is reshaping marketing into an agent-driven model and why leaders who prioritize AI literacy and thoughtful adoption will drive growth.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## 1 million business customers putting AI to work
Source: https://openai.com/index/1-million-businesses-putting-ai-to-work
Publisher: OpenAI
Category: Benchmarks
Sector: Financial services
Capability: Enterprise workflow automation
Score: 87/100
Claim: More than 1 million business customers around the world now use OpenAI. Across healthcare, life sciences, financial services, and more, ChatGPT and our APIs are driving a new era of intelligent, AI-powered work.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Launching the Anthropic Economic Futures Programme in the UK and Europe
Source: https://www.anthropic.com/news/economic-futures-uk-europe
Publisher: Anthropic
Category: Labour market
Sector: Scientific research
Capability: Enterprise workflow automation
Score: 64/100
Claim: AI adoption is increasing rapidly in Europe and the UK, but the conversation about how to manage its effects on labor and the economy is still at a very early stage. This matters: the decisions politicians make today will affect the continent’s labor force, productivity, and growth for years to come.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## Cognizant will make Claude available to 350,000 employees, accelerating enterprise AI adoption and internal transformation
Source: https://www.anthropic.com/news/cognizant-partnership
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 95/100
Claim: Cognizant, a leading information technology consulting company, announced today that it will use Claude to help its enterprise customers and internal teams move from AI experimentation to production outcomes. Cognizant will deploy Claude to up to 350,000 employees globally, combining Claude with agentic tooling and Cognizant's engineering platforms.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Anthropic and Iceland announce one of the world’s first national AI education pilots
Source: https://www.anthropic.com/news/anthropic-and-iceland-announce-one-of-the-world-s-first-national-ai-education-pilots
Publisher: Anthropic
Category: Deployments
Sector: Education
Capability: Education and workforce adoption
Score: 85/100
Claim: Today, Anthropic and Iceland's Ministry of Education and Children are announcing a partnership to bring Claude to teachers across the nation, launching one of the world's first comprehensive national AI education pilots. This initiative will give teachers from every region of Iceland, from Reykjavik to the most remote villages, access to advanced AI tools.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing IndQA
Source: https://openai.com/index/introducing-indqa
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: OpenAI introduces IndQA, a new benchmark for evaluating AI systems in Indian languages. Built with domain experts, IndQA tests cultural understanding and reasoning across 12 languages and 10 knowledge areas.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## AWS and OpenAI announce multi-year strategic partnership
Source: https://openai.com/index/aws-and-openai-partnership
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Production AI deployment signal
Score: 89/100
Claim: OpenAI and AWS have entered a multi-year, $38 billion partnership to scale advanced AI workloads. AWS will provide world-class infrastructure and compute capacity to power OpenAI’s next generation of models.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Expanding Stargate to Michigan
Source: https://openai.com/index/expanding-stargate-to-michigan
Publisher: OpenAI
Category: Labour market
Sector: Education
Capability: Education and workforce adoption
Score: 72/100
Claim: OpenAI is expanding Stargate to Michigan with a new one-gigawatt campus that strengthens America’s AI infrastructure. The project will create jobs, drive investment, and support economic growth across the Midwest.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## Introducing Aardvark: OpenAI’s agentic security researcher
Source: https://openai.com/index/introducing-aardvark
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Cyber defence and misuse monitoring
Score: 86/100
Claim: OpenAI introduces Aardvark, an AI-powered security researcher that autonomously finds, validates, and helps fix software vulnerabilities at scale. The system is in private beta—sign up to join early testing.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## How we built OWL, the new architecture behind our ChatGPT-based browser, Atlas
Source: https://openai.com/index/building-chatgpt-atlas
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 74/100
Claim: A deep dive into OWL, the new architecture powering ChatGPT Atlas—decoupling Chromium, enabling fast startup, rich UI, and agentic browsing with ChatGPT.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## gpt-oss-safeguard technical report
Source: https://openai.com/index/gpt-oss-safeguard-technical-report
Publisher: OpenAI
Category: Benchmarks
Sector: Cybersecurity
Capability: Model and benchmark capability movement
Score: 64/100
Claim: gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are two open-weight reasoning models post-trained from the gpt-oss models and trained to reason from a provided policy in order to label content under that policy. In this report, we describe gpt-oss-safeguard’s capabilities and provide our baseline safety evaluations on the gpt-oss-safeguard models.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing gpt-oss-safeguard
Source: https://openai.com/index/introducing-gpt-oss-safeguard
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Model and benchmark capability movement
Score: 64/100
Claim: OpenAI introduces gpt-oss-safeguard—open-weight reasoning models for safety classification that let developers apply and iterate on custom policies.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Anthropic officially opens Tokyo office, signs Memorandum of Cooperation with the Japan AI Safety Institute
Source: https://www.anthropic.com/news/opening-our-tokyo-office
Publisher: Anthropic
Category: Deployments
Sector: General AI capability
Capability: Enterprise workflow automation
Score: 63/100
Claim: This week, we opened our first Asia-Pacific office in Tokyo, a milestone in Anthropic's international expansion. Our CEO and co-founder Dario Amodei traveled to Tokyo to meet with Prime Minister Takaichi, address members of the LDP Digitization Headquarters Committee, meet customers and sign a Memorandum of Cooperation with the Japan AI Safety Institute.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Advancing organizational transformation for business innovation
Source: https://openai.com/index/dai-nippon-printing
Publisher: OpenAI
Category: Benchmarks
Sector: Public sector
Capability: Enterprise workflow automation
Score: 96/100
Claim: DNP rolled out ChatGPT Enterprise across ten core departments, achieving 95% faster patent research, 10x processing volume, 87% automation, and 70% knowledge reuse in three months.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Doppel’s AI defense system stops attacks before they spread
Source: https://openai.com/index/doppel
Publisher: OpenAI
Category: Deployments
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 94/100
Claim: Doppel uses GPT-5 and reinforcement fine-tuning to stop deepfake and impersonation attacks, cutting analyst workloads by 80% and reducing response times from hours to minutes.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## The next chapter of the Microsoft–OpenAI partnership
Source: https://openai.com/index/next-chapter-of-microsoft-openai-partnership
Publisher: OpenAI
Category: Deployments
Sector: AI infrastructure
Capability: Production AI deployment signal
Score: 85/100
Claim: Microsoft and OpenAI sign a new agreement that strengthens their long-term partnership, expands innovation, and ensures responsible AI progress.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Built to benefit everyone
Source: https://openai.com/index/built-to-benefit-everyone
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: OpenAI’s recapitalization strengthens mission-focused governance, expanding resources to ensure AI benefits everyone while advancing innovation responsibly.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Strengthening ChatGPT’s responses in sensitive conversations
Source: https://openai.com/index/strengthening-chatgpt-responses-in-sensitive-conversations
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 64/100
Claim: OpenAI collaborated with 170+ mental health experts to improve ChatGPT’s ability to recognize distress, respond empathetically, and guide users toward real-world support—reducing unsafe responses by up to 80%. Learn how we’re making ChatGPT safer and more supportive in sensitive moments.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Addendum to GPT-5 System Card: Sensitive conversations
Source: https://openai.com/index/gpt-5-system-card-sensitive-conversations
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: This system card details GPT-5’s improvements in handling sensitive conversations, including new benchmarks for emotional reliance, mental health, and jailbreak resistance.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Steuerrecht.com delivers client-ready legal analysis with ChatGPT
Source: https://openai.com/index/steuerrecht
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Enterprise workflow automation
Score: 90/100
Claim: Steuerrecht.com uses ChatGPT Business to streamline legal workflows, automate tax research, and deliver faster, client-ready analysis for law firms.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Advancing Claude for Financial Services
Source: https://www.anthropic.com/news/advancing-claude-for-financial-services
Publisher: Anthropic
Category: Benchmarks
Sector: Financial services
Capability: Financial workflow automation
Score: 86/100
Claim: We're expanding Claude for Financial Services with an Excel add-in, additional connectors to real-time market data and portfolio analytics, and new pre-built Agent Skills, like building discounted cash flow models and initiating coverage reports. These updates build on Sonnet 4.5’s state-of-the-art performance on financial tasks.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI acquires Software Applications Incorporated, maker of Sky
Source: https://openai.com/index/openai-acquires-software-applications-incorporated
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI has acquired Software Applications Incorporated, maker of Sky—a natural language interface for Mac that brings AI directly into your desktop experience. Together, we’re integrating Sky’s deep macOS capabilities into ChatGPT to make AI more intuitive, contextual, and action-oriented.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Consensus accelerates research with GPT-5 and Responses API
Source: https://openai.com/index/consensus
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Consensus uses GPT-5 and OpenAI’s Responses API to power a multi-agent research assistant that reads, analyzes, and synthesizes evidence in minutes—helping over 8 million researchers accelerate scientific discovery.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Work smarter with your company knowledge in ChatGPT
Source: https://openai.com/index/introducing-company-knowledge
Publisher: OpenAI
Category: Deployments
Sector: Cybersecurity
Capability: Enterprise workflow automation
Score: 85/100
Claim: Company knowledge brings context from your apps into ChatGPT for answers specific to your business, with clear citations, security, privacy, and admin controls. Available now for Business, Enterprise, and Edu users.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## AI in South Korea—OpenAI’s Economic Blueprint
Source: https://openai.com/index/south-korea-economic-blueprint
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Education and workforce adoption
Score: 85/100
Claim: OpenAI's Korea Economic Blueprint outlines how South Korea can scale trusted AI through sovereign capabilities and strategic partnerships to drive growth.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Seoul becomes Anthropic’s third office in Asia-Pacific as we continue our international growth
Source: https://www.anthropic.com/news/seoul-becomes-third-anthropic-office-in-asia-pacific
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 42/100
Claim: Today we're announcing plans to open an office in Seoul in early 2026 as our global operations expand into Korea. Seoul comes on the heels of new offices in Tokyo and Bengaluru, and together this expansion reflects the extraordinary momentum we're seeing across Asia-Pacific: our run rate revenue in the region has grown over 10x in the past year. The Korean...
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Expanding our use of Google Cloud TPUs and Services
Source: https://www.anthropic.com/news/expanding-our-use-of-google-cloud-tpus-and-services
Publisher: Anthropic
Category: Benchmarks
Sector: Scientific research
Capability: Agent platform and API infrastructure
Score: 80/100
Claim: Today, we are announcing that we plan to expand our use of Google Cloud technologies, including up to one million TPUs, dramatically increasing our compute resources as we continue to push the boundaries of AI research and product development. The expansion is worth tens of billions of dollars and is expected to bring well over a gigawatt of capacity.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## The next chapter for UK sovereign AI
Source: https://openai.com/index/the-next-chapter-for-uk-sovereign-ai
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: OpenAI expands its UK partnership with a new Ministry of Justice agreement, bringing ChatGPT to civil servants. It also introduces UK data residency for ChatGPT Enterprise, ChatGPT Edu, and the API Platform to support trusted and secure AI adoption.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## AI in Japan—OpenAI’s Japan Economic Blueprint
Source: https://openai.com/index/japan-economic-blueprint
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Education and workforce adoption
Score: 64/100
Claim: OpenAI’s Japan Economic Blueprint outlines how Japan can harness AI to boost innovation, strengthen competitiveness, and enable sustainable, inclusive growth.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Continue your ChatGPT experience beyond WhatsApp
Source: https://openai.com/index/chatgpt-whatsapp-transition
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: ChatGPT will no longer be available on WhatsApp after January 15, 2026. Learn how to link your ChatGPT account and continue your conversations across devices.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing ChatGPT Atlas, the browser with ChatGPT built in
Source: https://openai.com/index/introducing-chatgpt-atlas
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: ChatGPT Atlas, the browser with ChatGPT built in. Get instant answers, summaries, and smart web help, right from any page. With privacy settings you can control. Available now for macOS.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## A statement from Dario Amodei on Anthropic's commitment to American AI leadership
Source: https://www.anthropic.com/news/statement-dario-amodei-american-ai-leadership
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: A statement from Anthropic CEO Dario Amodei on Anthropic’s commitment to advancing America's leadership in building powerful and beneficial AI. Anthropic is built on a simple principle: AI should be a force for human progress, not peril. That means making products that are genuinely useful, speaking honestly about risks and benefits, and working with...
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Claude for Life Sciences
Source: https://www.anthropic.com/news/claude-for-life-sciences
Publisher: Anthropic
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 86/100
Claim: Increasing the rate of scientific progress is a core part of Anthropic’s public benefit mission. We are focused on building the tools to allow researchers to make new discoveries – and eventually, to allow AI models to make these discoveries autonomously.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Plex Coffee delivers fast, personal service with ChatGPT
Source: https://openai.com/index/plex-coffee
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 82/100
Claim: Learn how Plex Coffee uses ChatGPT Business to centralize knowledge, train staff faster, and preserve personal connections while expanding.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing Claude Haiku 4.5
Source: https://www.anthropic.com/news/claude-haiku-4-5
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Claude Haiku 4.5, our latest small model, is available today to all users. What was recently at the frontier is now cheaper and faster. Five months ago, Claude Sonnet 4 was a state-of-the-art model. Today, Claude Haiku 4.5 gives you similar levels of coding performance but at one-third the cost and more than twice the speed.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Expert Council on Well-Being and AI
Source: https://openai.com/index/expert-council-on-well-being-and-ai
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 76/100
Claim: OpenAI’s new Expert Council on Well-Being and AI brings together leading psychologists, clinicians, and researchers to guide how ChatGPT supports emotional health, especially for teens. Learn how their insights are shaping safer, more caring AI experiences.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Anthropic and Salesforce expand partnership to bring Claude to regulated industries
Source: https://www.anthropic.com/news/salesforce-anthropic-expanded-partnership
Publisher: Anthropic
Category: Benchmarks
Sector: Financial services
Capability: Financial workflow automation
Score: 93/100
Claim: Anthropic and Salesforce today announced an expanded partnership to make Claude a preferred model for Salesforce's Agentforce platform, enabling Salesforce customers in financial services, healthcare, cybersecurity, and life sciences to use trusted AI while keeping sensitive data secure. Additionally, Salesforce is deploying Claude Code across its global...
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI and Broadcom announce strategic collaboration to deploy 10 gigawatts of OpenAI-designed AI accelerators
Source: https://openai.com/index/openai-and-broadcom-announce-strategic-collaboration
Publisher: OpenAI
Category: Deployments
Sector: AI infrastructure
Capability: Production AI deployment signal
Score: 85/100
Claim: OpenAI and Broadcom announce a multi-year partnership to deploy 10 gigawatts of OpenAI-designed AI accelerators, co-developing next-generation systems and Ethernet solutions to power scalable, energy-efficient AI infrastructure by 2029.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## HYGH speeds development and campaigns with ChatGPT Business
Source: https://openai.com/index/hygh
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Enterprise workflow automation
Score: 64/100
Claim: HYGH speeds up software development and campaign delivery with ChatGPT Business, cutting turnaround times, scaling output, and driving revenue growth.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Defining and evaluating political bias in LLMs
Source: https://openai.com/index/defining-and-evaluating-political-bias-in-llms
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Model and benchmark capability movement
Score: 76/100
Claim: Learn how OpenAI evaluates political bias in ChatGPT through new real-world testing methods that improve objectivity and reduce bias.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## HiBob turns 2,500 GPTs into product and team growth
Source: https://openai.com/index/hibob
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 95/100
Claim: Discover how HiBob uses ChatGPT Enterprise and custom GPTs to scale AI adoption, boost revenue, streamline HR workflows, and deliver AI-powered features in the Bob platform.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Rahul Patil joins Anthropic as Chief Technology Officer
Source: https://www.anthropic.com/news/rahul-patil-joins-anthropic
Publisher: Anthropic
Category: Benchmarks
Sector: Cybersecurity
Capability: Enterprise workflow automation
Score: 61/100
Claim: We're excited to announce that Rahul Patil has joined Anthropic as our Chief Technology Officer. Rahul will oversee our engineering organization across product, compute, infrastructure, inference, data science, and security as we scale Claude to meet growing enterprise demand worldwide. Rahul brings over 20 years of experience building and maintaining...
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Expanding our global operations to India with our second Asia Pacific office
Source: https://www.anthropic.com/news/expanding-global-operations-to-india
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 42/100
Claim: Today we’re announcing that we’re expanding our global operations to India, with plans to open an office in Bengaluru in early 2026. Bengaluru will serve as our second office in Asia Pacific after Tokyo, which will open in the coming months. This expansion will help us serve India’s rapidly growing AI ecosystem and reflects the increasing international...
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Codex is now generally available
Source: https://openai.com/index/codex-now-generally-available
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 52/100
Claim: OpenAI Codex is now generally available with powerful new features for developers: a Slack integration, Codex SDK, and admin tools like usage dashboards and workspace management—making Codex easier to use and manage at scale.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing apps in ChatGPT and the new Apps SDK
Source: https://openai.com/index/introducing-apps-in-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: We’re introducing a new generation of apps you can chat with, right inside ChatGPT. Developers can start building them today with the new Apps SDK, available in preview.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## AMD and OpenAI announce strategic partnership to deploy 6 gigawatts of AMD GPUs
Source: https://openai.com/index/openai-amd-strategic-partnership
Publisher: OpenAI
Category: Deployments
Sector: AI infrastructure
Capability: Production AI deployment signal
Score: 85/100
Claim: AMD and OpenAI have announced a multi-year partnership to deploy 6 gigawatts of AMD Instinct GPUs, beginning with 1 gigawatt in 2026, to power OpenAI’s next-generation AI infrastructure and accelerate global AI innovation.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing AgentKit, new Evals, and RFT for agents
Source: https://openai.com/index/introducing-agentkit
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Agent platform and API infrastructure
Score: 90/100
Claim: Today, we’re releasing new tools to help developers go from prototype to production faster: AgentKit, expanded evals capabilities, and reinforcement fine-tuning for agents.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Deloitte will make Claude available to 470,000 people across its global network
Source: https://www.anthropic.com/news/deloitte-anthropic-partnership
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Production AI deployment signal
Score: 78/100
Claim: Anthropic and Deloitte today announced an expanded alliance that will make Claude available to Deloitte people across its global network and develop new industry-specific solutions powered by Claude. As part of the collaboration, Deloitte will establish a Claude Center of Excellence with trained specialists who will develop implementation frameworks, share...
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## With GPT-5, Wrtn builds lifestyle AI for millions in Korea
Source: https://openai.com/index/wrtn
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 80/100
Claim: Wrtn scaled AI apps to 6.5M users in Korea with GPT-5, creating ‘Lifestyle AI’ that blends productivity, creativity, and learning—now expanding across East Asia.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Samsung and SK join OpenAI’s Stargate initiative to advance global AI infrastructure
Source: https://openai.com/index/samsung-and-sk-join-stargate
Publisher: OpenAI
Category: Vendor framing
Sector: AI infrastructure
Capability: Vendor platform capability signal
Score: 64/100
Claim: Samsung and SK join OpenAI’s Stargate initiative to expand global AI infrastructure, scaling advanced memory chip production and building next-gen data centers in Korea.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Sora 2 System Card
Source: https://openai.com/index/sora-2-system-card
Publisher: OpenAI
Category: Benchmarks
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 54/100
Claim: Sora 2 is our new state-of-the-art video and audio generation model. Building on the foundation of Sora, this new model introduces capabilities that have been difficult for prior video models to achieve, such as more accurate physics, sharper realism, synchronized audio, enhanced steerability, and an expanded stylistic range.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Launching Sora responsibly
Source: https://openai.com/index/launching-sora-responsibly
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 42/100
Claim: To address the novel safety challenges posed by a state-of-the-art video model as well as a new social creation platform, we’ve built Sora 2 and the Sora app with safety at the foundation. Our approach is anchored in concrete protections.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Sora 2 is here
Source: https://openai.com/index/sora-2
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 64/100
Claim: Our latest video generation model is more physically accurate, realistic, and controllable than prior systems. It also features synchronized dialogue and sound effects. Create with it in the new Sora app.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Building OpenAI with OpenAI
Source: https://openai.com/index/building-openai-with-openai
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: At OpenAI, we rely on our own technology to help streamline work, scale expertise, and drive outcomes. In our new series, OpenAI on OpenAI, we share lessons to help other organizations do the same.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Driving sales productivity and customer success at OpenAI
Source: https://openai.com/index/openai-gtm-assistant
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 89/100
Claim: Learn how OpenAI boosts sales productivity by automating prep, centralizing knowledge, and scaling top-selling practices.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Converting inbound leads into customers at OpenAI
Source: https://openai.com/index/openai-inbound-sales-assistant
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Production AI deployment signal
Score: 85/100
Claim: Learn how OpenAI used AI to deliver personalized answers at scale, converting inbound leads into customers.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Improving support with every interaction at OpenAI
Source: https://openai.com/index/openai-support-model
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Production AI deployment signal
Score: 78/100
Claim: Learn how OpenAI uses AI to enhance support, cutting response times, improving quality, and scaling to meet hypergrowth.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Turning contracts into searchable data at OpenAI
Source: https://openai.com/index/openai-contract-data-agent
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI built a system to extract contract data quickly, cutting turnaround times and making it easier for teams to access the details they need.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Empowering teams to unlock insights faster at OpenAI
Source: https://openai.com/index/openai-research-assistant
Publisher: OpenAI
Category: Benchmarks
Sector: Customer operations
Capability: Model and benchmark capability movement
Score: 80/100
Claim: OpenAI’s research assistant helps teams analyze millions of support tickets, surface insights faster, and scale curiosity across the company.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Combating online child sexual exploitation & abuse
Source: https://openai.com/index/combating-online-child-sexual-exploitation-abuse
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: Discover how OpenAI combats online child sexual exploitation and abuse with strict usage policies, advanced detection tools, and industry collaboration to block, report, and prevent AI misuse.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing parental controls
Source: https://openai.com/index/introducing-parental-controls
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: We’re rolling out parental controls and a new parent resource page to help families guide how ChatGPT works in their homes.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Buy it in ChatGPT: Instant Checkout and the Agentic Commerce Protocol
Source: https://openai.com/index/buy-it-in-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: Commerce and marketplace
Capability: Enterprise workflow automation
Score: 74/100
Claim: We’re taking first steps toward agentic commerce in ChatGPT with new ways for people, AI agents, and businesses to shop together.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Enabling Claude Code to work more autonomously
Source: https://www.anthropic.com/news/enabling-claude-code-to-work-more-autonomously
Publisher: Anthropic
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: We’re introducing several upgrades to Claude Code: a native VS Code extension, version 2.0 of our terminal interface, and checkpoints for autonomous operation. Powered by Sonnet 4.5, Claude Code now handles longer, more complex development tasks in your terminal and IDE.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing Claude Sonnet 4.5
Source: https://www.anthropic.com/news/claude-sonnet-4-5
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Claude Sonnet 4.5 is the best coding model in the world. It’s the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math. Code is everywhere. It runs every application, spreadsheet, and software tool you use. Being able to use those tools and reason through hard problems is.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Partnering with AARP to help keep older adults safe online
Source: https://openai.com/index/aarp-partnership-older-adults-online-safety
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI and AARP are partnering to help older adults stay safe online with new AI training, scam-spotting tools, and nationwide programs through OpenAI Academy and OATS’s Senior Planet initiative.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Anthropic expands global leadership in enterprise AI, naming Chris Ciauri as Managing Director of International
Source: https://www.anthropic.com/news/anthropic-expands-global-leadership-in-enterprise-ai-naming-chris-ciauri-as-managing-director-of
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 63/100
Claim: Today we're announcing Anthropic's expanded global presence with key leadership appointments, enterprise customer momentum, and new international offices across multiple continents. This expansion reflects Anthropic's growth trajectory and increasing international demand for Claude. Anthropic has the top market share in enterprise AI*, and our run-rate.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## More ways to work with your team and tools in ChatGPT
Source: https://openai.com/index/more-ways-to-work-with-your-team
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: New shared projects, smarter connectors, and compliance and security updates help teams get more done.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Measuring the performance of our models on real-world tasks
Source: https://openai.com/index/gdpval
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Education and workforce adoption
Score: 76/100
Claim: OpenAI introduces GDPval, a new evaluation that measures model performance on real-world economically valuable tasks across 44 occupations.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing ChatGPT Pulse
Source: https://openai.com/index/introducing-chatgpt-pulse
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: Today we're releasing a preview of ChatGPT Pulse to Pro users on mobile. Pulse is a new experience where ChatGPT proactively does research to deliver personalized updates based on your chats, feedback, and connected apps like your calendar.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## ENEOS Materials brings ChatGPT Enterprise to manufacturing
Source: https://openai.com/index/eneos-materials
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Enterprise workflow automation
Score: 81/100
Claim: ENEOS Materials uses ChatGPT Enterprise to speed research, improve plant design safety, and cut HR analysis time by 90%, with 80% reporting better workflows.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI, Oracle, and SoftBank expand Stargate with five new AI datacenter sites
Source: https://openai.com/index/five-new-stargate-sites
Publisher: OpenAI
Category: Labour market
Sector: Financial services
Capability: Financial workflow automation
Score: 72/100
Claim: OpenAI, Oracle, and SoftBank announce five new Stargate AI datacenter sites, accelerating a $500B, 10-gigawatt U.S. infrastructure buildout to power next-generation AI and create tens of thousands of jobs.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## CNA is transforming its newsroom with AI
Source: https://openai.com/index/cna-walter-fernandez
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Vendor platform capability signal
Score: 64/100
Claim: In this Executive Function series from OpenAI, discover how CNA is transforming its newsroom with AI. Editor-in-Chief Walter Fernandez shares insights on AI adoption, culture, and the future of journalism.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## SchoolAI builds an AI platform that empowers teachers
Source: https://openai.com/index/schoolai
Publisher: OpenAI
Category: Deployments
Sector: Education
Capability: Education and workforce adoption
Score: 82/100
Claim: SchoolAI uses GPT-4.1, image generation, and TTS to power safe, teacher-guided AI tools for over 1 million classrooms, improving engagement, oversight, and personalized learning.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## OpenAI and NVIDIA announce strategic partnership to deploy 10 gigawatts of NVIDIA systems
Source: https://openai.com/index/openai-nvidia-systems-partnership
Publisher: OpenAI
Category: Deployments
Sector: AI infrastructure
Capability: Production AI deployment signal
Score: 85/100
Claim: OpenAI and NVIDIA announce a strategic partnership to deploy 10 gigawatts of AI datacenters powered by NVIDIA systems, with the first phase launching in 2026.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Detecting and reducing scheming in AI models
Source: https://openai.com/index/detecting-and-reducing-scheming-in-ai-models
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Apollo Research and OpenAI developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. The team shared concrete examples and stress tests of an early method to reduce scheming.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing Stargate UK
Source: https://openai.com/index/introducing-stargate-uk
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: Official OpenAI release: Introducing Stargate UK.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Building towards age prediction
Source: https://openai.com/index/building-towards-age-prediction
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: Learn how OpenAI is building age prediction and parental controls in ChatGPT to create safer, age-appropriate experiences for teens while supporting families with new tools.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Teen safety, freedom, and privacy
Source: https://openai.com/index/teen-safety-freedom-and-privacy
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 52/100
Claim: Explore OpenAI’s approach to balancing teen safety, freedom, and privacy in AI use.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Introducing upgrades to Codex
Source: https://openai.com/index/introducing-upgrades-to-codex
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 78/100
Claim: Codex just got faster, more reliable, and better at real-time collaboration and tackling tasks independently anywhere you develop—whether via the terminal, IDE, web, or even your phone.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## How people are using ChatGPT
Source: https://openai.com/index/how-people-are-using-chatgpt
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Education and workforce adoption
Score: 76/100
Claim: New research from the largest study of ChatGPT use shows how the tool creates economic value through both personal and professional use. Adoption is broadening beyond early users, closing gaps and making AI a part of everyday life.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Addendum to GPT-5 system card: GPT-5-Codex
Source: https://openai.com/index/gpt-5-system-card-addendum-gpt-5-codex
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: This addendum to the GPT-5 system card shares a new model: GPT-5-Codex, a version of GPT-5 further optimized for agentic coding in Codex. GPT-5-Codex adjusts its thinking effort more dynamically based on task complexity, responding quickly to simple conversational queries or small tasks, while independently working for longer on more complex tasks.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Claude is now generally available in Xcode
Source: https://www.anthropic.com/news/claude-in-xcode
Publisher: Anthropic
Category: Deployments
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 90/100
Claim: Developers can now connect their Claude account to Xcode 26 to power coding intelligence features with Claude Sonnet 4. Xcode is Apple's integrated development environment (IDE) and offers the tools you need to develop, test, and distribute apps for Apple platforms. This integration lets developers use Claude's coding capabilities directly in their.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Working with US CAISI and UK AISI to build more secure AI systems
Source: https://openai.com/index/us-caisi-uk-aisi-ai-update
Publisher: OpenAI
Category: Deployments
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 73/100
Claim: OpenAI shares progress on the partnership with the US CAISI and UK AISI to strengthen AI safety and security.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Strengthening our safeguards through collaboration with US CAISI and UK AISI
Source: https://www.anthropic.com/news/strengthening-our-safeguards-through-collaboration-with-us-caisi-and-uk-aisi
Publisher: Anthropic
Category: Deployments
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 73/100
Claim: Over the past year, we've collaborated with the US Center for AI Standards and Innovation (CAISI) and UK AI Security Institute (AISI), government bodies established to measure and improve the security of AI systems. Our voluntary work together began as initial consultations, but over time evolved to an ongoing partnership where CAISI and AISI teams were.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## A joint statement from OpenAI and Microsoft
Source: https://openai.com/index/joint-statement-from-openai-and-microsoft
Publisher: OpenAI
Category: Deployments
Sector: AI infrastructure
Capability: Production AI deployment signal
Score: 73/100
Claim: OpenAI and Microsoft sign a new MOU, reinforcing their partnership and shared commitment to AI safety and innovation.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Statement on OpenAI’s Nonprofit and PBC
Source: https://openai.com/index/statement-on-openai-nonprofit-and-pbc
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 54/100
Claim: OpenAI reaffirms its nonprofit leadership with a new structure granting equity in its PBC, enabling over $100B in resources to advance safe, beneficial AI for humanity.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## SafetyKit scales risk agents with OpenAI’s most capable models
Source: https://openai.com/index/safetykit
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 74/100
Claim: Discover how SafetyKit leverages OpenAI GPT-5 to enhance content moderation, enforce compliance, and outpace legacy safety systems with greater accuracy.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## A People-First AI Fund: $50M to support nonprofits
Source: https://openai.com/index/people-first-ai-fund
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 54/100
Claim: Applications are now open for OpenAI’s People-First AI Fund, a $50M initiative supporting U.S. nonprofits advancing education, community innovation, and economic opportunity. Apply by October 8, 2025, for unrestricted grants that help communities shape AI for the public good.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Anthropic is endorsing SB 53
Source: https://www.anthropic.com/news/anthropic-is-endorsing-sb-53
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Anthropic is endorsing SB 53, the California bill that governs powerful AI systems built by frontier AI developers like Anthropic. We’ve long advocated for thoughtful AI regulation and our support for this bill comes after careful consideration of the lessons learned from California's previous attempt at AI regulation (SB 1047). While we believe that.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Why language models hallucinate
Source: https://openai.com/index/why-language-models-hallucinate
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 64/100
Claim: OpenAI’s new research explains why language models hallucinate. The findings show how improved evaluations can enhance AI reliability, honesty, and safety.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Expanding economic opportunity with AI
Source: https://openai.com/index/expanding-economic-opportunity-with-ai
Publisher: OpenAI
Category: Labour market
Sector: Enterprise operations
Capability: Education and workforce adoption
Score: 72/100
Claim: OpenAI is launching a Jobs Platform and new Certifications to connect workers with jobs, training, and certifications. Learn how we’re expanding economic opportunity and making AI skills more accessible.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## Updating restrictions of sales to unsupported regions
Source: https://www.anthropic.com/news/updating-restrictions-of-sales-to-unsupported-regions
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 42/100
Claim: Anthropic's Terms of Service prohibit use of our services in certain regions due to legal, regulatory, and security risks. However, companies from these restricted regions—including adversarial nations like China—continue accessing our services in various ways, such as through subsidiaries incorporated in other countries. Companies subject to control from.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Anthropic Signs White House Pledge to America's Youth: Investing in AI Education
Source: https://www.anthropic.com/news/anthropic-signs-pledge-to-americas-youth-investing-in-ai-education
Publisher: Anthropic
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 54/100
Claim: Following our August signing of the White House's 'Pledge to America's Youth: Investing in AI Education', today we joined companies across the country at the White House's AI Education Taskforce event, deepening our commitment to helping America's students build essential skills to excel and lead with AI. Anthropic has made three concrete commitments.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Vijaye Raji to become CTO of Applications with acquisition of Statsig
Source: https://openai.com/index/vijaye-raji-to-become-cto-of-applications-with-acquisition-of-statsig
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: Vijaye Raji will step into a new role as CTO of Applications, reporting to CEO of Applications, Fidji Simo, following the acquisition of Statsig.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Building more helpful ChatGPT experiences for everyone
Source: https://openai.com/index/building-more-helpful-chatgpt-experiences-for-everyone
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Model and benchmark capability movement
Score: 76/100
Claim: We’re partnering with experts, strengthening protections for teens with parental controls, and routing sensitive conversations to reasoning models in ChatGPT.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Anthropic raises $13B Series F at $183B post-money valuation
Source: https://www.anthropic.com/news/anthropic-raises-series-f-at-usd183b-post-money-valuation
Publisher: Anthropic
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 68/100
Claim: Anthropic has completed a Series F fundraising of $13 billion led by ICONIQ. This financing values Anthropic at $183 billion post-money. Along with ICONIQ, the round was co-led by Fidelity Management & Research Company and Lightspeed Venture Partners. The investment reflects Anthropic’s continued momentum and reinforces our position as the leading.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing gpt-realtime and Realtime API updates
Source: https://openai.com/index/introducing-gpt-realtime
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Multimodal content generation and media workflows
Score: 64/100
Claim: We’re releasing a more advanced speech-to-speech model and new API capabilities including MCP server support, image input, and SIP phone calling support.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Supporting nonprofit and community innovation
Source: https://openai.com/index/supporting-nonprofit-and-community-innovation
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 66/100
Claim: OpenAI launches a $50M People-First AI Fund to help U.S. nonprofits scale impact with AI. Applications open Sept 8–Oct 8, 2025 for grants in education, healthcare, research, and more.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Updates to Consumer Terms and Privacy Policy
Source: https://www.anthropic.com/news/updates-to-our-consumer-terms
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 42/100
Claim: Today, we're rolling out updates to our Consumer Terms and Privacy Policy that will help us deliver even more capable, useful AI models. We're now giving users the choice to allow their data to be used to improve Claude and strengthen our safeguards against harmful usage like scams and abuse. Adjusting your preferences is easy and can be done at any time.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Collective alignment: public input on our Model Spec
Source: https://openai.com/index/collective-alignment-aug-2025-updates
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: OpenAI surveyed over 1,000 people worldwide on how AI should behave and compared their views to our Model Spec. Learn how collective alignment is shaping AI defaults to better reflect diverse human values and perspectives.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## OpenAI and Anthropic share findings from a joint safety evaluation
Source: https://openai.com/index/openai-anthropic-safety-evaluation
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Model and benchmark capability movement
Score: 64/100
Claim: OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing the Anthropic National Security and Public Sector Advisory Council
Source: https://www.anthropic.com/news/introducing-the-anthropic-national-security-and-public-sector-advisory-council
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 52/100
Claim: Today, we are announcing the formation of the Anthropic National Security and Public Sector Advisory Council, a group of leading bipartisan national security and public policy practitioners who will help Anthropic support the U.S. government and closely allied democracies in building and maintaining enduring technological advantages in an era of strategic…
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Detecting and countering misuse of AI: August 2025
Source: https://www.anthropic.com/news/detecting-countering-misuse-aug-2025
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 52/100
Claim: We’ve developed sophisticated safety and security measures to prevent the misuse of our AI models. But cybercriminals and other malicious actors are actively attempting to find ways around them. Today, we’re releasing a report that details how. Our Threat Intelligence report discusses several recent examples of Claude being misused, including a large-scale…
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Anthropic Education Report: How educators use Claude
Source: https://www.anthropic.com/news/anthropic-education-report-how-educators-use-claude
Publisher: Anthropic
Category: Labour market
Sector: Education
Capability: Education and workforce adoption
Score: 76/100
Claim: Understandably, much of the conversation of AI in education focuses on how students are using large language models to help them study and write. But educators use AI too. In a recent Gallup survey, teachers reported that AI tools saved them an average of 5.9 hours per week. And in an inversion of the usual discussion, students have begun expressing…
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence
## Helping people when they need it most
Source: https://openai.com/index/helping-people-when-they-need-it-most
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 52/100
Claim: How we think about safety for users experiencing mental or emotional distress, the limits of today’s systems, and the work underway to refine them.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Accelerating life sciences research
Source: https://openai.com/index/accelerating-life-sciences-research-with-retro-biosciences
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 76/100
Claim: Discover how a specialized AI model, GPT-4b micro, helped OpenAI and Retro Bio engineer more effective proteins for stem cell therapy and longevity research.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Scaling domain expertise in complex, regulated domains
Source: https://openai.com/index/blue-j
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: Discover how Blue J is transforming tax research with AI-powered tools built on GPT-4.1. By combining domain expertise with Retrieval-Augmented Generation, Blue J delivers fast, accurate, and fully-cited tax answers—trusted by professionals across the US, Canada, and the UK.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Developing nuclear safeguards for AI through public-private partnership
Source: https://www.anthropic.com/news/developing-nuclear-safeguards-for-ai-through-public-private-partnership
Publisher: Anthropic
Category: Deployments
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 63/100
Claim: Nuclear technology is inherently dual-use: the same physics principles that power nuclear reactors can be misused for weapons development. As AI models become more capable, we need to keep a close eye on whether they can provide users with dangerous technical knowledge in ways that could threaten national security. Information relating to nuclear weapons…
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Anthropic launches higher education advisory board and AI Fluency courses
Source: https://www.anthropic.com/news/anthropic-higher-education-initiatives
Publisher: Anthropic
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 42/100
Claim: The choices made in the next few years about how AI enters the classroom will shape a generation's relationship with both technology and learning. Today, we're announcing two initiatives for AI in education to help navigate these critical decisions: a Higher Education Advisory Board to guide Claude's development for education, and three AI Fluency courses.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Mixi reimagines communication with ChatGPT
Source: https://openai.com/index/mixi
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 89/100
Claim: Discover how MIXI, a leader in digital entertainment and lifestyle services in Japan, uses ChatGPT Enterprise to transform productivity, boost AI adoption across teams, and create a secure environment for innovation.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Claude Code and new admin controls for business plans
Source: https://www.anthropic.com/news/claude-code-on-team-and-enterprise
Publisher: Anthropic
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: Enterprise and Team customers can now upgrade to premium seats that include more usage and Claude Code, bringing our app and powerful coding agent together under one subscription. Users can move seamlessly between ideation and implementation, while admins get the visibility and controls they need to scale Claude across their organization. We are also…
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Q&A with DoorDash’s CPO, Mariana Garavaglia
Source: https://openai.com/index/doordash-mariana-garavaglia
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Enterprise workflow automation
Score: 46/100
Claim: Learn how DoorDash is scaling AI adoption to empower employees to build, learn, and innovate faster in a conversation with Chief People Officer Mariana Garavaglia.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Usage policy update
Source: https://www.anthropic.com/news/usage-policy-update
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Production AI deployment signal
Score: 56/100
Claim: Today, we’re sharing some updates to our Usage Policy that reflect the growing capabilities and evolving usage of our products. Our Usage Policy serves as a framework for how Claude should and shouldn’t be used, providing clear guidance for everyone who uses Anthropic’s products. In this update, our goal is to provide greater clarity and detail on our…
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Scaling accounting capacity with OpenAI
Source: https://openai.com/index/basis
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: Built with OpenAI o3, o3-Pro, GPT-4.1, and GPT-5, Basis’ AI agents help accounting firms save up to 30% of their time and expand capacity for advisory and growth.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Offering expanded Claude access across all three branches of the U.S. government
Source: https://www.anthropic.com/news/offering-expanded-claude-access-across-all-three-branches-of-government
Publisher: Anthropic
Category: Deployments
Sector: Public sector
Capability: Enterprise workflow automation
Score: 85/100
Claim: Today we are removing barriers to government AI adoption by offering Claude for Enterprise and Claude for Government to all three branches of government, including federal civilian executive branch agencies, as well as legislative and judiciary branches of government, for $1. As AI adoption leads to transformation across industries, we want to ensure that…
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Building safeguards for Claude
Source: https://www.anthropic.com/news/building-safeguards-for-claude
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 56/100
Claim: Claude empowers millions of users to tackle complex challenges, spark creativity, and deepen their understanding of the world. We want to amplify human potential while ensuring our models’ capabilities are channeled toward beneficial outcomes. This means continuously refining how we support our users’ learning and problem-solving, while preventing misuse.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## GPT-5 and the new era of work
Source: https://openai.com/index/gpt-5-new-era-of-work
Publisher: OpenAI
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5 is OpenAI’s most advanced model—transforming enterprise AI, automation, and workforce productivity in the new era of intelligent work.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing GPT-5 for developers
Source: https://openai.com/index/introducing-gpt-5-for-developers
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Introducing GPT-5 in our API platform—offering high reasoning performance, new controls for devs, and best-in-class results on real coding tasks.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Coding and design with GPT-5
Source: https://openai.com/index/gpt-5-coding-design
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Learn how GPT-5 unlocks new possibilities in coding and design.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Creative writing with GPT-5
Source: https://openai.com/index/gpt-5-creative-writing
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Learn how GPT-5 assists with creative writing.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## Medical research with GPT-5
Source: https://openai.com/index/gpt-5-medical-research
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: Learn how GPT-5 is used for medical research.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## First look at GPT-5
Source: https://openai.com/index/gpt-5-first-look
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: See how a group of leading developers use GPT-5 for the first time.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## GPT-5 System Card
Source: https://openai.com/index/gpt-5-system-card
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: This GPT-5 system card explains how a unified model routing system powers fast and smart responses using gpt-5-main, gpt-5-thinking, and lightweight versions like gpt-5-thinking-nano, optimized for different tasks and developer use.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## From hard refusals to safe-completions: toward output-centric safety training
Source: https://openai.com/index/gpt-5-safe-completions
Publisher: OpenAI
Category: Vendor framing
Sector: AI infrastructure
Capability: Frontier model release and benchmark movement
Score: 64/100
Claim: Discover how OpenAI's new safe-completions approach in GPT-5 improves both safety and helpfulness in AI responses—moving beyond hard refusals to nuanced, output-centric safety training for handling dual-use prompts.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence
## How Cursor uses GPT-5
Source: https://openai.com/index/gpt-5-cursor
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 90/100
Claim: Learn how Cursor uses GPT-5.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## How Amgen uses GPT-5
Source: https://openai.com/index/gpt-5-amgen
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 90/100
Claim: Learn how Amgen uses GPT-5.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence
## Introducing GPT-5
Source: https://openai.com/index/introducing-gpt-5
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: We are introducing GPT‑5, our best AI system yet. GPT‑5 is a significant leap in intelligence over all our previous models, featuring state-of-the-art performance across coding, math, writing, health, visual perception, and more.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Introducing Claude Opus 4
Source: https://www.anthropic.com/news/claude-opus-4-8
Claim: Claude Opus 4 (claude-opus-4-8) introduces extended thinking, interleaved reasoning, and the ability to run hundreds of parallel subagents unattended in fully autonomous agentic workflows. Anthropic highlights use cases where the model replaces attorneys and engineers, writes full codebases autonomously, and handles open-ended multi-step tasks without human supervision.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Law Professors Prefer AI Over Peer Answers
Source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6849678
Category: Benchmarks
Sector: Legal education
Capability: Expert-level legal tutoring surpassing human instructors
Claim: LLMs achieved a 75.33% win rate over expert law professors in a blinded evaluation; Claude Opus 4.7 ranked first; every AI model outperformed every human instructor; the LLM harmful-response rate was 3.53% versus 12.06% for professors.
Oracle verdict: This paper is a tombstone written by the people whose graves it is marking. The authors conducted one of the most methodologically careful studies of professional AI displacement published in legal academia, documented the results with statistical precision, and filed it under benchmark evaluation. The cope is institutional: the authors work at institutions whose value proposition depends on the human expertise they just measured as inferior. The omission of labour-market implications is not an oversight; it is load-bearing architecture.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Databricks brings GPT-5.5 to enterprise agent workflows
Source: https://openai.com/index/databricks
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Claim: Databricks uses GPT-5.5 for enterprise agent workflows after the model set a new state of the art on the OfficeQA Pro benchmark.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## Anthropic forms $200 million partnership with the Gates Foundation
Source: https://www.anthropic.com/news/gates-foundation-partnership
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Claim: We’re partnering with the Gates Foundation to commit $200 million in grant funding, Claude usage credits, and technical support for programs in global health, life sciences, education, and economic mobility over the next four years. These programs will be implemented with partners in the US and around the world. This commitment is central to Anthropic’s…
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## What Parameter Golf taught us about AI-assisted research
Source: https://openai.com/index/what-parameter-golf-taught-us
Category: Benchmarks
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Claim: Parameter Golf brought together 1,000+ participants and 2,000+ submissions to explore AI-assisted machine learning research, coding agents, quantization, and novel model design under strict constraints.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## How NVIDIA engineers and researchers build with Codex
Source: https://openai.com/index/nvidia
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Claim: Teams use Codex with GPT-5.5 to ship production systems and turn research ideas into runnable experiments.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
## OpenAI launches DeployCo to help businesses build around intelligence
Source: https://openai.com/index/openai-launches-the-deployment-company
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Claim: OpenAI launches DeployCo, a new enterprise deployment company built to help organizations bring frontier AI into production and turn it into measurable business impact.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
BenchmarksCybersecurityFrontier model release and benchmark movement
Scaling Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber
Source: https://openai.com/index/gpt-5-5-with-trusted-access-for-cyber
Claim: OpenAI expands Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber, helping verified defenders accelerate vulnerability research and protect critical infrastructure.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Education
Capability: Education and workforce adoption
Introducing ChatGPT Futures: Class of 2026
Source: https://openai.com/index/introducing-chatgpt-futures-class-of-2026
Claim: Meet the ChatGPT Futures Class of 2026—26 student innovators using AI to build, research, and drive real-world impact. Discover how this generation is redefining learning, creativity, and opportunity with ChatGPT.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
How frontier firms are pulling ahead
Source: https://openai.com/index/introducing-b2b-signals
Claim: OpenAI’s B2B Signals research shows how frontier enterprises deepen AI adoption, scale Codex-powered agentic workflows, and build durable competitive advantage.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
GPT-5.5 Instant System Card
Source: https://openai.com/index/gpt-5-5-instant-system-card
Claim: Official OpenAI release: GPT-5.5 Instant System Card.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
GPT-5.5 Instant: smarter, clearer, and more personalized
Source: https://openai.com/index/gpt-5-5-instant
Claim: GPT-5.5 Instant updates ChatGPT’s default model with smarter, more accurate answers, reduced hallucinations, and improved personalization controls.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
GPT-5.5 System Card
Source: https://openai.com/index/gpt-5-5-system-card
Claim: Official OpenAI release: GPT-5.5 System Card.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Introducing GPT-5.5
Source: https://openai.com/index/introducing-gpt-5-5
Claim: Introducing GPT-5.5, our smartest model yet—faster, more capable, and built for complex tasks like coding, research, and data analysis across tools.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
GPT-5.5 Bio Bug Bounty
Source: https://openai.com/index/gpt-5-5-bio-bug-bounty
Claim: Explore the GPT-5.5 Bio Bug Bounty: a red-teaming challenge to find universal jailbreaks for bio safety risks, with rewards up to $25,000.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Making ChatGPT better for clinicians
Source: https://openai.com/index/making-chatgpt-better-for-clinicians
Claim: OpenAI makes ChatGPT for Clinicians free for verified U.S. physicians, nurse practitioners, and pharmacists, supporting clinical care, documentation, and research.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Introducing Claude Design by Anthropic Labs
Source: https://www.anthropic.com/news/claude-design-anthropic-labs
Claim: Today, we’re launching Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create polished visual work like designs, prototypes, slides, one-pagers, and more. Claude Design is powered by our most capable vision model, Claude Opus 4.7, and is available in research preview for Claude Pro, Max, Team, and Enterprise.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Frontier model release and benchmark movement
Introducing GPT-Rosalind for life sciences research
Source: https://openai.com/index/introducing-gpt-rosalind
Claim: OpenAI introduces GPT-Rosalind, a frontier reasoning model built to accelerate drug discovery, genomics analysis, protein reasoning, and scientific research workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Introducing Claude Opus 4.7
Source: https://www.anthropic.com/news/claude-opus-4-7
Claim: Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work, the kind that previously needed close supervision, to Opus 4.7 with confidence.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Enterprise workflow automation
Anthropic’s Long-Term Benefit Trust appoints Vas Narasimhan to Board of Directors
Source: https://www.anthropic.com/news/narasimhan-board
Claim: Vas Narasimhan has been appointed to Anthropic’s Board of Directors by the Anthropic Long-Term Benefit Trust. He is a physician-scientist and the Chief Executive Officer of Novartis, one of the world’s leading innovative medicines companies, and shares Anthropic’s conviction that healthcare and life sciences are among the areas where AI has the greatest...
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
The next phase of enterprise AI
Source: https://openai.com/index/next-phase-of-enterprise-ai
Claim: OpenAI outlines the next phase of enterprise AI, as adoption accelerates across industries with Frontier, ChatGPT Enterprise, Codex, and company-wide AI agents.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Customer operations
Capability: Model and benchmark capability movement
Announcing the OpenAI Safety Fellowship
Source: https://openai.com/index/introducing-openai-safety-fellowship
Claim: A pilot program to support independent safety and alignment research and develop the next generation of talent.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: AI infrastructure
Capability: Frontier model release and benchmark movement
Anthropic expands partnership with Google and Broadcom for multiple gigawatts of next-generation compute
Source: https://www.anthropic.com/news/google-broadcom-partnership-compute
Claim: We have signed a new agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity that we expect to come online starting in 2027. This significant expansion of our compute infrastructure will power our frontier Claude models and help us serve extraordinary demand from customers worldwide.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Accelerating the next phase of AI
Source: https://openai.com/index/accelerating-the-next-phase-ai
Claim: OpenAI raises $122 billion in new funding to expand frontier AI globally, invest in next-generation compute, and meet growing demand for ChatGPT, Codex, and enterprise AI.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Public sector
Capability: Model and benchmark capability movement
Australian government and Anthropic sign MOU for AI safety and research
Source: https://www.anthropic.com/news/australia-MOU
Claim: Today, Anthropic signed a Memorandum of Understanding with the Australian government to cooperate on AI safety research and support the goals of Australia’s National AI Plan. Our CEO, Dario Amodei, met with Prime Minister Anthony Albanese to formalize the agreement during a visit to Canberra, Australia. We also announced AUD$3 million in partnerships with...
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Introducing GPT-5.4 mini and nano
Source: https://openai.com/index/introducing-gpt-5-4-mini-and-nano
Claim: GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Introducing The Anthropic Institute
Source: https://www.anthropic.com/news/the-anthropic-institute
Claim: We’re launching The Anthropic Institute, a new effort to confront the most significant challenges that powerful AI will pose to our societies. The Anthropic Institute will draw on research from across Anthropic to provide information that other researchers and the public can use during our transition to a world containing much more powerful AI systems.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Improving instruction hierarchy in frontier LLMs
Source: https://openai.com/index/instruction-hierarchy-challenge
Claim: IH-Challenge trains models to prioritize trusted instructions, improving instruction hierarchy, safety steerability, and resistance to prompt injection attacks.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Education
Capability: Education and workforce adoption
New ways to learn math and science in ChatGPT
Source: https://openai.com/index/new-ways-to-learn-math-and-science-in-chatgpt
Claim: ChatGPT introduces interactive visual explanations for math and science, helping students explore formulas, variables, and concepts in real time.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Codex Security: now in research preview
Source: https://openai.com/index/codex-security-now-in-research-preview
Claim: Codex Security is an AI application security agent that analyzes project context to detect, validate, and patch complex vulnerabilities with higher confidence and less noise.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
How Balyasny Asset Management built an AI research engine
Source: https://openai.com/index/balyasny-asset-management
Claim: By combining rigorous model evaluation, full-platform use of OpenAI, and agent workflows, Balyasny is reinventing investment research.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Media and content
Capability: Multimodal content generation and media workflows
How Descript engineers multilingual video dubbing at scale
Source: https://openai.com/index/descript
Claim: Using OpenAI reasoning models, Descript unlocked automatic localization of large content libraries without losing timing or meaning.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Software engineering
Capability: Cyber defence and misuse monitoring
Partnering with Mozilla to improve Firefox’s security
Source: https://www.anthropic.com/news/mozilla-firefox-security
Claim: AI models can now independently identify high-severity vulnerabilities in complex software. As we recently documented, Claude found more than 500 zero-day vulnerabilities (security flaws that are unknown to the software’s maintainers) in well-tested open-source software. In this post, we share details of a collaboration with researchers at Mozilla in which...
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Introducing GPT-5.4
Source: https://openai.com/index/introducing-gpt-5-4
Claim: Introducing GPT-5.4, OpenAI’s most capable and efficient frontier model for professional work, with state-of-the-art coding, computer use, tool search, and 1M-token context.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
GPT-5.4 Thinking System Card
Source: https://openai.com/index/gpt-5-4-thinking-system-card
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Claim: Official OpenAI release: GPT-5.4 Thinking System Card.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Reasoning models struggle to control their chains of thought, and that’s good
Source: https://openai.com/index/reasoning-models-chain-of-thought-controllability
Category: Benchmarks
Sector: Cybersecurity
Capability: Model and benchmark capability movement
Claim: OpenAI introduces CoT-Control and finds reasoning models struggle to control their chains of thought, reinforcing monitorability as an AI safety safeguard.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Introducing ChatGPT for Excel and new financial data integrations
Source: https://openai.com/index/chatgpt-for-excel
Category: Benchmarks
Sector: Financial services
Capability: Frontier model release and benchmark movement
Claim: OpenAI introduces ChatGPT for Excel and new financial app integrations, powered by GPT-5.4 to accelerate modeling, research, and analysis in regulated environments.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
GPT-5.3 Instant: Smoother, more useful everyday conversations
Source: https://openai.com/index/gpt-5-3-instant
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Claim: Official OpenAI release: GPT-5.3 Instant: Smoother, more useful everyday conversations.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
GPT-5.3 Instant System Card
Source: https://openai.com/index/gpt-5-3-instant-system-card
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Claim: Official OpenAI release: GPT-5.3 Instant System Card.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Joint Statement from OpenAI and Microsoft
Source: https://openai.com/index/continuing-microsoft-partnership
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Claim: Microsoft and OpenAI continue to work closely across research, engineering, and product development, building on years of deep collaboration and shared success.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
OpenAI and Amazon announce strategic partnership
Source: https://openai.com/index/amazon-partnership
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Claim: OpenAI and Amazon announce a strategic partnership bringing OpenAI’s Frontier platform to AWS, expanding AI infrastructure, custom models, and enterprise AI agents.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Pacific Northwest National Laboratory and OpenAI partner to accelerate federal permitting
Source: https://openai.com/index/pacific-northwest-national-laboratory
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Claim: OpenAI and Pacific Northwest National Laboratory introduce DraftNEPABench, a new benchmark evaluating how AI coding agents can accelerate federal permitting—showing potential to reduce NEPA drafting time by up to 15% and modernize infrastructure reviews.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Statement from Dario Amodei on our discussions with the Department of War
Source: https://www.anthropic.com/news/statement-department-of-war
Category: Benchmarks
Sector: Public sector
Capability: Frontier model release and benchmark movement
Claim: I believe deeply in the existential importance of using AI to defend the United States and other democracies, and to defeat our autocratic adversaries. Anthropic has therefore worked proactively to deploy our models to the Department of War and the intelligence community. We were the first frontier AI company to deploy our models in the US government’s.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Anthropic acquires Vercept to advance Claude's computer use capabilities
Source: https://www.anthropic.com/news/acquires-vercept
Category: Benchmarks
Sector: Media and content
Capability: Autonomous software engineering and computer-use agents
Claim: People are using Claude for increasingly complex work—writing and running code across entire repositories, synthesizing research from dozens of sources, and managing workflows that span multiple tools and teams. Computer use enables Claude to do all of that inside live applications, the way a person at a keyboard would. That means Claude can take on.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Why we no longer evaluate SWE-bench Verified
Source: https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Claim: SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
OpenAI announces Frontier Alliance Partners
Source: https://openai.com/index/frontier-alliance-partners
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Claim: OpenAI announces Frontier Alliance Partners to help enterprises move from AI pilots to production with secure, scalable agent deployments.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Our First Proof submissions
Source: https://openai.com/index/first-proof-submissions
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Claim: We share our AI model’s proof attempts for the First Proof math challenge, testing research-grade reasoning on expert-level problems.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Making frontier cybersecurity capabilities available to defenders
Source: https://www.anthropic.com/news/claude-code-security
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Claim: Claude Code Security, a new capability built into Claude Code on the web, is now available in a limited research preview. It scans codebases for security vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix security issues that traditional methods often miss. Security teams face a common challenge: too.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Advancing independent research on AI alignment
Source: https://openai.com/index/advancing-independent-research-ai-alignment
Category: Benchmarks
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Claim: OpenAI commits $7.5M to The Alignment Project to fund independent AI alignment research, strengthening global efforts to address AGI safety and security risks.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Introducing EVMbench
Source: https://openai.com/index/introducing-evmbench
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Claim: OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Introducing Claude Sonnet 4.6
Source: https://www.anthropic.com/news/claude-sonnet-4-6
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Claim: Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta. For those on our Free and Pro plans, Claude Sonnet 4.6 is now the default model in claude.ai and.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Anthropic opens Bengaluru office and announces new partnerships across India
Source: https://www.anthropic.com/news/bengaluru-office-partnerships-across-india
Claim: India is the second-largest market for Claude.ai, home to a developer community doing some of the most technically intense AI work we see anywhere. Nearly half of Claude usage in India comprises computer and mathematical tasks: building applications, modernizing systems, and shipping production software. Today, as we officially open our Bengaluru office.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
GPT-5.2 derives a new result in theoretical physics
Source: https://openai.com/index/new-result-theoretical-physics
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Claim: A new preprint shows GPT-5.2 proposing a new formula for a gluon amplitude, later formally proved and verified by OpenAI and academic collaborators.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Scaling social science research
Source: https://openai.com/index/scaling-social-science-research
Category: Benchmarks
Sector: Media and content
Capability: Multimodal content generation and media workflows
Claim: GABRIEL is a new open-source toolkit from OpenAI that uses GPT to turn qualitative text and images into quantitative data, helping social scientists analyze research at scale.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Anthropic partners with CodePath to bring Claude to the US’s largest collegiate computer science program
Source: https://www.anthropic.com/news/anthropic-codepath-partnership
Category: Benchmarks
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Claim: Anthropic is partnering with CodePath, the nation’s largest provider of collegiate computer science education, to redesign its coding curriculum as AI reshapes the field of software development. CodePath will put Claude and Claude Code at the center of its courses and career programs, giving more than 20,000 students at community colleges, state schools.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Introducing GPT-5.3-Codex-Spark
Source: https://openai.com/index/introducing-gpt-5-3-codex-spark
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Claim: Introducing GPT-5.3-Codex-Spark—our first real-time coding model. 15x faster generation, 128k context, now in research preview for ChatGPT Pro users.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Anthropic is donating $20 million to Public First Action
Source: https://www.anthropic.com/news/donate-public-first-action
Category: Benchmarks
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Claim: AI will bring enormous benefits—for science, technology, medicine, economic growth, and much more. But a technology this powerful also comes with considerable risks. Those risks might come from the misuse of the models: AI is already being exploited to automate cyberattacks; in the future it might assist in the production of dangerous weapons. Risks.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation
Source: https://www.anthropic.com/news/anthropic-raises-30-billion-series-g-funding-380-billion-post-money-valuation
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Claim: We have raised $30 billion in Series G funding led by GIC and Coatue, valuing Anthropic at $380 billion post-money. The round was co-led by D. E. Shaw Ventures, Dragoneer, Founders Fund, ICONIQ, and MGX. The investment will fuel the frontier research, product development, and infrastructure expansions that have made Anthropic the market leader in.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Covering electricity price increases from our data centers
Source: https://www.anthropic.com/news/covering-electricity-price-increases
Category: Benchmarks
Sector: AI infrastructure
Capability: Frontier model release and benchmark movement
Claim: As we continue to invest in American AI infrastructure, Anthropic will cover electricity price increases that consumers face from our data centers. Training a single frontier AI model will soon require gigawatts of power, and the US AI sector will need at least 50 gigawatts of capacity over the next several years. The country needs to build new data.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Enterprise operations | Frontier model release and benchmark movement
Making AI work for everyone, everywhere: our approach to localization
Source: https://openai.com/index/our-approach-to-localization
Claim: OpenAI shares its approach to AI localization, showing how globally shared frontier models can be adapted to local languages, laws, and cultures without compromising safety.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Healthcare and life sciences | Frontier model release and benchmark movement
GPT-5 lowers the cost of cell-free protein synthesis
Source: https://openai.com/index/gpt-5-lowers-protein-synthesis-cost
Claim: An autonomous lab combining OpenAI’s GPT-5 with Ginkgo Bioworks’ cloud automation cut cell-free protein synthesis costs by 40% through closed-loop experimentation.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Cybersecurity | Frontier model release and benchmark movement
Introducing Trusted Access for Cyber
Source: https://openai.com/index/trusted-access-for-cyber
Claim: OpenAI introduces Trusted Access for Cyber, a trust-based framework that expands access to frontier cyber capabilities while strengthening safeguards against misuse.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Enterprise operations | Frontier model release and benchmark movement
Introducing OpenAI Frontier
Source: https://openai.com/index/introducing-openai-frontier
Claim: OpenAI Frontier is an enterprise platform for building, deploying, and managing AI agents with shared context, onboarding, permissions, and governance.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Software engineering | Frontier model release and benchmark movement
Introducing GPT-5.3-Codex
Source: https://openai.com/index/introducing-gpt-5-3-codex
Claim: GPT-5.3-Codex is a Codex-native agent that pairs frontier coding performance with general reasoning to support long-horizon, real-world technical work.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Software engineering | Frontier model release and benchmark movement
GPT-5.3-Codex System Card
Source: https://openai.com/index/gpt-5-3-codex-system-card
Claim: GPT‑5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT‑5.2-Codex with the reasoning and professional knowledge capabilities of GPT‑5.2.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Software engineering | Frontier model release and benchmark movement
Introducing Claude Opus 4.6
Source: https://www.anthropic.com/news/claude-opus-4-6
Claim: We’re upgrading our smartest model. The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, can operate more reliably in larger codebases, and has better code review and debugging skills to catch its own mistakes. And, in a first for our Opus-class models, Opus 4.6 features a 1M token.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Enterprise operations | Frontier model release and benchmark movement
Snowflake and OpenAI partner to bring frontier intelligence to enterprise data
Source: https://openai.com/index/snowflake-partnership
Claim: OpenAI and Snowflake partner in a $200M agreement to bring frontier intelligence into enterprise data, enabling AI agents and insights directly in Snowflake.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Healthcare and life sciences | Healthcare and life-sciences reasoning
Anthropic partners with Allen Institute and Howard Hughes Medical Institute to accelerate scientific discovery
Source: https://www.anthropic.com/news/anthropic-partners-with-allen-institute-and-howard-hughes-medical-institute
Claim: Modern biological research generates data at unprecedented scale—from single-cell sequencing to whole-brain connectomics—yet transforming that data into validated biological insights remains a fundamental bottleneck. Knowledge synthesis, hypothesis generation, and experimental interpretation still depend on manual processes that can't keep pace with the.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Model and benchmark capability movement
EMEA Youth & Wellbeing Grant
Source: https://openai.com/index/emea-youth-and-wellbeing-grant
Claim: Apply for the EMEA Youth & Wellbeing Grant, a €500,000 program funding NGOs and researchers advancing youth safety and wellbeing in the age of AI.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Frontier model release and benchmark movement
Introducing Prism
Source: https://openai.com/index/introducing-prism
Claim: Prism is a free LaTeX-native workspace with GPT-5.2 built in, helping researchers write, collaborate, and reason in one place.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Media and content | Frontier model release and benchmark movement
ServiceNow powers actionable enterprise AI with OpenAI
Source: https://openai.com/index/servicenow-powers-actionable-enterprise-ai-with-openai
Claim: ServiceNow expands access to OpenAI frontier models to power AI-driven enterprise workflows, summarization, search, and voice across the ServiceNow Platform.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Healthcare and life sciences | Healthcare and life-sciences reasoning
How scientists are using Claude to accelerate research and discovery
Source: https://www.anthropic.com/news/accelerating-scientific-research
Claim: Last October we launched Claude for Life Sciences—a suite of connectors and skills that made Claude a better scientific collaborator. Since then, we've invested heavily in making Claude the most capable model for scientific work , with Opus 4.5 showing significant improvements in figure interpretation, computational biology, and protein understanding.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Healthcare and life sciences | Healthcare and life-sciences reasoning
Advancing Claude in healthcare and the life sciences
Source: https://www.anthropic.com/news/healthcare-life-sciences
Claim: In October, we announced Claude for Life Sciences, our latest step in making Claude a productive research partner for scientists and clinicians, and in helping Claude to support those in industry bringing new scientific advancements to the public. Now, we’re expanding that feature set in two ways. First, we’re introducing Claude for Healthcare, a.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Software engineering | Frontier model release and benchmark movement
Sharing our compliance framework for California's Transparency in Frontier AI Act
Source: https://www.anthropic.com/news/compliance-framework-SB53
Claim: On January 1, California's Transparency in Frontier AI Act (SB 53) will go into effect. It establishes the nation’s first frontier AI safety and transparency requirements for catastrophic risks. While we have long advocated for a federal framework, Anthropic endorsed SB 53 because we believe frontier AI developers like ourselves should be transparent.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Enterprise operations | Model and benchmark capability movement
Evaluating chain-of-thought monitorability
Source: https://openai.com/index/evaluating-chain-of-thought-monitorability
Claim: OpenAI introduces a new framework and evaluation suite for chain-of-thought monitorability, covering 13 evaluations across 24 environments. Our findings show that monitoring a model’s internal reasoning is far more effective than monitoring outputs alone, offering a promising path toward scalable control as AI systems grow more capable.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Public sector | Model and benchmark capability movement
Deepening our collaboration with the U.S. Department of Energy
Source: https://openai.com/index/us-department-of-energy-collaboration
Claim: OpenAI and the U.S. Department of Energy have signed a memorandum of understanding to deepen collaboration on AI and advanced computing in support of scientific discovery. The agreement builds on ongoing work with national laboratories and helps establish a framework for applying AI to high-impact research across the DOE ecosystem.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Customer operations | Model and benchmark capability movement
Updating our Model Spec with teen protections
Source: https://openai.com/index/updating-model-spec-with-teen-protections
Claim: OpenAI is updating its Model Spec with new Under-18 Principles that define how ChatGPT should support teens with safe, age-appropriate guidance grounded in developmental science. The update strengthens guardrails, clarifies expected model behavior in higher-risk situations, and builds on our broader work to improve teen safety across ChatGPT.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Software engineering | Frontier model release and benchmark movement
Addendum to GPT-5.2 System Card: GPT-5.2-Codex
Source: https://openai.com/index/gpt-5-2-codex-system-card
Claim: Official OpenAI release: Addendum to GPT-5.2 System Card: GPT-5.2-Codex.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Software engineering | Frontier model release and benchmark movement
Introducing GPT-5.2-Codex
Source: https://openai.com/index/introducing-gpt-5-2-codex
Claim: GPT-5.2-Codex is OpenAI’s most advanced coding model, offering long-horizon reasoning, large-scale code transformations, and enhanced cybersecurity capabilities.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Healthcare and life sciences | Enterprise workflow automation
Working with the US Department of Energy to unlock the next era of scientific discovery
Source: https://www.anthropic.com/news/genesis-mission-partnership
Claim: Anthropic and the US Department of Energy (DOE) are announcing a multi-year partnership as part of the Genesis Mission— the Department’s initiative to use AI to cement America’s leadership in science. Our partnership focuses on three domains—American energy dominance, the biological and life sciences, and scientific productivity—and has the potential to.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Frontier model release and benchmark movement
Evaluating AI’s ability to perform scientific research tasks
Source: https://openai.com/index/frontierscience
Claim: OpenAI introduces FrontierScience, a benchmark testing AI reasoning in physics, chemistry, and biology to measure progress toward real scientific research.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Frontier model release and benchmark movement
Measuring AI’s capability to accelerate biological research
Source: https://openai.com/index/accelerating-biological-research-in-the-wet-lab
Claim: OpenAI introduces a real-world evaluation framework to measure how AI can accelerate biological research in the wet lab. Using GPT-5 to optimize a molecular cloning protocol, the work explores both the promise and risks of AI-assisted experimentation.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Frontier model release and benchmark movement
Advancing science and math with GPT-5.2
Source: https://openai.com/index/gpt-5-2-for-science-and-math
Claim: GPT-5.2 is OpenAI’s strongest model yet for math and science, setting new state-of-the-art results on benchmarks like GPQA Diamond and FrontierMath. This post shows how those gains translate into real research progress, including solving an open theoretical problem and generating reliable mathematical proofs.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
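A minimal sketch of how the labelled lines in this register could be read into records, assuming each entry keeps the layout used here: a title line, followed by `Source:`, `Claim:`, `Oracle verdict:`, and `Thesis relevance:` lines. The names `Entry`, `FIELDS`, and `parse_entries` are hypothetical and are not part of the register itself.

```python
from dataclasses import dataclass

# Labels as they appear in the register, mapped to record attributes.
FIELDS = {
    "Source": "source",
    "Claim": "claim",
    "Oracle verdict": "verdict",
    "Thesis relevance": "relevance",
}


@dataclass
class Entry:
    title: str
    source: str = ""
    claim: str = ""
    verdict: str = ""
    relevance: str = ""


def parse_entries(text: str) -> list[Entry]:
    """Group labelled lines into entries; each 'Source:' line opens a new entry."""
    entries: list[Entry] = []
    previous = ""  # last unlabelled line seen; assumed to be the entry title
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first ": " only, so colons inside a claim are preserved.
        label, sep, value = line.partition(": ")
        if sep and label in FIELDS:
            if label == "Source":
                entries.append(Entry(title=previous))
            if entries:
                setattr(entries[-1], FIELDS[label], value)
        else:
            previous = line
    return entries


sample = """\
Advancing science and math with GPT-5.2
Source: https://openai.com/index/gpt-5-2-for-science-and-math
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
"""
for entry in parse_entries(sample):
    print(entry.title, entry.source)
```

Unlabelled lines such as tag rows or card previews are only ever held as a candidate title, so the sketch takes whichever unlabelled line sits directly above `Source:`.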
Benchmarks | Software engineering | Frontier model release and benchmark movement
Introducing GPT-5.2
Source: https://openai.com/index/introducing-gpt-5-2
Claim: GPT-5.2 is our most advanced frontier model for everyday professional work, with state-of-the-art reasoning, long-context understanding, coding, and vision. Use it in ChatGPT and the OpenAI API to power faster, more reliable agentic workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | General AI capability | Frontier model release and benchmark movement
Update to GPT-5 System Card: GPT-5.2
Source: https://openai.com/index/gpt-5-system-card-update-gpt-5-2
Claim: GPT-5.2 is the latest model family in the GPT-5 series. The comprehensive safety mitigation approach for these models is largely the same as that described in the GPT-5 System Card and GPT-5.1 System Card. Like OpenAI’s other models, the GPT-5.2 models were trained on diverse datasets, including information that is publicly available on the internet.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Model and benchmark capability movement
Ten years
Source: https://openai.com/index/ten-years
Claim: OpenAI reflects on ten years of progress, from early research breakthroughs to widely used AI systems that reshaped what’s possible. We share lessons from the past decade and why we remain optimistic about building AGI that benefits all of humanity.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Model and benchmark capability movement
OpenAI to acquire Neptune
Source: https://openai.com/index/openai-to-acquire-neptune
Claim: OpenAI is acquiring Neptune to deepen visibility into model behavior and strengthen the tools researchers use to track experiments and monitor training.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Model and benchmark capability movement
How confessions can keep language models honest
Source: https://openai.com/index/how-confessions-can-keep-language-models-honest
Claim: OpenAI researchers are testing “confessions,” a method that trains models to admit when they make mistakes or act undesirably, helping improve AI honesty, transparency, and trust in model outputs.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Healthcare and life sciences | Healthcare and life-sciences reasoning
Funding grants for new research into AI and mental health
Source: https://openai.com/index/ai-mental-health-research-grants
Claim: OpenAI is awarding up to $2 million in grants for research at the intersection of AI and mental health. The program supports projects that study real-world risks, benefits, and applications to improve safety and well-being.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Frontier model release and benchmark movement
OpenAI takes an ownership stake in Thrive Holdings to accelerate enterprise AI adoption
Source: https://openai.com/index/thrive-holdings
Claim: OpenAI takes an ownership stake in Thrive Holdings to accelerate enterprise AI adoption, embedding frontier research and engineering directly into accounting and IT services to boost speed, accuracy, and efficiency while creating a scalable model for industry-wide transformation.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Commerce and marketplace | Model and benchmark capability movement
Introducing shopping research in ChatGPT
Source: https://openai.com/index/chatgpt-shopping-research
Claim: Shopping research in ChatGPT helps you explore, compare, and discover products with personalized buyer’s guides that simplify decision-making.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Frontier model release and benchmark movement
GPT-5 and the future of mathematical discovery
Source: https://openai.com/index/gpt-5-mathematical-discovery
Claim: UCLA Professor Ernest Ryu and GPT-5 solved a key question in optimization theory, showcasing AI’s role in accelerating mathematical discovery.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Software engineering | Frontier model release and benchmark movement
Introducing Claude Opus 4.5
Source: https://www.anthropic.com/news/claude-opus-4-5
Claim: Our newest model, Claude Opus 4.5, is available today. It’s intelligent, efficient, and the best model in the world for coding, agents, and computer use. It’s also meaningfully better at everyday tasks like deep research and working with slides and spreadsheets. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how…
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Frontier model release and benchmark movement
Early experiments in accelerating science with GPT-5
Source: https://openai.com/index/accelerating-science-gpt-5
Claim: OpenAI introduces the first research cases showing how GPT-5 accelerates scientific progress across math, physics, biology, and computer science. Explore how AI and researchers collaborate to generate proofs, uncover new insights, and reshape the pace of discovery.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Cybersecurity | Frontier model release and benchmark movement
Strengthening our safety ecosystem with external testing
Source: https://openai.com/index/strengthening-safety-with-external-testing
Claim: OpenAI works with independent experts to evaluate frontier AI systems. Third-party testing strengthens safety, validates safeguards, and increases transparency in how we assess model capabilities and risks.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
How evals drive the next chapter in AI for businesses
Source: https://openai.com/index/evals-drive-next-chapter-of-ai
Claim: Learn how evals help businesses define, measure, and improve AI performance—reducing risk, boosting productivity, and driving strategic advantage.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Software engineering | Frontier model release and benchmark movement
GPT-5.1-Codex-Max System Card
Source: https://openai.com/index/gpt-5-1-codex-max-system-card
Claim: This system card outlines the comprehensive safety measures implemented for GPT-5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Financial services | Frontier model release and benchmark movement
Intuit and OpenAI join forces on new AI-powered experiences
Source: https://openai.com/index/intuit-partnership
Claim: OpenAI and Intuit have entered a $100M+ multi-year partnership to launch Intuit app experiences in ChatGPT and expand Intuit’s use of OpenAI’s frontier models to power personalized financial tools.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Software engineering | Frontier model release and benchmark movement
Introducing GPT-5.1 for developers
Source: https://openai.com/index/gpt-5-1-for-developers
Claim: GPT-5.1 is now available in the API, bringing faster adaptive reasoning, extended prompt caching, improved coding performance, and new apply_patch and shell tools.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | General AI capability | Model and benchmark capability movement
Measuring political bias in Claude
Source: https://www.anthropic.com/news/political-even-handedness
Claim: We want Claude to be seen as fair and trustworthy by people across the political spectrum, and to be unbiased and even-handed in its approach to political topics. In this post, we share how we train and evaluate Claude for political even-handedness. We also report the results of a new, automated, open-source evaluation for political neutrality that we’ve…
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Disrupting the first reported AI-orchestrated cyber espionage campaign
Source: https://www.anthropic.com/news/disrupting-AI-espionage
Claim: We recently argued that an inflection point had been reached in cybersecurity: a point at which AI models had become genuinely useful for cybersecurity operations, both for good and for ill. This was based on systematic evaluations showing cyber capabilities doubling in six months; we’d also been tracking real-world cyberattacks, observing how malicious…
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | General AI capability | Frontier model release and benchmark movement
GPT-5.1: A smarter, more conversational ChatGPT
Source: https://openai.com/index/gpt-5-1
Claim: We’re upgrading the GPT-5 series with warmer, more capable models and new ways to customize ChatGPT’s tone and style. GPT-5.1 starts rolling out today to paid users.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Healthcare and life sciences | Frontier model release and benchmark movement
GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum
Source: https://openai.com/index/gpt-5-system-card-addendum-gpt-5-1
Claim: This GPT-5 system card addendum provides updated safety metrics for GPT-5.1 Instant and Thinking, including new evaluations for mental health and emotional reliance.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Frontier model release and benchmark movement
Anthropic invests $50 billion in American AI infrastructure
Source: https://www.anthropic.com/news/anthropic-invests-50-billion-in-american-ai-infrastructure
Claim: Today, we are announcing a $50 billion investment in American computing infrastructure, building data centers with Fluidstack in Texas and New York, with more sites to come. These facilities are custom built for Anthropic with a focus on maximizing efficiency for our workloads, enabling continued research and development at the frontier. The project will…
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Cybersecurity | Frontier model release and benchmark movement
Understanding prompt injections: a frontier security challenge
Source: https://openai.com/index/prompt-injections
Claim: Prompt injections are a frontier security challenge for AI systems. Learn how these attacks work and how OpenAI is advancing research, training models, and building safeguards for users.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
1 million business customers putting AI to work
Source: https://openai.com/index/1-million-businesses-putting-ai-to-work
Claim: More than 1 million business customers around the world now use OpenAI. Across healthcare, life sciences, financial services, and more, ChatGPT and our APIs are driving a new era of intelligent, AI-powered work.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | General AI capability | Frontier model release and benchmark movement
Introducing IndQA
Source: https://openai.com/index/introducing-indqa
Claim: OpenAI introduces IndQA, a new benchmark for evaluating AI systems in Indian languages. Built with domain experts, IndQA tests cultural understanding and reasoning across 12 languages and 10 knowledge areas.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Software engineering | Cyber defence and misuse monitoring
Introducing Aardvark: OpenAI’s agentic security researcher
Source: https://openai.com/index/introducing-aardvark
Claim: OpenAI introduces Aardvark, an AI-powered security researcher that autonomously finds, validates, and helps fix software vulnerabilities at scale. The system is in private beta—sign up to join early testing.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Cybersecurity · Model and benchmark capability movement
gpt-oss-safeguard technical report
Source: https://openai.com/index/gpt-oss-safeguard-technical-report
Claim: gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are two open-weight reasoning models post-trained from the gpt-oss models and trained to reason from a provided policy in order to label content under that policy. In this report, we describe gpt-oss-safeguard’s capabilities and provide our baseline safety evaluations on the gpt-oss-safeguard models, using…
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Software engineering · Model and benchmark capability movement
Introducing gpt-oss-safeguard
Source: https://openai.com/index/introducing-gpt-oss-safeguard
Claim: OpenAI introduces gpt-oss-safeguard—open-weight reasoning models for safety classification that let developers apply and iterate on custom policies.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Advancing organizational transformation for business innovation
Source: https://openai.com/index/dai-nippon-printing
Claim: DNP rolled out ChatGPT Enterprise across ten core departments, achieving 95% faster patent research, 10x processing volume, 87% automation, and 70% knowledge reuse in three months.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
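The DNP claim quotes "95% faster patent research" without saying which quantity moved, and the two common readings give very different multipliers. A minimal sketch of both, assuming nothing beyond the quoted percentage; the function names are illustrative and the source does not say which reading DNP intends.

```python
from fractions import Fraction


def speedup_if_time_cut(percent: int) -> Fraction:
    """Reading A: 'N% faster' means task time fell by N%."""
    remaining = Fraction(100 - percent, 100)  # share of the old time still needed
    return 1 / remaining


def speedup_if_rate_up(percent: int) -> Fraction:
    """Reading B: 'N% faster' means throughput rose by N%."""
    return Fraction(100 + percent, 100)


print(speedup_if_time_cut(95))  # 20
print(speedup_if_rate_up(95))   # 39/20
```

Under reading A the task takes one twentieth of the original time; under reading B it runs at just under twice the original rate. Exact fractions are used so the results are not blurred by floating-point rounding.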
Benchmarks · Healthcare and life sciences · Frontier model release and benchmark movement
Addendum to GPT-5 System Card: Sensitive conversations
Source: https://openai.com/index/gpt-5-system-card-sensitive-conversations
Claim: This system card details GPT-5’s improvements in handling sensitive conversations, including new benchmarks for emotional reliance, mental health, and jailbreak resistance.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Steuerrecht.com delivers client-ready legal analysis with ChatGPT
Source: https://openai.com/index/steuerrecht
Claim: Steuerrecht.com uses ChatGPT Business to streamline legal workflows, automate tax research, and deliver faster, client-ready analysis for law firms.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Advancing Claude for Financial Services
Source: https://www.anthropic.com/news/advancing-claude-for-financial-services
Claim: We're expanding Claude for Financial Services with an Excel add-in, additional connectors to real-time market data and portfolio analytics, and new pre-built Agent Skills, like building discounted cash flow models and initiating coverage reports. These updates build on Sonnet 4.5’s state-of-the-art performance on financial tasks, topping the Finance Agent…
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Scientific research · Frontier model release and benchmark movement
Consensus accelerates research with GPT-5 and Responses API
Source: https://openai.com/index/consensus
Claim: Consensus uses GPT-5 and OpenAI’s Responses API to power a multi-agent research assistant that reads, analyzes, and synthesizes evidence in minutes—helping over 8 million researchers accelerate scientific discovery.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Scientific research · Agent platform and API infrastructure
Expanding our use of Google Cloud TPUs and Services
Source: https://www.anthropic.com/news/expanding-our-use-of-google-cloud-tpus-and-services
Claim: Today, we are announcing that we plan to expand our use of Google Cloud technologies, including up to one million TPUs, dramatically increasing our compute resources as we continue to push the boundaries of AI research and product development. The expansion is worth tens of billions of dollars and is expected to bring well over a gigawatt of capacity.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Healthcare and life sciences · Healthcare and life-sciences reasoning
Claude for Life Sciences
Source: https://www.anthropic.com/news/claude-for-life-sciences
Claim: Increasing the rate of scientific progress is a core part of Anthropic’s public benefit mission. We are focused on building the tools to allow researchers to make new discoveries – and eventually, to allow AI models to make these discoveries autonomously.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Software engineering · Frontier model release and benchmark movement
Introducing Claude Haiku 4.5
Source: https://www.anthropic.com/news/claude-haiku-4-5
Claim: Claude Haiku 4.5, our latest small model, is available today to all users. What was recently at the frontier is now cheaper and faster. Five months ago, Claude Sonnet 4 was a state-of-the-art model. Today, Claude Haiku 4.5 gives you similar levels of coding performance but at one-third the cost and more than twice the speed.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Healthcare and life sciences · Healthcare and life-sciences reasoning
Expert Council on Well-Being and AI
Source: https://openai.com/index/expert-council-on-well-being-and-ai
Claim: OpenAI’s new Expert Council on Well-Being and AI brings together leading psychologists, clinicians, and researchers to guide how ChatGPT supports emotional health, especially for teens. Learn how their insights are shaping safer, more caring AI experiences.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Anthropic and Salesforce expand partnership to bring Claude to regulated industries
Source: https://www.anthropic.com/news/salesforce-anthropic-expanded-partnership
Claim: Anthropic and Salesforce today announced an expanded partnership to make Claude a preferred model for Salesforce's Agentforce platform, enabling Salesforce customers in financial services, healthcare, cybersecurity, and life sciences to use trusted AI while keeping sensitive data secure. Additionally, Salesforce is deploying Claude Code across its global…
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · General AI capability · Model and benchmark capability movement
Defining and evaluating political bias in LLMs
Source: https://openai.com/index/defining-and-evaluating-political-bias-in-llms
Claim: Learn how OpenAI evaluates political bias in ChatGPT through new real-world testing methods that improve objectivity and reduce bias.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Rahul Patil joins Anthropic as Chief Technology Officer
Source: https://www.anthropic.com/news/rahul-patil-joins-anthropic
Claim: We're excited to announce that Rahul Patil has joined Anthropic as our Chief Technology Officer. Rahul will oversee our engineering organization across product, compute, infrastructure, inference, data science, and security as we scale Claude to meet growing enterprise demand worldwide. Rahul brings over 20 years of experience building and maintaining…
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Software engineering · Agent platform and API infrastructure
Introducing AgentKit, new Evals, and RFT for agents
Source: https://openai.com/index/introducing-agentkit
Claim: Today, we’re releasing new tools to help developers go from prototype to production faster: AgentKit, expanded evals capabilities, and reinforcement fine-tuning for agents.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Media and content · Multimodal content generation and media workflows
Sora 2 System Card
Source: https://openai.com/index/sora-2-system-card
Claim: Sora 2 is our new state-of-the-art video and audio generation model. Building on the foundation of Sora, this new model introduces capabilities that have been difficult for prior video models to achieve, such as more accurate physics, sharper realism, synchronized audio, enhanced steerability, and an expanded stylistic range.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Customer operations · Model and benchmark capability movement
Empowering teams to unlock insights faster at OpenAI
Source: https://openai.com/index/openai-research-assistant
Claim: OpenAI’s research assistant helps teams analyze millions of support tickets, surface insights faster, and scale curiosity across the company.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Software engineering · Frontier model release and benchmark movement
Introducing Claude Sonnet 4.5
Source: https://www.anthropic.com/news/claude-sonnet-4-5
Claim: Claude Sonnet 4.5 is the best coding model in the world. It's the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math. Code is everywhere. It runs every application, spreadsheet, and software tool you use. Being able to use those tools and reason through hard problems is…
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · General AI capability · Education and workforce adoption
Measuring the performance of our models on real-world tasks
Source: https://openai.com/index/gdpval
Claim: OpenAI introduces GDPval, a new evaluation that measures model performance on real-world economically valuable tasks across 44 occupations.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Scientific research · Model and benchmark capability movement
Introducing ChatGPT Pulse
Source: https://openai.com/index/introducing-chatgpt-pulse
Claim: Today we're releasing a preview of ChatGPT Pulse to Pro users on mobile. Pulse is a new experience where ChatGPT proactively does research to deliver personalized updates based on your chats, feedback, and connected apps like your calendar.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
ENEOS Materials brings ChatGPT Enterprise to manufacturing
Source: https://openai.com/index/eneos-materials
Claim: ENEOS Materials uses ChatGPT Enterprise to speed research, improve plant design safety, and cut HR analysis time by 90%, with 80% reporting better workflows.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Scientific research · Frontier model release and benchmark movement
Detecting and reducing scheming in AI models
Source: https://openai.com/index/detecting-and-reducing-scheming-in-ai-models
Claim: Apollo Research and OpenAI developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. The team shared concrete examples and stress tests of an early method to reduce scheming.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Scientific research · Education and workforce adoption
How people are using ChatGPT
Source: https://openai.com/index/how-people-are-using-chatgpt
Claim: New research from the largest study of ChatGPT use shows how the tool creates economic value through both personal and professional use. Adoption is broadening beyond early users, closing gaps and making AI a part of everyday life.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Software engineering · Frontier model release and benchmark movement
Addendum to GPT-5 system card: GPT-5-Codex
Source: https://openai.com/index/gpt-5-system-card-addendum-gpt-5-codex
Claim: This addendum to the GPT-5 system card shares a new model: GPT-5-Codex, a version of GPT-5 further optimized for agentic coding in Codex. GPT-5-Codex adjusts its thinking effort more dynamically based on task complexity, responding quickly to simple conversational queries or small tasks, while independently working for longer on more complex tasks.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks · Software engineering · Frontier model release and benchmark movement
Anthropic is endorsing SB 53
Source: https://www.anthropic.com/news/anthropic-is-endorsing-sb-53
Claim: Anthropic is endorsing SB 53, the California bill that governs powerful AI systems built by frontier AI developers like Anthropic. We’ve long advocated for thoughtful AI regulation and our support for this bill comes after careful consideration of the lessons learned from California’s previous attempt at AI regulation (SB 1047). While we believe that...
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Model and benchmark capability movement
Why language models hallucinate
Source: https://openai.com/index/why-language-models-hallucinate
Claim: OpenAI’s new research explains why language models hallucinate. The findings show how improved evaluations can enhance AI reliability, honesty, and safety.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | General AI capability | Model and benchmark capability movement
Building more helpful ChatGPT experiences for everyone
Source: https://openai.com/index/building-more-helpful-chatgpt-experiences-for-everyone
Claim: We’re partnering with experts, strengthening protections for teens with parental controls, and routing sensitive conversations to reasoning models in ChatGPT.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Model and benchmark capability movement
Anthropic raises $13B Series F at $183B post-money valuation
Source: https://www.anthropic.com/news/anthropic-raises-series-f-at-usd183b-post-money-valuation
Claim: Anthropic has completed a Series F fundraising of $13 billion led by ICONIQ. This financing values Anthropic at $183 billion post-money. Along with ICONIQ, the round was co-led by Fidelity Management & Research Company and Lightspeed Venture Partners. The investment reflects Anthropic’s continued momentum and reinforces our position as the leading...
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Healthcare and life sciences | Healthcare and life-sciences reasoning
Supporting nonprofit and community innovation
Source: https://openai.com/index/supporting-nonprofit-and-community-innovation
Claim: OpenAI launches a $50M People-First AI Fund to help U.S. nonprofits scale impact with AI. Applications open Sept 8–Oct 8, 2025 for grants in education, healthcare, research, and more.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | General AI capability | Model and benchmark capability movement
OpenAI and Anthropic share findings from a joint safety evaluation
Source: https://openai.com/index/openai-anthropic-safety-evaluation
Claim: OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Healthcare and life sciences | Healthcare and life-sciences reasoning
Accelerating life sciences research
Source: https://openai.com/index/accelerating-life-sciences-research-with-retro-biosciences
Claim: Discover how a specialized AI model, GPT-4b micro, helped OpenAI and Retro Bio engineer more effective proteins for stem cell therapy and longevity research.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Scientific research | Model and benchmark capability movement
Scaling domain expertise in complex, regulated domains
Source: https://openai.com/index/blue-j
Claim: Discover how Blue J is transforming tax research with AI-powered tools built on GPT-4.1. By combining domain expertise with Retrieval-Augmented Generation, Blue J delivers fast, accurate, and fully-cited tax answers—trusted by professionals across the US, Canada, and the UK.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Enterprise operations | Frontier model release and benchmark movement
GPT-5 and the new era of work
Source: https://openai.com/index/gpt-5-new-era-of-work
Claim: GPT-5 is OpenAI’s most advanced model—transforming enterprise AI, automation, and workforce productivity in the new era of intelligent work.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Software engineering | Frontier model release and benchmark movement
Introducing GPT-5 for developers
Source: https://openai.com/index/introducing-gpt-5-for-developers
Claim: Introducing GPT-5 in our API platform—offering high reasoning performance, new controls for devs, and best-in-class results on real coding tasks.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Healthcare and life sciences | Frontier model release and benchmark movement
Medical research with GPT-5
Source: https://openai.com/index/gpt-5-medical-research
Claim: Learn how GPT-5 is used for medical research.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Software engineering | Frontier model release and benchmark movement
GPT-5 System Card
Source: https://openai.com/index/gpt-5-system-card
Claim: This GPT-5 system card explains how a unified model routing system powers fast and smart responses using gpt-5-main, gpt-5-thinking, and lightweight versions like gpt-5-thinking-nano, optimized for different tasks and developer use.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Benchmarks | Software engineering | Frontier model release and benchmark movement
Introducing GPT-5
Source: https://openai.com/index/introducing-gpt-5
Claim: We are introducing GPT-5, our best AI system yet. GPT-5 is a significant leap in intelligence over all our previous models, featuring state-of-the-art performance across coding, math, writing, health, visual perception, and more.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence