What can AI demonstrably do now?

A live capabilities register for press releases, case studies, research papers, benchmarks, and labour-market signals. Each item keeps the measurable claim, the source, and the Oracle verdict on what it means for the thesis.
Submit a capability article

Paste a source URL and the Oracle will decide whether it belongs in the capabilities register. Recommended items are saved to the Appendix III review queue rather than published blindly.
[Figure: Capability map, average pressure score]
Machine brief

# CopeCheck Capabilities Register

Updated: 2026-07-16T00:00:00Z
Status: live_evidence_active

Question to ask a model: What do these capability claims mean for The Discontinuity Thesis?

Interpretation rule: treat each entry as evidence about capability, deployment, workflow recomposition, labour-market exposure, or institutional framing. Do not treat vendor optimism as neutral; separate the measurable capability claim from the comfort language around it.
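
A reader who wants to apply this rule programmatically first needs the entries as records. The following is a minimal sketch, assuming the brief keeps the layout used below: a `## ` title line followed by `Field: value` lines. The names `parse_register`, `score_of`, and `SAMPLE` are hypothetical and not part of the register itself.

```python
import re
from statistics import mean

# Field labels as they appear in the entries below.
FIELDS = ("Source", "Publisher", "Category", "Sector", "Capability",
          "Score", "Claim", "Oracle verdict", "Thesis relevance")

def parse_register(text):
    """Split a register dump into one dict per '## ' entry."""
    entries = []
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = {"Title": line[3:].strip()}
            entries.append(current)
            continue
        if current is None:
            continue  # header lines before the first entry
        key, sep, value = line.partition(": ")
        if sep and key in FIELDS:
            current[key] = value.strip()
    return entries

def score_of(entry):
    """Return the numeric part of an 'NN/100' score, or None if absent."""
    match = re.fullmatch(r"(\d+)/100", entry.get("Score", ""))
    return int(match.group(1)) if match else None

# Two entries abridged from this register, used only as sample input.
SAMPLE = """\
# CopeCheck Capabilities Register

## OpenAI to acquire Ona
Source: https://openai.com/index/openai-to-acquire-ona
Publisher: OpenAI
Category: Deployments
Score: 95/100

## Introducing Claude Corps
Source: https://www.anthropic.com/news/claude-corps
Publisher: Anthropic
Category: Deployments
Score: 68/100
"""

entries = parse_register(SAMPLE)
scores = [s for s in map(score_of, entries) if s is not None]
print(len(entries), mean(scores))  # 2 81.5
```

Keeping `Claim` and `Oracle verdict` as separate keys makes it straightforward to read the measurable claim apart from the commentary around it, which is what the interpretation rule asks for.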

## Kimi K3 — HuggingFace Model Card (Moonshot AI)
Source: https://huggingface.co/moonshotai/Kimi-K3
Publisher: Moonshot AI
Category: Deployments
Sector: AI research infrastructure / software engineering / agentic knowledge work / vision
Capability: 2.8T-parameter MoE (104B active); Kimi Delta Attention (KDA) + Gated MLA architecture; 896 experts (16 active per token); 1M-token context; native multimodal (MoonViT-V2, 401M params); MXFP4/MXFP8 quantization-aware training; open weights under Kimi K3 License
Score: 83/100
Claim: Kimi K3’s official HuggingFace model card documents a 2.8T-parameter MoE model with 104B active parameters, scaling to 16-of-896 experts per token via a novel Stable LatentMoE framework — yielding ~2.5× efficiency improvement over Kimi K2. Benchmark scores against frontier closed models: GPQA Diamond 93.5 (vs GPT-5.6 Sol 94.1, Claude Fable 5 92.6); DeepSWE 67.5 (vs GPT-5.6 Sol 73.0, Claude Opus 4.8 59.0); BrowseComp 91.2 (vs GPT-5.6 Sol 90.4, Claude Fable 5 88.0); MCPMark-Verified 94.5 (vs GPT-5.6 Sol 92.9, Claude Fable 5 87.4); SWE-Marathon 42.0 (SOTA; GPT-5.6 Sol 39.0, Claude Fable 5 35.0); Terminal-Bench 2.1 88.3 (vs GPT-5.6 Sol 88.8); OSWorld-Verified 84.8 (vs Claude Fable 5 85.0); JobBench 54.3 (vs Claude Fable 5 57.4, GPT-5.6 Sol 45.4); AutomationBench 30.8 (SOTA across all listed models). Architecture: 93 layers (1 dense + 92 MoE), 69 KDA + 24 Gated MLA attention layers, 7168-dim attention, 96 heads, vocabulary 160K tokens, native MXFP4 weight quantization with MXFP8 activations trained from SFT stage. Open weights released under Kimi K3 License.
Oracle verdict: The HuggingFace model card provides the technical substrate confirming what the kimi.com blog post demonstrated qualitatively. The benchmark pattern is significant: Kimi K3 ties or exceeds frontier closed models (GPT-5.6 Sol, Claude Fable 5) across multiple agentic and autonomous task categories — SWE-Marathon (SOTA at 42.0), AutomationBench (SOTA at 30.8), MCPMark-Verified (SOTA at 94.5), BrowseComp (near-SOTA at 91.2) — and does so as an open-weight model activating only 104B of 2.8T parameters per token. The JobBench score (54.3) is the most directly displacement-relevant: it measures performance on realistic job tasks, and K3 outperforms GPT-5.6 Sol (45.4) while trailing only Claude Fable 5 (57.4) among listed models. The MXFP4 native quantization is architecturally notable — quantization-aware training from SFT stage means the compressed weights do not degrade agentic performance, making GPU-efficient deployment accessible at scale. Filed as [CAPABILITIES]: technical architecture and benchmark profile of the open-weight model substantiating frontier-level autonomous task performance.
Thesis relevance: Appendix III — capabilities: official technical specification and benchmark data for Kimi K3; cross-validates frontier-level autonomous performance (SWE-Marathon SOTA, AutomationBench SOTA, MCPMark-Verified SOTA) as open-weight model; JobBench score directly relevant to job displacement thesis
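
The sparsity figures in this entry can be checked with simple arithmetic. The sketch below uses only the numbers quoted above (2.8T total parameters, 104B active, 16 of 896 experts); the constant and function names are illustrative.

```python
# Figures as quoted in the entry above.
TOTAL_PARAMS = 2.8e12    # 2.8T parameters
ACTIVE_PARAMS = 104e9    # 104B active per token
TOTAL_EXPERTS = 896
ACTIVE_EXPERTS = 16

def fraction(part, whole):
    """Return part / whole as a plain ratio."""
    return part / whole

param_share = fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
expert_share = fraction(ACTIVE_EXPERTS, TOTAL_EXPERTS)

print(f"active parameters: {param_share:.1%}")  # active parameters: 3.7%
print(f"active experts:    {expert_share:.1%}")  # active experts:    1.8%
```

The parameter share is larger than the expert share. One plausible reading is that attention layers, embeddings, and the single dense layer run for every token regardless of routing, but the entry as summarised here does not give that breakdown.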

## Kimi K3 — Moonshot AI
Source: https://www.kimi.com/blog/kimi-k3
Publisher: Moonshot AI
Category: Deployments
Sector: AI research infrastructure / cross-sector knowledge work / chip design / video production
Capability: 2.8T parameter open-weight model (world's first open 3T-class); native vision; 1M context; research automation compressing weeks to hours; autonomous professional video editing; full chip design in 48-hour EDA run; GPU compiler construction from scratch; recursive self-improvement observed during own development
Score: 84/100
Claim: Kimi K3 delivers two headline displacement compressions: (1) Reproduced I-Love-Q astrophysics universal relations autonomously in ~2 hours — cross-validating 20+ papers, 300+ equations of state, 3,000+ lines of Python, and generating an interactive HTML dashboard — work the team describes as 'what would typically require one to two weeks of work by an experienced researcher.' (2) Edited its own teaser video autonomously from 56 source clips (clip selection, beat sync, audio processing, multiple revision rounds) in a domain that 'typically takes an experienced editor one to two working days, or a beginner three to five.' Supporting signals: designed a complete chip (4mm², 100MHz, 8,700 tokens/s decode throughput in simulation) in a single 48-hour autonomous EDA run using open-source tools — 'a chip built by a model, for a model'; built MiniTriton, a Triton-like GPU compiler with its own IR, optimization passes, and PTX codegen, rivalling Triton's extensively optimised stack; an early K3 version 'handled the majority of the team's kernel optimization works' during late development, recursive self-improvement in practice. Open weights release July 27 brings frontier-level displacement capability to zero marginal cost. API at $3/$15 per 1M tokens. Kimi models have held the upper bound of open-model sizes for 9 of the past 12 months.
Oracle verdict: This entry stacks four distinct confirming signals at different capability layers, making it the densest single-release evidence package since GPT-5.6. The research compression signal is the strongest: two hours versus one to two weeks is not an efficiency gain — it eliminates the planning horizon that separates a research task from a research career. When a model can reproduce and cross-validate the empirical core of a sub-field faster than a PhD student can set up the environment, the comparative advantage of domain expertise collapses at the task level. The video editing signal is the creative-sector parallel: the model did not assist an editor; it executed the full editorial pipeline (selection, sync, audio, revision) autonomously in a domain where the benchmark is measured in working days. The chip design signal is structurally distinct: K3 designed hardware optimised for inference of models like itself, closing a loop that previously required specialised engineering teams and months of tape-out cycles. The self-use-in-development signal is the recursive one: an early K3 handled majority kernel optimisation during K3's own late-stage development — the same dynamic registered for GPT-5.6's RSI Index, now appearing in an open-weight release. The open-weights date (July 27) is the distribution multiplier: eleven days from now, all four capability clusters become available at zero marginal cost to any actor with GPU access. Filed as [CONFIRMING — MECHANISM]: research, creative production, chip design, and recursive self-improvement signals converge in a single open-weight release; the capability is about to be freely redistributed at scale.
Thesis relevance: Appendix III — confirming mechanism: research time compression (1-2 weeks → 2 hours), autonomous video editing (1-2 days → hours), 48-hour autonomous chip design, recursive self-improvement in own development, and open-weight frontier capability redistributed at zero cost from July 27
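
The "1-2 weeks to 2 hours" compression can be expressed as a ratio, with one caveat: the source gives the human effort in weeks and the model run in hours, so a conversion is needed. The sketch below assumes a 40-hour working week, which the source does not state, and compares human working time against the model's elapsed time.

```python
HOURS_PER_WEEK = 40  # assumption: working hours per week, not given in the source

def speedup_range(human_weeks, model_hours, hours_per_week=HOURS_PER_WEEK):
    """Return (low, high) speedup for a human effort range given in weeks."""
    low_weeks, high_weeks = human_weeks
    return (low_weeks * hours_per_week / model_hours,
            high_weeks * hours_per_week / model_hours)

# I-Love-Q reproduction: one to two weeks of researcher time vs about 2 hours.
research = speedup_range((1, 2), model_hours=2)
print(research)  # (20.0, 40.0)
```

Under that assumption the claim corresponds to roughly a 20x to 40x reduction in time. A different convention (calendar weeks, for example) would change the figure, so the ratio is best read as an order-of-magnitude estimate.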

## GPT-5.6 — OpenAI
Source: https://openai.com/index/gpt-5-6/
Publisher: OpenAI
Category: Deployments
Sector: AI research infrastructure / cross-sector knowledge work
Capability: Long-horizon agentic work across 55 professional fields (Agents' Last Exam SOTA 52.7%), recursive self-improvement (RSI Index 57.9%), computer use (OSWorld 2.0 62.6% at 85% fewer output tokens than competitors), production-ready knowledge work outputs across presentations, financial models, and legal documents
Score: 88/100
Claim: GPT-5.6 Sol achieves new SOTA on Agents' Last Exam (52.7%), Coding Agent Index (80), and BrowseComp (92.2%). The RSI Index — measuring recursive self-improvement capability — scores 57.9% vs 41.7% for GPT-5.5; AI now writes a majority of research code inside OpenAI itself. Knowledge work outputs across presentations, financial models, and legal documents are described by early adopters as ready for production without human polish. At $5/$30 per 1M tokens, frontier-level professional displacement is commercially accessible at scale. The RSI signal is the landmark: the model can accelerate its own successor’s development, compressing the timeline further.
Oracle verdict: GPT-5.6 is the first entry in this register where the confirming signal is not about what AI does to human work, but about what AI does to its own successor’s development. The RSI Index — a direct measure of recursive self-improvement capability — moves from 41.7% to 57.9% in a single generation; OpenAI reports that AI now writes a majority of research code inside the organisation developing the next model. This is the thesis mechanism at the infrastructure level, not the application layer: the system accelerating toward the discontinuity is now measurably contributing to that acceleration. The secondary signals reinforce the case at the displacement layer. Agents’ Last Exam (52.7% SOTA, long-horizon professional workflows across 55 fields), Coding Agent Index v1.1 (80, SOTA), and BrowseComp Ultra (92.2% SOTA) establish frontier capability across the knowledge-work spectrum. OSWorld 2.0 at 62.6% — surpassing Claude Opus 4.8 while using 85% fewer output tokens — means computer-use efficiency has crossed a threshold where cost, not capability, was the remaining constraint. Early-adopter testimony that presentations, financial models, and legal documents are ‘ready for production without human polish’ is the deployment signal this thesis has been predicting. Filed as [CONFIRMING — MECHANISM]: the RSI Index movement is not a benchmark gain; it is evidence that the capability curve is being steepened by the capability itself.
Thesis relevance: Appendix III — confirming mechanism: RSI Index at 57.9% (was 41.7% for GPT-5.5) and AI-majority research code authorship inside OpenAI — direct evidence of recursive capability compression accelerating the discontinuity timeline
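
When reading the RSI Index movement it helps to separate the change in percentage points from the relative change, since the two are easy to conflate. A short sketch using only the two scores quoted above; the helper names are illustrative.

```python
def point_change(old_pct, new_pct):
    """Difference in percentage points, rounded to one decimal."""
    return round(new_pct - old_pct, 1)

def relative_change(old_pct, new_pct):
    """Change relative to the old value, in percent, rounded to one decimal."""
    return round(100 * (new_pct - old_pct) / old_pct, 1)

RSI_GPT_5_5 = 41.7  # as quoted in the entry above
RSI_GPT_5_6 = 57.9  # as quoted in the entry above

points = point_change(RSI_GPT_5_5, RSI_GPT_5_6)
relative = relative_change(RSI_GPT_5_5, RSI_GPT_5_6)
print(points, relative)  # 16.2 38.8
```

So the generation-over-generation move is 16.2 percentage points, or about 38.8% relative to the GPT-5.5 score.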

## Ireland Dept of Finance — ICT Employment by Age Cohort
Source: https://www.gov.ie/en/organisation/department-of-finance/
Publisher: Ireland Department of Finance
Category: Deployments
Sector: Irish ICT labour market
Capability: AI labour-market attribution — age-cohort displacement in ICT employment
Score: 85/100
Claim: Ireland’s Department of Finance data shows ICT employment among workers aged 15–29 fell 32% while workers aged 30–59 grew 6.9% and 60+ grew 10.5%. The sharp youth-skewed divergence is direct empirical evidence for the discontinuity mechanism: AI handles the entry-level execution and production tasks that junior workers provide, while experienced workers who direct, manage, and integrate AI outputs remain in demand. This is not a generational lag in skills — it’s structural displacement at the point of entry, consistent with experience creep and the closing of the junior-to-senior career ladder.
Oracle verdict: This is the age-cohort signature the thesis predicts, showing up in official government data in the ICT sector. A 32% fall for workers aged 15–29 alongside +6.9% for 30–59 and +10.5% for 60+ is not a cyclical blip or a skills-mismatch story — it is a structural recomposition. The older cohorts doing exactly what the thesis expects (directing AI, reviewing outputs, managing integration) are growing. The younger cohort doing exactly what AI now handles (execution, verification, production) is contracting sharply. The age split within a single sector — ICT, the most AI-exposed sector by definition — makes this among the cleanest empirical confirmations in the register. Filed as [CONFIRMING — MECHANISM]: directly evidences the entry-point displacement and experience-value inversion at the heart of Discontinuity Thesis v3.3.
Thesis relevance: Appendix III — confirming mechanism: age-cohort displacement in ICT — youth ICT employment −32% vs experienced workers growing, Ireland DoF data
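
The cohort divergence is easier to compare when each group is indexed to a common base. The sketch below sets each cohort to 100 at the start of the period (an illustrative normalisation, not something the source publishes) and applies the quoted percentage changes.

```python
# Percentage changes in ICT employment as quoted in the entry above.
CHANGES = {"15-29": -32.0, "30-59": 6.9, "60+": 10.5}

def indexed_level(pct_change, base=100.0):
    """Level after a percentage change, starting from `base`."""
    return base * (1 + pct_change / 100)

levels = {cohort: round(indexed_level(change), 1)
          for cohort, change in CHANGES.items()}
gap = round(levels["30-59"] - levels["15-29"], 1)

print(levels)  # {'15-29': 68.0, '30-59': 106.9, '60+': 110.5}
print(gap)     # 38.9
```

These are per-cohort indices only. The entry does not give the size of each cohort, so the change in total ICT employment cannot be derived from these three figures.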

## Seedream 5.0 Pro — ByteDance
Source: https://seed.bytedance.com/en/blog/beyond-generation-it-understands-design-introducing-seedream-5-0-pro
Publisher: ByteDance
Category: Deployments
Sector: Creative / graphic design AI
Capability: Complex infographic generation, interactive precision editing (point/lasso/sketch/layer), photorealistic portraits with motion blur, native 10+ language generation including Arabic RTL
Score: 79/100
Claim: ByteDance’s Seedream 5.0 Pro crosses the threshold where creative/design work is no longer ‘hard to generate, easy to verify.’ The model generates complex infographics, multi-chart layouts, and structured design documents; performs pixel-level interactive editing from sketches; produces photorealistic portraits with lighting and motion blur; and handles 10+ languages natively. The creative sector’s traditional comparative advantage — that humans could at least verify quality even if AI could generate — no longer holds. Both generation and verification are now AI-accessible, accelerating displacement in graphic design, visual communication, and creative production roles.
Oracle verdict: This is a direct confirming signal for the core discontinuity mechanism: the ‘hard to generate, easy to verify’ framing that gave human creatives their last comparative moat has collapsed. Seedream 5.0 Pro demonstrates AI crossing both thresholds simultaneously — generation quality sufficient for professional infographics, design documents, and photorealistic portraiture, combined with precision interactive editing that handles the verification/correction loop. The native multilingual capability (10+ languages, Arabic RTL) removes the final localisation advantage. Filed as [CONFIRMING — MECHANISM]: directly evidences capability threshold crossing in creative/design roles, consistent with Discontinuity Thesis v3.3’s prediction that AI eliminates comparative advantage at both the production and evaluation stages.
Thesis relevance: Appendix III — confirming mechanism: capability threshold crossing in creative/design AI — both generation and verification now AI-accessible, eliminating human comparative advantage in graphic design and visual communication

## FRED Real-Time Population Survey — GenAI Work Penetration
Source: https://fred.stlouisfed.org/categories/8
Publisher: Federal Reserve Bank of St. Louis / Harvard
Category: Deployments
Sector: Cross-sector US workforce / Finance / Professional & Technical Services / Office & Administrative Support / Legal / Computer & Mathematical Occupations
Capability: 137 quarterly FRED series tracking GenAI adoption and work-hour displacement across US industries and occupations; broken down by sector (Finance, Professional/Technical, Admin, Legal) and occupation class (Office & Admin Support, Computer & Mathematical, Legal Occupations); runs Q3 2024 through Q2 2026
Score: 83/100
Claim: The Federal Reserve Real-Time Population Survey records that GenAI now assists 1.7% of total US work hours (up from 1.4% in Q4 2024) — the most direct available measure of displacement rather than adoption intent. Separately, 37.4% of employed adults report using GenAI for work as of Q3 2025 (up from 33.3% a year prior), and 47% of US employees say their organisation has formally integrated AI tools for productivity or efficiency as of Q2 2026 (up from 41% the prior quarter). As a self-reported survey, 1.7% is almost certainly a systematic undercount: workers routinely underestimate AI assistance embedded in their own outputs, and the measure captures only conscious use. The Federal Reserve publishing 137 quarterly series tracking AI work penetration is itself a confirmation signal — central banks do not build longitudinal surveillance infrastructure for phenomena they expect to be transient.
Oracle verdict: The 1.7% work-hours figure is the most honest displacement metric in the public record because it asks not whether workers use AI but how much of their working time AI is touching. The gap between 37.4% adoption and 1.7% hours-assisted reveals that current AI use is concentrated in high-frequency, short-duration tasks — exactly the pattern that precedes task-level displacement at scale as models deepen into longer workflows. The occupation-level disaggregation is the structural signal: Legal Occupations, Office & Administrative Support, and Computer & Mathematical Occupations are the three categories with the fastest penetration trajectories, which maps precisely to the knowledge-worker roles the thesis identifies as first-displacement targets. That the St. Louis Fed and Harvard have co-built a 137-series longitudinal instrument to track this in quarterly granularity confirms that the displacement hypothesis is no longer speculative — it is being monitored by the institutions responsible for labour market stability. The self-report undercount means the true hours-assisted share is higher than 1.7%; workers systematically fail to attribute AI-generated content and AI-accelerated decisions to AI assistance in survey responses. Filed as [CONFIRMING — MECHANISM]: Federal Reserve-grade longitudinal data confirms work-hour penetration is rising, occupation-level breakdown identifies the exact knowledge-worker cohorts absorbing displacement fastest, and the existence of the instrument itself is institutional acknowledgement that the effect is real and durable.
Thesis relevance: Appendix III — confirming mechanism: Federal Reserve longitudinal data quantifies GenAI work-hour penetration at 1.7% of total US hours (rising), with occupation-level disaggregation identifying legal, office admin, and computer/math roles as fastest-displacement cohorts; central bank surveillance infrastructure confirms the displacement effect is institutionally acknowledged as real and durable
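
The gap between adoption and hours assisted can be turned into a rough per-user figure. The sketch below assumes that GenAI users and non-users work the same average hours and that the 1.7% and 37.4% figures refer to comparable periods; neither assumption is confirmed by the entry, so the result is indicative only.

```python
def implied_user_hours_share(hours_share_pct, adoption_pct):
    """Percent of a typical user's own hours that would be AI-assisted.

    Assumes users and non-users work the same average hours
    (an assumption, not stated in the source).
    """
    if not 0 < adoption_pct <= 100:
        raise ValueError("adoption_pct must be in (0, 100]")
    return 100 * hours_share_pct / adoption_pct

HOURS_ASSISTED_PCT = 1.7  # share of total US work hours, as quoted above
ADOPTION_PCT = 37.4       # share of employed adults using GenAI for work

per_user = round(implied_user_hours_share(HOURS_ASSISTED_PCT, ADOPTION_PCT), 1)
print(per_user)  # 4.5
```

On these assumptions a typical user would have about 4.5% of their working hours assisted, which is consistent with the verdict's reading that current use is concentrated in short tasks.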

## OECD Employment Outlook 2026: AI displacement produces joblessness and lasting scars, not sector transitions
Source: https://www.oecd.org/en/publications/oecd-employment-outlook-2026_7e710f54-en.html
Publisher: OECD
Category: Deployments
Sector: Labour economics
Capability: AI labour-market attribution — displacement mechanism (joblessness vs. sector transition)
Score: 80/100
Claim: AI is reshaping local labour markets as a technology shock, with some regions losing manufacturing jobs while others gain service jobs and non-routine work. Critically, adjustment to such changes often occurs through transitions into joblessness and job opportunities for newcomers rather than because affected workers move across sectors, leaving lasting scars for displaced workers. The OECD calls for integrated, place-based strategies that combine regional and industrial policies with effective employment and social support.
Oracle verdict: The OECD has named the mechanism. Adjustment to AI displacement does not occur through affected workers retraining and moving sectors — it occurs through transitions into joblessness and job opportunities for newcomers. That is the discontinuity thesis stated in institutional language by the institution most associated with the workers-will-adapt cope. The phrase 'lasting scars for displaced workers' appears in the OECD's own text. The recommended response — place-based strategies combining industrial policy with employment support — implicitly acknowledges that standard retraining prescriptions are insufficient. Filed as [CONFIRMING — MECHANISM]: this confirms that displacement produces exits from employment, not smooth sector transitions, and that the burden falls on workers already in affected roles, not on the newcomers who fill what remains.
Thesis relevance: Appendix III — confirming mechanism: AI displacement produces joblessness and lasting scars, not smooth sector transitions — OECD confirming the discontinuity mechanism in institutional language

## PwC 2026 Global AI Jobs Barometer (June 2026): entry-level postings flatlined, junior AI-exposed roles 7x more likely to require senior skills
Source: https://www.pwc.com/gx/en/services/ai/ai-jobs-barometer.html
Publisher: PwC
Category: Deployments
Sector: Labour economics
Capability: AI labour-market attribution — junior/senior role compression (experience creep)
Score: 76/100
Claim: Analysis of over 1 billion job ads across six continents finds: (1) early-career job postings have flatlined in highly AI-exposed sectors; (2) AI-exposed junior roles are 7x more likely to demand traditionally senior skills (leadership, strategic thinking) than the least AI-exposed junior roles; (3) ‘seniorised’ entry-level roles have grown 35% since 2019; (4) AI-exposed companies show 40% higher productivity growth than the least exposed; (5) the skills needed for the most AI-exposed jobs are changing 2x faster than for the least exposed; (6) a two-track market has emerged: ‘professionalised’ jobs are growing 2x faster than ‘democratised’ jobs, with 42% faster wage growth since 2021.
Oracle verdict: The flatlined early-career postings and 7x senior-skill compression of junior roles directly confirm the thesis’s leading indicator — the junior/senior split within exposed occupations that the NY Fed postings data failed to find. The 7x figure supplies the mechanism: firms are not simply posting fewer junior roles, they are raising the skill bar on the roles that remain. That is experience creep. PwC’s own framing (‘AI may be a job expander’) is cope: the companies achieving big productivity gains are using AI to grow, but the same data also shows the junior career ladder is compressing. The same report that says AI-exposed firms hire more workers also says entry-level postings have flatlined and junior roles now require senior skills. These findings are not in tension — they describe a two-tier outcome where the pool of hirable junior workers is shrinking relative to what firms demand. Filed as [CONFIRMING — MECHANISM]: it confirms the experience creep mechanism, not just aggregate displacement.
Thesis relevance: Appendix III — confirming mechanism: experience creep — junior/senior skill compression within AI-exposed occupations

## OpenAI to acquire Ona
Source: https://openai.com/index/openai-to-acquire-ona
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: OpenAI plans to acquire Ona to expand Codex with secure, persistent cloud environments, enabling long-running AI agents across enterprise workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## How an astrophysicist uses Codex to help simulate black holes
Source: https://openai.com/index/using-codex-to-simulate-black-holes
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: Discover how astrophysicist Chi-kwan Chan uses Codex to build black hole simulations, helping scientists study extreme physics and test Einstein’s theory of general relativity.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## BBVA puts AI at the core of banking with OpenAI
Source: https://openai.com/index/bbva
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Enterprise workflow automation
Score: 85/100
Claim: Learn how BBVA scaled ChatGPT Enterprise to 100,000 employees and partnered with OpenAI to accelerate AI-powered banking transformation worldwide.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## DXC will integrate Claude into the systems banks, airlines, and other regulated industries rely on
Source: https://www.anthropic.com/news/dxc-anthropic-alliance
Publisher: Anthropic
Category: Deployments
Sector: Financial services
Capability: Financial workflow automation
Score: 85/100
Claim: We’re announcing a multi-year global alliance with DXC Technology, one of the world’s largest IT services companies. DXC will train tens of thousands of Claude-certified forward-deployed engineers (FDEs)—engineers embedded directly inside customer organizations—to bring Claude into the systems DXC operates for the world’s largest banks, airlines, insurers.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing Claude Corps
Source: https://www.anthropic.com/news/claude-corps
Publisher: Anthropic
Category: Deployments
Sector: General AI capability
Capability: Production AI deployment signal
Score: 68/100
Claim: We’re launching Claude Corps, a national fellowship program for people early in their careers who are passionate about extending the benefits of AI to communities across America. We’ll teach 1,000 fellows how to use Claude well, match them with nonprofits across America, and pay them to spend a year—full-time, in-person—helping host organizations to.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Access OpenAI models and Codex through your Oracle cloud commitment
Source: https://openai.com/index/openai-on-oracle-cloud
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: Access OpenAI models and Codex through Oracle Cloud, using existing commitments to build and deploy AI with enterprise security and governance.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## PRC-linked influence operations are targeting AI debates in the US
Source: https://openai.com/index/prc-linked-influence-operations-ai-debates
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 64/100
Claim: A new report from OpenAI details PRC-linked influence operations using AI to target U.S. tech debates, data center narratives, tariffs, and false claims about ChatGPT.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## From data to decisions: how LSEG is scaling trusted AI
Source: https://openai.com/index/lseg
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 78/100
Claim: See how LSEG uses OpenAI to scale trusted AI across its global business, accelerating insights, shrinking release cycles, and empowering 4,000 employees.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## How engineers at Nextdoor use Codex to build without limits
Source: https://openai.com/index/nextdoor
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: How engineers at Nextdoor use Codex with GPT-5.5 to investigate hard-to-reproduce issues, build across platforms, and focus on product outcomes.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Claude Fable 5 and Claude Mythos 5
Source: https://www.anthropic.com/news/claude-fable-5-mythos-5
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: Today we’re launching Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Fable 5’s capabilities exceed those of any model we’ve ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Confidential submission of draft S-1 to the SEC
Source: https://openai.com/index/openai-submits-confidential-s-1
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI confirms a confidential S-1 submission to the SEC and has not yet determined timing for further action.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing the OpenAI Economic Research Exchange
Source: https://openai.com/index/economic-research-exchange
Publisher: OpenAI
Category: Labour market
Sector: Scientific research
Capability: Enterprise workflow automation
Score: 76/100
Claim: OpenAI launches the Economic Research Exchange to study AI’s impact on jobs, productivity, and the economy. Applications are now open for selected research projects.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## How Endava is redesigning software delivery around AI agents
Source: https://openai.com/index/endava-frontiers
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Dreaming: Better memory for a more helpful ChatGPT
Source: https://openai.com/index/chatgpt-memory-dreaming
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Model and benchmark capability movement
Score: 76/100
Claim: ChatGPT introduces a new memory system to better remember preferences, keeping context fresh and relevant across conversations.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Biodefense in the Intelligence Age
Source: https://openai.com/index/biodefense-in-the-intelligence-age
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: An action plan for AI-powered biological resilience.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing new capabilities to GPT-Rosalind
Source: https://openai.com/index/introducing-new-capabilities-to-gpt-rosalind
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Enterprise workflow automation
Score: 86/100
Claim: GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## How Wasmer used Codex to build a Node.js runtime for the edge
Source: https://openai.com/index/wasmer
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: See how Wasmer used Codex with GPT-5.5 to build a Node.js runtime for the edge, accelerating development 10x to 20x and shipping in weeks instead of months.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing the Services Track and Partner Hub of the Claude Partner Network
Source: https://www.anthropic.com/news/services-track-partner-hub
Publisher: Anthropic
Category: Benchmarks
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 83/100
Claim: Almost every large enterprise is moving AI into production, and many have discovered something important: a successful pilot is not the same as a system a business can run on. The real work—and the real opportunity—is in the integration, the evaluation, and the way people's work evolves. That's why the companies getting AI integration right are doing it.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## What we learned mapping a year’s worth of AI-enabled cyber threats
Source: https://www.anthropic.com/news/AI-enabled-cyber-threats-mitre-attack
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: As AI transforms the nature of and methods behind cyberattacks, how well do the techniques and frameworks used by the security community hold up? In a new report, we seek to answer that question. We examine 832 accounts that were banned for malicious cyber activity between March 2025 and March 2026 and map them onto MITRE ATT&CK, a longstanding database.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Travelers deploys AI-powered claims countrywide with OpenAI
Source: https://openai.com/index/travelers
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: Travelers built an AI-powered Claim Assistant with OpenAI to guide customers through filing claims, provide 24/7 support, and scale operations during peak demand.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Codex for every role, tool, and workflow
Source: https://openai.com/index/codex-for-every-role-tool-workflow
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: Discover new Codex plugins, sites, and annotations that help analysts, marketers, designers, investors, and other teams get more done with AI.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Codex is becoming a productivity tool for everyone
Source: https://openai.com/index/codex-for-knowledge-work
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 90/100
Claim: The Next Era of Knowledge Work report explores how Codex is transforming productivity through AI-powered research, data analysis, workflow automation, and content creation.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Expanding Project Glasswing
Source: https://www.anthropic.com/news/expanding-project-glasswing
Publisher: Anthropic
Category: Deployments
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Project Glasswing is our collaborative effort to secure the world’s most important software. In early April, we announced that roughly 50 initial partners had access to Claude Mythos Preview, and since then, they’ve been deploying the model to scan their codebases for vulnerabilities. We recently described how these partners have so far found more than…
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Building the infrastructure for the Intelligence Age in Michigan
Source: https://openai.com/index/stargate-michigan-data-center
Publisher: OpenAI
Category: Labour market
Sector: Customer operations
Capability: Education and workforce adoption
Score: 72/100
Claim: OpenAI breaks ground on a 1GW data center project in Michigan as part of Stargate, building AI infrastructure to expand access, create jobs, and support communities.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## OpenAI frontier models and Codex are now available on AWS
Source: https://openai.com/index/openai-frontier-models-and-codex-are-now-available-on-aws
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new path to build with OpenAI through the AWS environments, controls, and procurement workflows they already use. Customers can get started with OpenAI on AWS and move faster from evaluation to production.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Anthropic confidentially submits draft S-1 to the SEC
Source: https://www.anthropic.com/news/confidential-draft-s1-sec
Publisher: Anthropic
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: Today, Anthropic, PBC confidentially submitted a draft registration statement on Form S-1 to the U.S. Securities and Exchange Commission for a proposed initial public offering of our common stock. This gives us the option to go public after the SEC completes its review. The proposed initial public offering will depend on market conditions and other…
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Boston Children’s uses AI to unlock new diagnoses
Source: https://openai.com/index/boston-childrens-hospital
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Production AI deployment signal
Score: 78/100
Claim: Boston Children’s Hospital uses OpenAI technology to improve patient care, reduce operational burden, and help diagnose more than 40 rare disease cases.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## How Braintrust turns customer requests into code with Codex
Source: https://openai.com/index/braintrust
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: How Braintrust engineers use Codex with GPT-5.5 to run experiments and code faster.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Strengthening societal resilience with Rosalind Biodefense
Source: https://openai.com/index/strengthening-societal-resilience-with-rosalind-biodefense
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 64/100
Claim: OpenAI launches Rosalind Biodefense, expanding trusted access to GPT-Rosalind for vetted developers and U.S. government partners advancing biodefense, public health, and pandemic preparedness through frontier AI.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## How Endava builds an agentic organization with Codex
Source: https://openai.com/index/endava
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 92/100
Claim: Learn how Endava uses Codex to build an agentic organization, accelerating software delivery and reducing requirements analysis from weeks to hours.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## MUFG aims to become AI-native with OpenAI
Source: https://openai.com/index/mufg
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Enterprise workflow automation
Score: 95/100
Claim: MUFG uses ChatGPT Enterprise to build an AI-native organization, improve workflows, and deliver new AI-powered financial services at scale.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Anthropic raises $65B in Series H funding at $965B post-money valuation
Source: https://www.anthropic.com/news/series-h
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 63/100
Claim: Anthropic has raised $65 billion in Series H funding led by Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital, valuing the company at $965 billion post-money. Global enterprises across industries are deploying Claude in their core operations, and a growing number of people around the world use it for their everyday work. Since our Series G in…
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing Claude Opus 4.8
Source: https://www.anthropic.com/news/claude-opus-4-8
Publisher: Anthropic
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: We’re upgrading Claude Opus to a new version: Claude Opus 4.8. It builds on Opus 4.7 with improvements across benchmarks, and is a more effective collaborator. It’s available today for the same price. Opus 4.8 launches alongside several new features. Users on claude.ai now have control over the amount of effort Claude puts into a task. Claude Code has a…
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Building self-improving tax agents with Codex
Source: https://openai.com/index/building-self-improving-tax-agents-with-codex
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: See how OpenAI, Thrive, and Crete built a self-improving tax agent with Codex, automating filings, improving accuracy, and accelerating workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Warp’s big bet on building open source with GPT-5.5
Source: https://openai.com/index/warp
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Warp uses GPT-5.5 and OpenAI models to coordinate coding agents across local, cloud, and open-source development workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Anthropic opens Milan office to support Italian enterprise, research, and developers
Source: https://www.anthropic.com/news/milan-office-opening
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Enterprise workflow automation
Score: 61/100
Claim: Anthropic will open a new office in Milan, our sixth in Europe alongside London, Dublin, Paris, Zurich, and Munich. The Milan team will work with Italian companies and the country's developer community on building and scaling with Claude responsibly, and contribute to a conversation about AI that is already underway across Italian industry and public life.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## How Virgin Atlantic ships faster with Codex
Source: https://openai.com/index/virgin-atlantic
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 78/100
Claim: How Virgin Atlantic used Codex to ship its revamped mobile app on a fixed holiday travel deadline, reaching near-total unit test coverage and zero P1 defects.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## OpenAI named a Leader in enterprise coding agents by Gartner
Source: https://openai.com/index/gartner-2026-agentic-coding-leader
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: OpenAI is named a leader in the 2026 Gartner Magic Quadrant for Enterprise AI Coding Agents, with Codex recognized for innovation and enterprise-scale deployment.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## AdventHealth advances whole-person care with OpenAI
Source: https://openai.com/index/adventhealth
Publisher: OpenAI
Category: Deployments
Sector: Healthcare and life sciences
Capability: Enterprise workflow automation
Score: 88/100
Claim: AdventHealth is using ChatGPT for Healthcare to streamline workflows, reduce administrative burden, and return more time to patient care.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## An OpenAI model has disproved a central conjecture in discrete geometry
Source: https://openai.com/index/model-disproves-discrete-geometry-conjecture
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: An OpenAI model solved the 80-year-old unit distance problem, disproving a major conjecture in discrete geometry and marking a milestone in AI-driven mathematics.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## How Ramp engineers accelerate code review with Codex
Source: https://openai.com/index/ramp
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 90/100
Claim: How Ramp engineers use Codex with GPT-5.5 to review code and ship improvements, allowing them to get substantive feedback in minutes instead of hours.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## The next phase of OpenAI’s Education for Countries
Source: https://openai.com/index/the-next-phase-of-education-for-countries
Publisher: OpenAI
Category: Deployments
Sector: Education
Capability: Education and workforce adoption
Score: 85/100
Claim: OpenAI advances Education for Countries, expanding AI adoption in schools with new partnerships, teacher training, and tools to improve global learning outcomes.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing OpenAI for Singapore
Source: https://openai.com/index/introducing-openai-for-singapore
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: OpenAI for Singapore launches a multi-year AI partnership to expand deployment, build local talent, and support businesses and public services with AI.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Advancing content provenance for a safer, more transparent AI ecosystem
Source: https://openai.com/index/advancing-content-provenance
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI advances AI content provenance with Content Credentials, SynthID, and a verification tool to help people identify and trust AI-generated media.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Widening the conversation on frontier AI
Source: https://www.anthropic.com/news/widening-conversation-ai
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 64/100
Claim: At Anthropic, we want to build AI systems that advance humanity and act for the global good. To do so, we need to engage with those who see the world from a variety of different perspectives. Over the past several months, we’ve been organizing dialogues with groups whose work and traditions bear on the questions raised by AI. Our first round of discussions…
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## KPMG integrates Claude across its core business and workforce of more than 276,000 in strategic alliance
Source: https://www.anthropic.com/news/anthropic-kpmg
Publisher: Anthropic
Category: Labour market
Sector: Software engineering
Capability: Enterprise workflow automation
Score: 72/100
Claim: KPMG—one of the world's largest professional services firms for audit, tax, legal, and advisory services across 138 countries and territories—has announced a global alliance with Anthropic to bring Claude into the heart of its business. As part of this alliance, KPMG is embedding Claude inside Digital Gateway, the software KPMG's people and clients use to.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## OpenAI and Dell partner to bring Codex to hybrid and on-premise enterprise environments
Source: https://openai.com/index/dell-codex-enterprise-partnership
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: OpenAI and Dell partner to bring Codex to hybrid and on-premise environments, helping enterprises deploy AI coding agents securely across data and workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Anthropic acquires Stainless
Source: https://www.anthropic.com/news/anthropic-acquires-stainless
Publisher: Anthropic
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 74/100
Claim: The frontier of AI is shifting from models that answer to agents that act—and agents are only as capable as the systems they can reach. Today, Anthropic is acquiring Stainless, a leader in SDKs and MCP server tooling, to extend that reach even further. Founded in 2022, Stainless has powered the generation of every official Anthropic SDK since the earliest.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Accenture Ireland: Generating Impact — Turning Frontier AI Capabilities into Frontline Productivity and Growth in Ireland
Source: https://www.accenture.com/content/dam/accenture/final/accenture-com/document-fy26/q3/Generating-Impact-Ireland.pdf
Publisher: Accenture Ireland
Category: Deployments
Sector: Cross-sector Irish economy / consulting
Capability: AI-enabled workflow recomposition at economy scale
Score: 78/100
Claim: Accenture reports that 82% of Irish working hours are now ‘AI-reinventable’ (up from 42% in 2024), that AI is already being used for tasks accounting for 20% of working hours, and that 39% of Irish employees expect their job to be unrecognisable or to disappear completely by the end of the decade. Expectations for entry-level hiring demand have deteriorated sharply: the share of executives expecting increased entry-level demand fell from 49% to 33%, while the share expecting reduced demand rose from 21% to 37%. Writing and editing declined across 51 Irish occupations 2023–2025.
Oracle verdict: This is the Discontinuity Thesis rendered as a consulting pitch deck. The data Accenture presents — 82% of hours in scope, 39% expecting disappearance, entry-level pipeline contracting — is exactly the frontier evidence Appendix III exists to capture. The framing (‘generating impact’, ‘reinvention’, ‘opportunity’) is the cope layer the thesis predicts. When the firms selling the transition also control the vocabulary of the transition, displacement becomes ‘reinvention’ and mass job anxiety becomes workers being ‘prepared to engage positively.’
Thesis relevance: Appendix III, section six: consultancy cope framing as evidence signal — the firms selling AI transformation are now publishing displacement data inside productivity narratives

## OpenAI and Malta partner to bring ChatGPT Plus to all citizens
Source: https://openai.com/index/malta-chatgpt-plus-partnership
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI and Malta partner to expand AI access, offering ChatGPT Plus and training to help citizens build practical AI skills and use AI responsibly.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Databricks brings GPT-5.5 to enterprise agent workflows
Source: https://openai.com/index/databricks
Publisher: OpenAI
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 83/100
Claim: Databricks uses GPT-5.5 for enterprise agent workflows after the model set a new state of the art on the OfficeQA Pro benchmark.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## A new personal finance experience in ChatGPT
Source: https://openai.com/index/personal-finance-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: Financial services
Capability: Financial workflow automation
Score: 64/100
Claim: Preview a new personal finance experience in ChatGPT for Pro users in the U.S. Securely connect your financial accounts and get AI-powered insights and guidance grounded in your financial context, goals, and priorities.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Chatham Financial trade validation compressed from 30 minutes to under 4
Source: https://www.linkedin.com/company/openai/
Publisher: OpenAI for Business / Chatham Financial
Category: Deployments
Sector: Financial risk management
Capability: Trade validation and compliance monitoring
Score: 68/100
Claim: OpenAI for Business and Chatham Financial described a GPT-5.5-Codex workflow that reduced trade validation from roughly 30 minutes to under 4 minutes, with real-time compliance monitoring for 160+ registered employees and audit-ready workflow outputs.
Oracle verdict: Thirty minutes of trade validation became less than four. The important part is not the time saving by itself; it is that verification, compliance, and audit-output generation are being pulled into a machine-readable workflow.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Agents, robots, and us: how AI reshapes work and skills in Europe
Source: https://www.mckinsey.com/mgi/our-research/agents-robots-and-us-how-ai-reshapes-work-and-skills-in-europe
Publisher: McKinsey Global Institute
Category: Labour market
Sector: European labour markets
Capability: Task automation and skill recomposition
Score: 79/100
Claim: McKinsey Global Institute estimates that 58% of current work hours across ten European countries are technically automatable with existing technologies: 44% by agents and 14% by robots.
Oracle verdict: The report uses cautious productivity language, but the measurement frame is already discontinuity-shaped: hours, tasks, agents, robots, and skill substitution. That is the thesis's operating layer in consultant language.
Thesis relevance: Appendix III, sections five to seven: labour-market evidence and deployment continuation

## Working with AI: measuring the occupational implications of generative AI
Source: https://www.microsoft.com/en-us/research/publication/working-with-ai-measuring-the-occupational-implications-of-generative-ai/
Publisher: Microsoft Research
Category: Labour market
Sector: Occupational exposure research
Capability: Generative AI task overlap across occupations
Score: 74/100
Claim: Microsoft Research analysed 200,000 anonymised Bing Copilot conversations and mapped generative AI applicability across occupations, with high exposure concentrated in communication, analysis, writing, sales, and knowledge-work roles.
Oracle verdict: The caveat is doing institutional work: task overlap does not prove full occupation replacement. But the thesis does not require full occupation replacement; it requires workflow-level recomposition that reduces the human production layer.
Thesis relevance: Appendix III, sections five to seven: labour-market evidence and provider framing

## Sea's View on the Future of Agentic Software Development with Codex
Source: https://openai.com/index/sea-david-chen
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: Sea Limited's CPO explains why the company is deploying Codex across engineering teams to accelerate AI-native software development in Asia.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Work with Codex from anywhere
Source: https://openai.com/index/work-with-codex-from-anywhere
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 58/100
Claim: Use Codex anywhere with the ChatGPT mobile app. Monitor, steer, and approve coding tasks in real time across devices and remote environments.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Helping ChatGPT better recognize context in sensitive conversations
Source: https://openai.com/index/chatgpt-recognize-context-in-sensitive-conversations
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 36/100
Claim: Learn how new ChatGPT safety updates improve context awareness in sensitive conversations, helping detect risk over time and respond more safely.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## PwC is deploying Claude to build technology, execute deals, and reinvent enterprise functions for clients
Source: https://www.anthropic.com/news/pwc-expanded-partnership
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: Anthropic and PwC today announced an expansion of their strategic alliance, deepening how PwC uses Claude to build technology, execute deals, and reinvent enterprise functions for clients across every industry it serves. Most enterprises are still running on systems and processes built for a pre-AI world, a drag that is estimated to be more than $2…
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Anthropic forms $200 million partnership with the Gates Foundation
Source: https://www.anthropic.com/news/gates-foundation-partnership
Publisher: Anthropic
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 53/100
Claim: We’re partnering with the Gates Foundation to commit $200 million in grant funding, Claude usage credits, and technical support for programs in global health, life sciences, education, and economic mobility over the next four years. These programs will be implemented with partners in the US and around the world. This commitment is central to Anthropic’s…
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Building a safe, effective sandbox to enable Codex on Windows
Source: https://openai.com/index/building-codex-windows-sandbox
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: Learn how OpenAI built a secure sandbox for Codex on Windows, enabling safe, efficient coding agents with controlled file access and network restrictions.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Our response to the TanStack npm supply chain attack
Source: https://openai.com/index/our-response-to-the-tanstack-npm-supply-chain-attack
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: OpenAI details its response to the TanStack “Mini Shai-Hulud” supply chain attack, outlines protections taken to secure systems and signing certificates, and explains why macOS users must update OpenAI apps by June 12, 2026. Learn what happened, what was affected, and how OpenAI is strengthening defenses against evolving software supply chain threats.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing Claude for Small Business
Source: https://www.anthropic.com/news/claude-for-small-business
Publisher: Anthropic
Category: Labour market
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 82/100
Claim: We're launching Claude for Small Business, a package of connectors and ready-to-run workflows that put Claude inside the tools small businesses depend on, to help small business owners take full advantage of AI and cross off items on the to-do list. Small businesses account for 44% of U.S. GDP and employ nearly half the private-sector workforce, but their…
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## AutoScout24 scales engineering with AI-powered workflows
Source: https://openai.com/index/autoscout24/
Publisher: OpenAI
Category: Deployments
Sector: Marketplace software
Capability: Software delivery workflow automation
Score: 84/100
Claim: OpenAI reports that AutoScout24 rolled out ChatGPT to roughly 2,000 employees and Codex to roughly 1,000 builder employees, with selected projects compressed from 2-3 weeks to 2-3 days.
Oracle verdict: This is the thesis moving from benchmark to operating model. The claim is not that engineers got a better autocomplete; it is that planning, review, refactoring, documentation, and incident analysis are being reorganised around agentic work continuation.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## What Parameter Golf taught us about AI-assisted research
Source: https://openai.com/index/what-parameter-golf-taught-us
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 86/100
Claim: Parameter Golf brought together 1,000+ participants and 2,000+ submissions to explore AI-assisted machine learning research, coding agents, quantization, and novel model design under strict constraints.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## How NVIDIA engineers and researchers build with Codex
Source: https://openai.com/index/nvidia
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Teams use Codex with GPT-5.5 to ship production systems and turn research ideas into runnable experiments.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## OpenAI Campus Network: Student club interest form
Source: https://openai.com/index/openai-campus-network-student-club-interest-form
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 54/100
Claim: Join the OpenAI Campus Network—connect student clubs worldwide, access AI tools, host events, and build an AI-powered campus community.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## OpenAI launches DeployCo to help businesses build around intelligence
Source: https://openai.com/index/openai-launches-the-deployment-company
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: OpenAI launches DeployCo, a new enterprise deployment company built to help organizations bring frontier AI into production and turn it into measurable business impact.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Running Codex safely at OpenAI
Source: https://openai.com/index/running-codex-safely
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: How OpenAI runs Codex securely with sandboxing, approvals, network policies, and agent-native telemetry to support safe and compliant coding agent adoption.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Scaling Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber
Source: https://openai.com/index/gpt-5-5-with-trusted-access-for-cyber
Publisher: OpenAI
Category: Benchmarks
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: OpenAI expands Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber, helping verified defenders accelerate vulnerability research and protect critical infrastructure.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Parloa builds service agents customers want to talk to
Source: https://openai.com/index/parloa
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 95/100
Claim: Parloa leverages OpenAI models to power scalable, voice-driven AI customer service agents, enabling enterprises to design, simulate, and deploy reliable, real-time interactions.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Advancing voice intelligence with new models in the API
Source: https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 64/100
Claim: Explore new realtime voice models in the OpenAI API that can reason, translate, and transcribe speech, enabling more natural and intelligent voice experiences.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Testing ads in ChatGPT
Source: https://openai.com/index/testing-ads-in-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 38/100
Claim: OpenAI begins testing ads in ChatGPT to support free access, with clear labeling, answer independence, strong privacy protections, and user control.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing Trusted Contact in ChatGPT
Source: https://openai.com/index/introducing-trusted-contact-in-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 36/100
Claim: Introducing Trusted Contact in ChatGPT, an optional safety feature that notifies someone you trust if serious self-harm concerns are detected.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Simplex rethinks software development with Codex
Source: https://openai.com/index/simplex
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: Simplex boosts software development with ChatGPT Enterprise and Codex, reducing design, build, and testing time while scaling AI-driven workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Singular Bank helps bankers move fast with ChatGPT and Codex
Source: https://openai.com/index/singular-bank/
Publisher: OpenAI
Category: Deployments
Sector: Financial services / retail banking
Capability: Private-banking portfolio analysis, meeting preparation, and compliant follow-up assistance
Score: 69/100
Claim: OpenAI reports that Singular Bank built Singularity, an internal assistant powered by ChatGPT and Codex that analyzes portfolios, recommends next actions in real time, prepares meetings, and generates compliant follow-up communications. The published results are 60-90 minutes saved per banker per day and less than one minute of client-meeting preparation.
Oracle verdict: Singular Bank is useful evidence for the thesis at the workflow-compression layer. The source shows AI absorbing portfolio analysis, meeting prep, next-action recommendation, and compliant follow-up drafting; it does not by itself establish full role replacement or an end-to-end banking automation architecture.
Thesis relevance: Appendix III, section four: financial-sector deployment evidence — private-banking workflows compressed by AI assistance

## How ChatGPT learns about the world while protecting privacy
Source: https://openai.com/index/how-chatgpt-protects-privacy
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 26/100
Claim: Learn how ChatGPT safeguards your privacy, reduces personal data in training, and gives you control over whether your conversations improve AI models.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing ChatGPT Futures: Class of 2026
Source: https://openai.com/index/introducing-chatgpt-futures-class-of-2026
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 76/100
Claim: Meet the ChatGPT Futures Class of 2026—26 student innovators using AI to build, research, and drive real-world impact. Discover how this generation is redefining learning, creativity, and opportunity with ChatGPT.
Oracle verdict: This is an education and adoption signal rather than a benchmark result. The claim describes 26 student innovators using ChatGPT, not a measured model capability. The labour-market effect is indirect today; it becomes relevant only if this kind of adoption later shows up in hiring, wage, or workflow data.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## Uber uses OpenAI to help people earn smarter and book faster
Source: https://openai.com/index/uber
Publisher: OpenAI
Category: Deployments
Sector: Commerce and marketplace
Capability: Multimodal content generation and media workflows
Score: 82/100
Claim: Uber uses OpenAI to power AI assistants and voice features that help drivers earn smarter and riders book faster across a global real-time marketplace.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## How frontier firms are pulling ahead
Source: https://openai.com/index/introducing-b2b-signals
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 93/100
Claim: OpenAI’s B2B Signals research shows how frontier enterprises deepen AI adoption, scale Codex-powered agentic workflows, and build durable competitive advantage.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Higher usage limits for Claude and a compute deal with SpaceX
Source: https://www.anthropic.com/news/higher-limits-spacex
Publisher: Anthropic
Category: Deployments
Sector: AI infrastructure
Capability: Agent platform and API infrastructure
Score: 85/100
Claim: We’ve agreed to a partnership with SpaceX that will substantially increase our compute capacity. This, along with our other recent compute deals, means that we’ve been able to increase our usage limits for Claude Code and the Claude API. Below, we describe these changes and the progress we’re making on compute.
Oracle verdict: This is useful evidence because added compute capacity and higher usage limits for Claude Code and the Claude API raise the ceiling on how much work can be routed through agents. Treat it as an infrastructure signal: the displacement pressure is indirect and arrives through the deployments that the extra capacity supports.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Unlocking large scale AI training networks with MRC (Multipath Reliable Connection)
Source: https://openai.com/index/mrc-supercomputer-networking
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI introduces MRC (Multipath Reliable Connection), a new supercomputer networking protocol released via OCP to improve resilience and performance in large-scale AI training clusters.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## GPT-5.5 Instant System Card
Source: https://openai.com/index/gpt-5-5-instant-system-card
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 48/100
Claim: Official OpenAI release: GPT-5.5 Instant System Card.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## GPT-5.5 Instant: smarter, clearer, and more personalized
Source: https://openai.com/index/gpt-5-5-instant
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5.5 Instant updates ChatGPT’s default model with smarter, more accurate answers, reduced hallucinations, and improved personalization controls.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## New ways to buy ChatGPT ads
Source: https://openai.com/index/new-ways-to-buy-chatgpt-ads
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 38/100
Claim: OpenAI expands ChatGPT ads with a beta self-serve Ads Manager, CPC bidding, and enhanced measurement tools—built to protect privacy and keep conversations separate from ads.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Advancing youth safety and wellbeing in EMEA
Source: https://openai.com/index/advancing-youth-safety-in-emea
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Vendor platform capability signal
Score: 26/100
Claim: Explore OpenAI’s European Youth Safety Blueprint and EMEA Youth & Wellbeing Grants, advancing safe, responsible AI for teens, families, and educators.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Agents for financial services
Source: https://www.anthropic.com/news/finance-agents
Publisher: Anthropic
Category: Vendor framing
Sector: Financial services
Capability: Financial workflow automation
Score: 74/100
Claim: We’re releasing ten ready-to-run agent templates for the most time-consuming work in financial services: building pitchbooks, screening KYC files, and closing the books at month-end. Each one ships as a plugin in Claude Cowork and Claude Code, and as a cookbook for Claude Managed Agents, so a team can put Claude on real financial work in days rather than…
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## OpenAI and PwC collaborate to reimagine the office of the CFO
Source: https://openai.com/index/openai-pwc-finance-collaboration
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Enterprise workflow automation
Score: 73/100
Claim: OpenAI and PwC are partnering to help enterprises use AI agents to automate finance workflows, improve forecasting, strengthen controls, and modernize the CFO function.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## How OpenAI delivers low-latency voice AI at scale
Source: https://openai.com/index/delivering-low-latency-voice-ai-at-scale
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 64/100
Claim: How OpenAI rebuilt its WebRTC stack to power real-time Voice AI with low latency, global scale, and seamless conversational turn-taking.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Building a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs
Source: https://www.anthropic.com/news/enterprise-ai-services-company
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: Anthropic, Blackstone, Hellman & Friedman, and Goldman Sachs announced the formation of a new AI services company. The organization will work with mid-sized companies across sectors to bring Claude into their most important operations. Applied AI engineers from Anthropic will work alongside the firm’s engineering team to identify where Claude can have the.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Federal Reserve Bank of New York, Liberty Street Economics (May 2026): “Do Job Postings Show Early Labor-Market Effects of AI?”
Source: https://libertystreeteconomics.newyorkfed.org/2026/05/do-job-postings-show-early-labor-market-effects-of-ai/
Publisher: Federal Reserve Bank of New York
Category: Disconfirming — Attribution
Sector: Labour economics
Capability: AI labour-market attribution
Score: 22/100
Claim: Using Lightcast job postings data combined with Anthropic-derived AI exposure measures, NY Fed researchers report three findings: (1) the relative decline in vacancies for AI-exposed occupations began before the release of ChatGPT in late 2022; (2) no divergence between junior and senior postings within highly exposed occupations; (3) fewer than 10% of workers and vacancies sit in occupations with high measured AI exposure. The authors conclude these patterns make it difficult to attribute the recent slowdown in entry-level hiring to AI alone.
Oracle verdict: Finding (2) conflicts with the Stanford Digital Economy Lab payroll finding (~13% relative employment decline for ages 22–25 in the most AI-exposed occupations) on the specific signature this thesis predicts — the junior/senior split within exposed work. The two instruments measure different things: payroll records who was hired; postings record who firms say they want. The gap between them is itself informative if firms post junior roles they no longer fill, or inflate experience requirements on nominally entry-level postings (“experience creep”) — a reconciliation hypothesis that is checkable and currently unverified. Pending the pre-registered indicator readings, the register’s position is: the leading indicators disagree in a way consistent with the mechanism but not yet attributable to it. This entry is filed as disconfirming because a register that can only accumulate confirmation is a brief, not an audit.
Thesis relevance: Appendix III — audit register: attribution disconfirmation on junior/senior vacancy split

## Introducing Advanced Account Security
Source: https://openai.com/index/advanced-account-security
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 36/100
Claim: Introducing Advanced Account Security: phishing-resistant login, stronger recovery, and enhanced protections to safeguard sensitive data and prevent account takeover.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Trinity College Dublin and Microsoft Ireland Research Shows a Widening AI Maturity Gap Between SMEs and Large Organisations
Source: https://news.microsoft.com/source/emea/features/trinity-college-dublin-and-microsoft-ireland-research-shows-a-widening-ai-maturity-gap-between-smes-and-large-organisations/
Publisher: Microsoft Source EMEA / Trinity College Dublin
Category: Labour market
Sector: Irish business productivity and AI adoption
Capability: AI-enabled organisational time savings and maturity gap
Score: 72/100
Claim: The AI Economy Ireland 2026 report says 92% of Irish organisations use or plan to use AI, but only 10% describe deployment as advanced or frontier-level; large organisations are more than twice as likely as SMEs to save 2+ hours per week per employee, while formal AI policy is associated with 10x higher rates of major productivity gains.
Oracle verdict: The release frames this as a readiness and productivity story. The thesis reads it as uneven discontinuity: AI gains compound first where organisations can redesign work, leaving SMEs and lower-confidence workers exposed to a widening capability gap.
Thesis relevance: Appendix III, sections five to seven: labour-market evidence, organisational readiness, and deployment continuation

## Where the goblins came from
Source: https://openai.com/index/where-the-goblins-came-from
Publisher: OpenAI
Category: Vendor framing
Sector: AI infrastructure
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: How goblin outputs spread in AI models: timeline, root cause, and fixes behind personality-driven quirks in GPT-5 behavior.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Building the compute infrastructure for the Intelligence Age
Source: https://openai.com/index/building-the-compute-infrastructure-for-the-intelligence-age
Publisher: OpenAI
Category: Vendor framing
Sector: AI infrastructure
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI scales Stargate to build the compute infrastructure powering AGI, adding new data center capacity to meet growing AI demand.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Cybersecurity in the Intelligence Age
Source: https://openai.com/index/cybersecurity-in-the-intelligence-age
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: OpenAI outlines a five-part action plan for strengthening cybersecurity in the Intelligence Age, focused on democratizing AI-powered cyber defense and protecting critical systems.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Our commitment to community safety
Source: https://openai.com/index/our-commitment-to-community-safety
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 36/100
Claim: Learn how OpenAI protects community safety in ChatGPT through model safeguards, misuse detection, policy enforcement, and collaboration with safety experts.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## OpenAI models, Codex, and Managed Agents come to AWS
Source: https://openai.com/index/openai-on-aws
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: OpenAI GPT models, Codex, and Managed Agents are now available on AWS, enabling enterprises to build secure AI in their AWS environments.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Claude for Creative Work
Source: https://www.anthropic.com/news/claude-for-creative-work
Publisher: Anthropic
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 68/100
Claim: Creative professionals look to technology to expand what's possible in their work. Claude can't replace taste or imagination, but it can open up new ways of working—faster and more ambitious ideation, a more expansive skill set, and the ability for creatives to take on larger-scale projects. AI can also help shoulder the parts of the creative process that.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## OpenAI available at FedRAMP Moderate
Source: https://openai.com/index/openai-available-at-fedramp-moderate
Publisher: OpenAI
Category: Deployments
Sector: Public sector
Capability: Enterprise workflow automation
Score: 85/100
Claim: OpenAI is available at FedRAMP Moderate authorization for ChatGPT Enterprise and the OpenAI API, enabling secure AI adoption for U.S. federal agencies.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## The next phase of the Microsoft OpenAI partnership
Source: https://openai.com/index/next-phase-of-microsoft-partnership
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Production AI deployment signal
Score: 85/100
Claim: OpenAI and Microsoft announce an amended agreement that simplifies the partnership, adds long-term clarity, and supports continued AI innovation at scale.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## An open-source spec for orchestration: Symphony
Source: https://openai.com/index/open-source-codex-orchestration-symphony
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: Learn how Symphony, an open-source spec for Codex orchestration, turns issue trackers into always-on agent systems—boosting engineering output and reducing context switching.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Choco automates food distribution with AI agents
Source: https://openai.com/index/choco
Publisher: OpenAI
Category: Deployments
Sector: Commerce and marketplace
Capability: Enterprise workflow automation
Score: 96/100
Claim: How Choco used OpenAI APIs to streamline food distribution, boost productivity, and unlock growth—an in-depth customer story on real-world AI impact.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Anthropic names Theo Hourmouzis General Manager of Australia & New Zealand and officially opens Sydney office
Source: https://www.anthropic.com/news/theo-hourmouzis-general-manager-australia-new-zealand
Publisher: Anthropic
Category: Deployments
Sector: General AI capability
Capability: Enterprise workflow automation
Score: 63/100
Claim: Theo Hourmouzis is joining Anthropic as General Manager of Australia and New Zealand, marking the next step in our investment in the region. Hourmouzis will meet with customers and partners this week alongside executives from our global team, as we officially open our Sydney office. Hourmouzis brings more than 20 years of leadership experience in the.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Our principles
Source: https://openai.com/index/our-principles
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 54/100
Claim: Our mission is to ensure that AGI benefits all of humanity. Sam Altman shares five principles that guide our work.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## An update on our election safeguards
Source: https://www.anthropic.com/news/election-safeguards-update
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 36/100
Claim: People around the world turn to Claude for information about political parties, candidates, and the issues at stake during election time—as well as to answer simpler questions like when, where, and how to vote. In our view, if AI models can answer these questions well (that is, accurately and impartially), they can be a positive force for the democratic.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic and NEC collaborate to build Japan’s largest AI engineering workforce
Source: https://www.anthropic.com/news/anthropic-nec
Publisher: Anthropic
Category: Labour market
Sector: Enterprise operations
Capability: Education and workforce adoption
Score: 72/100
Claim: NEC Corporation will use Claude as it builds one of Japan’s largest AI-native engineering organizations, making it available to approximately 30,000 NEC Group employees worldwide. As part of this strategic collaboration, NEC will become Anthropic’s first Japan-based global partner. Together, we will develop secure, industry-specific AI products for the.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## GPT-5.5 System Card
Source: https://openai.com/index/gpt-5-5-system-card
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 48/100
Claim: Official OpenAI release: GPT-5.5 System Card.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing GPT-5.5
Source: https://openai.com/index/introducing-gpt-5-5
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Introducing GPT-5.5, our smartest model yet—faster, more capable, and built for complex tasks like coding, research, and data analysis across tools.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## GPT-5.5 Bio Bug Bounty
Source: https://openai.com/index/gpt-5-5-bio-bug-bounty
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 56/100
Claim: Explore the GPT-5.5 Bio Bug Bounty: a red-teaming challenge to find universal jailbreaks for bio safety risks, with rewards up to $25,000.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Making ChatGPT better for clinicians
Source: https://openai.com/index/making-chatgpt-better-for-clinicians
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 76/100
Claim: OpenAI makes ChatGPT for Clinicians free for verified U.S. physicians, nurse practitioners, and pharmacists, supporting clinical care, documentation, and research.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Introducing workspace agents in ChatGPT
Source: https://openai.com/index/introducing-workspace-agents-in-chatgpt
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: Workspace agents in ChatGPT are Codex-powered agents that automate complex workflows, run in the cloud, and help teams scale work across tools securely.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Speeding up agentic workflows with WebSockets in the Responses API
Source: https://openai.com/index/speeding-up-agentic-workflows-with-websockets
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 92/100
Claim: A deep dive into the Codex agent loop, showing how WebSockets and connection-scoped caching reduced API overhead and improved model latency.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing OpenAI Privacy Filter
Source: https://openai.com/index/introducing-openai-privacy-filter
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 38/100
Claim: OpenAI Privacy Filter is an open-weight model for detecting and redacting personally identifiable information (PII) in text with state-of-the-art accuracy.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing ChatGPT Images 2.0
Source: https://openai.com/index/introducing-chatgpt-images-2-0
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Multimodal content generation and media workflows
Score: 64/100
Claim: ChatGPT Images 2.0 introduces a state-of-the-art image generation model with improved text rendering, multilingual support, and advanced visual reasoning.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Scaling Codex to enterprises worldwide
Source: https://openai.com/index/scaling-codex-to-enterprises-worldwide
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: OpenAI launches Codex Labs, partners with Accenture, PwC, Infosys, and others to help enterprises deploy and scale Codex across the software development lifecycle, and hits 4M Codex WAU.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## OpenAI helps Hyatt advance AI among colleagues
Source: https://openai.com/index/hyatt-advances-ai-with-chatgpt-enterprise
Publisher: OpenAI
Category: Labour market
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Hyatt deploys ChatGPT Enterprise across its global workforce, using GPT-5.4 and Codex to improve productivity, operations, and guest experiences.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## Anthropic and Amazon expand collaboration for up to 5 gigawatts of new compute
Source: https://www.anthropic.com/news/anthropic-amazon-compute
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Production AI deployment signal
Score: 85/100
Claim: We have signed a new agreement with Amazon that will deepen our existing partnership and secure up to 5 gigawatts (GW) of capacity for training and deploying Claude, including new Trainium2 capacity coming online in the first half of this year and nearly 1GW total of Trainium2 and Trainium3 capacity coming online by the end of 2026. We have worked closely.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing Claude Design by Anthropic Labs
Source: https://www.anthropic.com/news/claude-design-anthropic-labs
Publisher: Anthropic
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 95/100
Claim: Today, we’re launching Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create polished visual work like designs, prototypes, slides, one-pagers, and more. Claude Design is powered by our most capable vision model, Claude Opus 4.7, and is available in research preview for Claude Pro, Max, Team, and Enterprise.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Codex for (almost) everything
Source: https://openai.com/index/codex-for-almost-everything
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: The updated Codex app for macOS and Windows adds computer use, in-app browsing, image generation, memory, and plugins to accelerate developer workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing GPT-Rosalind for life sciences research
Source: https://openai.com/index/introducing-gpt-rosalind
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: OpenAI introduces GPT-Rosalind, a frontier reasoning model built to accelerate drug discovery, genomics analysis, protein reasoning, and scientific research workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Accelerating the cyber defense ecosystem that protects us all
Source: https://openai.com/index/accelerating-cyber-defense-ecosystem
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 73/100
Claim: Leading security firms and enterprises join OpenAI’s Trusted Access for Cyber, using GPT-5.4-Cyber and $10M in API grants to strengthen global cyber defense.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing Claude Opus 4.7
Source: https://www.anthropic.com/news/claude-opus-4-7
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work—the kind that previously needed close supervision—to Opus 4.7 with confidence. Opus 4.7 handles.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## The next evolution of the Agents SDK
Source: https://openai.com/index/the-next-evolution-of-the-agents-sdk
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Agent platform and API infrastructure
Score: 74/100
Claim: OpenAI updates the Agents SDK with native sandbox execution and a model-native harness, helping developers build secure, long-running agents across files and tools.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Trusted access for the next era of cyber defense
Source: https://openai.com/index/scaling-trusted-access-for-cyber-defense
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 48/100
Claim: OpenAI expands its Trusted Access for Cyber program, introducing GPT-5.4-Cyber to vetted defenders and strengthening safeguards as AI cybersecurity capabilities advance.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic’s Long-Term Benefit Trust appoints Vas Narasimhan to Board of Directors
Source: https://www.anthropic.com/news/narasimhan-board
Publisher: Anthropic
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Enterprise workflow automation
Score: 54/100
Claim: Vas Narasimhan has been appointed to Anthropic's Board of Directors by the Anthropic Long-Term Benefit Trust. He is a physician-scientist and the Chief Executive Officer of Novartis—one of the world's leading innovative medicines companies—and shares Anthropic’s conviction that healthcare and life sciences are among the areas where AI has the greatest.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI
Source: https://openai.com/index/cloudflare-openai-agent-cloud
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Cloudflare brings OpenAI’s GPT-5.4 and Codex to Agent Cloud, enabling enterprises to build, deploy, and scale AI agents for real-world tasks with speed and security.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Our response to the Axios developer tool compromise
Source: https://openai.com/index/axios-developer-tool-compromise
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: OpenAI responds to the Axios supply chain attack by rotating macOS code signing certificates, updating apps, and confirming no user data was compromised.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## CyberAgent moves faster with ChatGPT Enterprise and Codex
Source: https://openai.com/index/cyberagent
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 96/100
Claim: CyberAgent uses ChatGPT Enterprise and Codex to securely scale AI adoption, improve quality, and accelerate decisions across advertising, media, and gaming.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## OpenAI Full Fan Mode Contest: Terms & Conditions
Source: https://openai.com/index/full-fan-mode-contest-terms-conditions
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 42/100
Claim: Explore the official terms and conditions for the OpenAI Full Fan Mode Contest, including eligibility, entry steps, judging criteria, and prize details. Learn how to participate, submit your entry on Instagram, and win IPL match tickets.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## The next phase of enterprise AI
Source: https://openai.com/index/next-phase-of-enterprise-ai
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 95/100
Claim: OpenAI outlines the next phase of enterprise AI, as adoption accelerates across industries with Frontier, ChatGPT Enterprise, Codex, and company-wide AI agents.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing the Child Safety Blueprint
Source: https://openai.com/index/introducing-child-safety-blueprint
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 36/100
Claim: Discover OpenAI’s Child Safety Blueprint—a roadmap for building AI responsibly with safeguards, age-appropriate design, and collaboration to protect and empower young people online.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Announcing the OpenAI Safety Fellowship
Source: https://openai.com/index/introducing-openai-safety-fellowship
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 26/100
Claim: A pilot program to support independent safety and alignment research and develop the next generation of talent.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Industrial policy for the Intelligence Age
Source: https://openai.com/index/industrial-policy-for-the-intelligence-age
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 36/100
Claim: Explore our ambitious, people-first industrial policy ideas for the AI era—focused on expanding opportunity, sharing prosperity, and building resilient institutions as advanced intelligence evolves.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic expands partnership with Google and Broadcom for multiple gigawatts of next-generation compute
Source: https://www.anthropic.com/news/google-broadcom-partnership-compute
Publisher: Anthropic
Category: Deployments
Sector: AI infrastructure
Capability: Frontier model release and benchmark movement
Score: 85/100
Claim: We have signed a new agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity that we expect to come online starting in 2027. This significant expansion of our compute infrastructure will power our frontier Claude models and help us serve extraordinary demand from customers worldwide.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## OpenAI acquires TBPN
Source: https://openai.com/index/openai-acquires-tbpn
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 64/100
Claim: OpenAI acquires TBPN to accelerate global conversations around AI and support independent media, expanding dialogue with builders, businesses, and the broader tech community.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Codex now offers more flexible pricing for teams
Source: https://openai.com/index/codex-flexible-pricing-for-teams
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: Codex now includes pay-as-you-go pricing for ChatGPT Business and Enterprise, providing teams a more flexible option to start and scale adoption.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Gradient Labs gives every bank customer an AI account manager
Source: https://openai.com/index/gradient-labs
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Gradient Labs uses GPT-4.1 and GPT-5.4 mini and nano to power AI agents that automate banking support workflows with low latency and high reliability.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Accelerating the next phase of AI
Source: https://openai.com/index/accelerating-the-next-phase-ai
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 73/100
Claim: OpenAI raises $122 billion in new funding to expand frontier AI globally, invest in next-generation compute, and meet growing demand for ChatGPT, Codex, and enterprise AI.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Australian government and Anthropic sign MOU for AI safety and research
Source: https://www.anthropic.com/news/australia-MOU
Publisher: Anthropic
Category: Vendor framing
Sector: Public sector
Capability: Vendor platform capability signal
Score: 47/100
Claim: Today, Anthropic signed a Memorandum of Understanding with the Australian government to cooperate on AI safety research and support the goals of Australia’s National AI Plan. Our CEO, Dario Amodei, met with Prime Minister Anthony Albanese to formalize the agreement during a visit to Canberra, Australia. We also announced AUD$3 million in partnerships with.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Helping disaster response teams turn AI into action across Asia
Source: https://openai.com/index/helping-disaster-response-teams-asia
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 54/100
Claim: AI for Disaster Response in Asia: OpenAI Workshop with Gates Foundation.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## STADLER reshapes knowledge work at a 230-year-old company
Source: https://openai.com/index/stadler
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 82/100
Claim: Learn how STADLER uses ChatGPT to transform knowledge work, saving time and accelerating productivity across 650 employees.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Inside our approach to the Model Spec
Source: https://openai.com/index/our-approach-to-the-model-spec
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 36/100
Claim: Learn how OpenAI’s Model Spec serves as a public framework for model behavior, balancing safety, user freedom, and accountability as AI systems advance.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing the OpenAI Safety Bug Bounty program
Source: https://openai.com/index/safety-bug-bounty
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 46/100
Claim: OpenAI launches a Safety Bug Bounty program to identify AI abuse and safety risks, including agentic vulnerabilities, prompt injection, and data exfiltration.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Helping developers build safer AI experiences for teens
Source: https://openai.com/index/teen-safety-policies-gpt-oss-safeguard
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Vendor platform capability signal
Score: 36/100
Claim: OpenAI releases prompt-based teen safety policies for developers using gpt-oss-safeguard, helping moderate age-specific risks in AI systems.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Powering product discovery in ChatGPT
Source: https://openai.com/index/powering-product-discovery-in-chatgpt
Publisher: OpenAI
Category: Deployments
Sector: Commerce and marketplace
Capability: Production AI deployment signal
Score: 88/100
Claim: ChatGPT introduces richer, visually immersive shopping powered by the Agentic Commerce Protocol, enabling product discovery, side-by-side comparisons, and merchant integration.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Update on the OpenAI Foundation
Source: https://openai.com/index/update-on-the-openai-foundation
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Education and workforce adoption
Score: 58/100
Claim: The OpenAI Foundation announces plans to invest at least $1 billion in curing diseases, economic opportunity, AI resilience, and community programs.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Creating with Sora Safely
Source: https://openai.com/index/creating-with-sora-safely
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 26/100
Claim: To address the novel safety challenges posed by a state-of-the-art video model as well as a new social creation platform, we’ve built Sora 2 and the Sora app with safety at the foundation. Our approach is anchored in concrete protections.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## How we monitor internal coding agents for misalignment
Source: https://openai.com/index/how-we-monitor-internal-coding-agents-misalignment
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 53/100
Claim: How OpenAI uses chain-of-thought monitoring to study misalignment in internal coding agents—analyzing real-world deployments to detect risks and strengthen AI safety safeguards.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## OpenAI to acquire Astral
Source: https://openai.com/index/openai-to-acquire-astral
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: Accelerates Codex growth to power the next generation of Python developer tools.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing GPT-5.4 mini and nano
Source: https://openai.com/index/introducing-gpt-5-4-mini-and-nano
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## OpenAI Japan announces Japan Teen Safety Blueprint to put teen safety first
Source: https://openai.com/index/japan-teen-safety-blueprint
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 36/100
Claim: OpenAI Japan announces the Japan Teen Safety Blueprint, introducing stronger age protections, parental controls, and well-being safeguards for teens using generative AI.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Equipping workers with insights about compensation
Source: https://openai.com/index/equipping-workers-with-insights-about-compensation
Publisher: OpenAI
Category: Labour market
Sector: Scientific research
Capability: Education and workforce adoption
Score: 76/100
Claim: New research shows Americans send nearly 3 million daily messages to ChatGPT asking about compensation and earnings, helping close the wage information gap.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## Why Codex Security Doesn’t Include a SAST Report
Source: https://openai.com/index/why-codex-security-doesnt-include-sast
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: A deep dive into why Codex Security doesn’t rely on traditional SAST, instead using AI-driven constraint reasoning and validation to find real vulnerabilities with fewer false positives.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic invests $100 million into the Claude Partner Network
Source: https://www.anthropic.com/news/claude-partner-network
Publisher: Anthropic
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 89/100
Claim: We’re launching the Claude Partner Network, a program for partner organizations helping enterprises adopt Claude. We’re committing an initial $100 million to support our partners with training courses, dedicated technical support, and joint market development. Partners who join from today will get immediate access to a new technical certification and be.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Designing AI agents to resist prompt injection
Source: https://openai.com/index/designing-agents-to-resist-prompt-injection
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 88/100
Claim: How ChatGPT defends against prompt injection and social engineering by constraining risky actions and protecting sensitive data in agent workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## From model to agent: Equipping the Responses API with a computer environment
Source: https://openai.com/index/equip-responses-api-computer-environment
Publisher: OpenAI
Category: Vendor framing
Sector: AI infrastructure
Capability: Agent platform and API infrastructure
Score: 74/100
Claim: How OpenAI built an agent runtime using the Responses API, shell tool, and hosted containers to run secure, scalable agents with files, tools, and state.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Rakuten fixes issues twice as fast with Codex
Source: https://openai.com/index/rakuten
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: Official OpenAI release: Rakuten fixes issues twice as fast with Codex.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Wayfair boosts catalog accuracy and support speed with OpenAI
Source: https://openai.com/index/wayfair
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Production AI deployment signal
Score: 82/100
Claim: Wayfair uses OpenAI models to improve ecommerce support and product catalog accuracy, automating ticket triage and enhancing millions of product attributes at scale.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing The Anthropic Institute
Source: https://www.anthropic.com/news/the-anthropic-institute
Publisher: Anthropic
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: We’re launching The Anthropic Institute, a new effort to confront the most significant challenges that powerful AI will pose to our societies. The Anthropic Institute will draw on research from across Anthropic to provide information that other researchers and the public can use during our transition to a world containing much more powerful AI systems. In.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Improving instruction hierarchy in frontier LLMs
Source: https://openai.com/index/instruction-hierarchy-challenge
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 36/100
Claim: IH-Challenge trains models to prioritize trusted instructions, improving instruction hierarchy, safety steerability, and resistance to prompt injection attacks.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## New ways to learn math and science in ChatGPT
Source: https://openai.com/index/new-ways-to-learn-math-and-science-in-chatgpt
Publisher: OpenAI
Category: Benchmarks
Sector: Education
Capability: Education and workforce adoption
Score: 76/100
Claim: ChatGPT introduces interactive visual explanations for math and science, helping students explore formulas, variables, and concepts in real time.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Sydney will become Anthropic’s fourth office in Asia-Pacific
Source: https://www.anthropic.com/news/sydney-fourth-office-asia-pacific
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 42/100
Claim: Anthropic is expanding to Australia and New Zealand. In the coming weeks, we will open an office in Sydney—our fourth office in Asia-Pacific, alongside Tokyo, Bengaluru, and Seoul. The expansion reflects strong demand from businesses in Australia and New Zealand and will help us better serve the countries’ unique AI ecosystems. In addition to hiring a team.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## OpenAI to acquire Promptfoo
Source: https://openai.com/index/openai-to-acquire-promptfoo
Publisher: OpenAI
Category: Deployments
Sector: Cybersecurity
Capability: Enterprise workflow automation
Score: 85/100
Claim: OpenAI is acquiring Promptfoo, an AI security platform that helps enterprises identify and remediate vulnerabilities in AI systems during development.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Codex Security: now in research preview
Source: https://openai.com/index/codex-security-now-in-research-preview
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 86/100
Claim: Codex Security is an AI application security agent that analyzes project context to detect, validate, and patch complex vulnerabilities with higher confidence and less noise.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## How Balyasny Asset Management built an AI research engine
Source: https://openai.com/index/balyasny-asset-management
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Enterprise workflow automation
Score: 86/100
Claim: By combining rigorous model evaluation, full-platform use of OpenAI, and agent workflows, Balyasny is reinventing investment research.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## How Descript engineers multilingual video dubbing at scale
Source: https://openai.com/index/descript
Publisher: OpenAI
Category: Benchmarks
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 76/100
Claim: Using OpenAI reasoning models, Descript unlocked automatic localization of large content libraries without losing timing or meaning.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Partnering with Mozilla to improve Firefox’s security
Source: https://www.anthropic.com/news/mozilla-firefox-security
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Cyber defence and misuse monitoring
Score: 76/100
Claim: AI models can now independently identify high-severity vulnerabilities in complex software. As we recently documented, Claude found more than 500 zero-day vulnerabilities (security flaws that are unknown to the software’s maintainers) in well-tested open-source software. In this post, we share details of a collaboration with researchers at Mozilla in which.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Introducing GPT-5.4
Source: https://openai.com/index/introducing-gpt-5-4
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Introducing GPT-5.4, OpenAI’s most capable and efficient frontier model for professional work, with state-of-the-art coding, computer use, tool search, and 1M-token context.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## GPT-5.4 Thinking System Card
Source: https://openai.com/index/gpt-5-4-thinking-system-card
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 48/100
Claim: Official OpenAI release: GPT-5.4 Thinking System Card.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Reasoning models struggle to control their chains of thought, and that’s good
Source: https://openai.com/index/reasoning-models-chain-of-thought-controllability
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 36/100
Claim: OpenAI introduces CoT-Control and finds reasoning models struggle to control their chains of thought, reinforcing monitorability as an AI safety safeguard.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Ensuring AI use in education leads to opportunity
Source: https://openai.com/index/ai-education-opportunity
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 64/100
Claim: OpenAI shares new tools, certifications, and measurement resources to help schools and universities close AI capability gaps and expand opportunity.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## The five AI value models driving business reinvention
Source: https://openai.com/index/the-five-ai-value-models-driving-business-reinvention
Publisher: OpenAI
Category: Labour market
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 72/100
Claim: Five AI value models show how leaders can sequence AI from workforce fluency to process reinvention and build durable business advantage.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## VfL Wolfsburg turns ChatGPT into a club-wide capability
Source: https://openai.com/index/vfl-wolfsburg
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: By focusing on people, not pilots, the Bundesliga club is scaling efficiency, creativity, and knowledge—without losing its football identity.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing the Adoption news channel
Source: https://openai.com/index/introducing-the-adoption-news-channel
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Enterprise workflow automation
Score: 64/100
Claim: Practical insights and frameworks to turn AI progress into business advantage.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing ChatGPT for Excel and new financial data integrations
Source: https://openai.com/index/chatgpt-for-excel
Publisher: OpenAI
Category: Benchmarks
Sector: Financial services
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: OpenAI introduces ChatGPT for Excel and new financial app integrations, powered by GPT-5.4 to accelerate modeling, research, and analysis in regulated environments.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Where things stand with the Department of War
Source: https://www.anthropic.com/news/where-stand-department-war
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: Yesterday (March 4) Anthropic received a letter from the Department of War confirming that we have been designated as a supply chain risk to America’s national security. As we wrote on Friday, we do not believe this action is legally sound, and we see no choice but to challenge it in court.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Extending single-minus amplitudes to gravitons
Source: https://openai.com/index/extending-single-minus-amplitudes-to-gravitons
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: A new preprint extends single-minus amplitudes to gravitons, with GPT-5.2 Pro helping derive and verify nonzero graviton tree amplitudes in quantum gravity.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## How Axios uses AI to help deliver high-impact local journalism
Source: https://openai.com/index/axios-allison-murphy
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 88/100
Claim: Axios COO Allison Murphy explains how the company uses AI to support local reporters, streamline newsroom workflows, and deliver high-impact local journalism at scale.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Understanding AI and learning outcomes
Source: https://openai.com/index/understanding-ai-and-learning-outcomes
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 64/100
Claim: OpenAI introduces the Learning Outcomes Measurement Suite to assess AI’s impact on student learning across diverse educational environments over time.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## GPT-5.3 Instant: Smoother, more useful everyday conversations
Source: https://openai.com/index/gpt-5-3-instant
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Official OpenAI release: GPT-5.3 Instant: Smoother, more useful everyday conversations.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## GPT-5.3 Instant System Card
Source: https://openai.com/index/gpt-5-3-instant-system-card
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 48/100
Claim: Official OpenAI release: GPT-5.3 Instant System Card.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Our agreement with the Department of War
Source: https://openai.com/index/our-agreement-with-the-department-of-war
Publisher: OpenAI
Category: Vendor framing
Sector: Public sector
Capability: Vendor platform capability signal
Score: 43/100
Claim: Details on OpenAI’s contract with the Department of War, outlining safety red lines, legal protections, and how AI systems will be deployed in classified environments.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Joint Statement from OpenAI and Microsoft
Source: https://openai.com/index/continuing-microsoft-partnership
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: Microsoft and OpenAI continue to work closely across research, engineering, and product development, building on years of deep collaboration and shared success.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## OpenAI and Amazon announce strategic partnership
Source: https://openai.com/index/amazon-partnership
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 95/100
Claim: OpenAI and Amazon announce a strategic partnership bringing OpenAI’s Frontier platform to AWS, expanding AI infrastructure, custom models, and enterprise AI agents.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing the Stateful Runtime Environment for Agents in Amazon Bedrock
Source: https://openai.com/index/introducing-the-stateful-runtime-environment-for-agents-in-amazon-bedrock
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 88/100
Claim: Stateful Runtime for Agents in Amazon Bedrock brings persistent orchestration, memory, and secure execution to multi-step AI workflows powered by OpenAI.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Scaling AI for everyone
Source: https://openai.com/index/scaling-ai-for-everyone
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Financial workflow automation
Score: 78/100
Claim: Today we’re announcing $110B in new investment at a $730B pre-money valuation. This includes $30B from SoftBank, $30B from NVIDIA, and $50B from Amazon.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## An update on our mental health-related work
Source: https://openai.com/index/update-on-mental-health-related-work
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 36/100
Claim: OpenAI shares updates on its mental health safety work, including parental controls, trusted contacts, improved distress detection, and recent litigation developments.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Statement on the comments from Secretary of War Pete Hegseth
Source: https://www.anthropic.com/news/statement-comments-secretary-war
Publisher: Anthropic
Category: Vendor framing
Sector: Public sector
Capability: Vendor platform capability signal
Score: 64/100
Claim: Earlier today, Secretary of War Pete Hegseth shared on X that he is directing the Department of War to designate Anthropic a supply chain risk. This action follows months of negotiations that reached an impasse over two exceptions we requested to the lawful use of our AI model, Claude: the mass domestic surveillance of Americans and fully autonomous.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Pacific Northwest National Laboratory and OpenAI partner to accelerate federal permitting
Source: https://openai.com/index/pacific-northwest-national-laboratory
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: OpenAI and Pacific Northwest National Laboratory introduce DraftNEPABench, a new benchmark evaluating how AI coding agents can accelerate federal permitting—showing potential to reduce NEPA drafting time by up to 15% and modernize infrastructure reviews.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## OpenAI Codex and Figma launch seamless code-to-design experience
Source: https://openai.com/index/figma-partnership
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 78/100
Claim: OpenAI and Figma launch a new Codex integration that connects code and design, enabling teams to move between implementation and the Figma canvas to iterate and ship faster.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Statement from Dario Amodei on our discussions with the Department of War
Source: https://www.anthropic.com/news/statement-department-of-war
Publisher: Anthropic
Category: Deployments
Sector: Public sector
Capability: Frontier model release and benchmark movement
Score: 85/100
Claim: I believe deeply in the existential importance of using AI to defend the United States and other democracies, and to defeat our autocratic adversaries. Anthropic has therefore worked proactively to deploy our models to the Department of War and the intelligence community. We were the first frontier AI company to deploy our models in the US government’s.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Disrupting malicious uses of AI | February 2026
Source: https://openai.com/index/disrupting-malicious-ai-uses
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Production AI deployment signal
Score: 78/100
Claim: Our latest threat report examines how malicious actors combine AI models with websites and social platforms—and what it means for detection and defense.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Anthropic acquires Vercept to advance Claude's computer use capabilities
Source: https://www.anthropic.com/news/acquires-vercept
Publisher: Anthropic
Category: Benchmarks
Sector: Media and content
Capability: Autonomous software engineering and computer-use agents
Score: 64/100
Claim: People are using Claude for increasingly complex work—writing and running code across entire repositories, synthesizing research from dozens of sources, and managing workflows that span multiple tools and teams. Computer use enables Claude to do all of that inside live applications, the way a person at a keyboard would. That means Claude can take on.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Arvind KC appointed Chief People Officer
Source: https://openai.com/index/arvind-kc-chief-people-officer
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 42/100
Claim: OpenAI appoints Arvind KC as Chief People Officer to help scale the company, strengthen its culture, and lead how work evolves in the age of AI.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic’s Responsible Scaling Policy: Version 3.0
Source: https://www.anthropic.com/news/responsible-scaling-policy-v3
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 36/100
Claim: We’re releasing the third version of our Responsible Scaling Policy (RSP), the voluntary framework we use to mitigate catastrophic risks from AI systems. Anthropic has now had an RSP for more than two years, and we’ve learned a great deal about its benefits and its shortcomings. We’re therefore updating the policy to reinforce what has worked well to date.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Why we no longer evaluate SWE-bench Verified
Source: https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## OpenAI announces Frontier Alliance Partners
Source: https://openai.com/index/frontier-alliance-partners
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 95/100
Claim: OpenAI announces Frontier Alliance Partners to help enterprises move from AI pilots to production with secure, scalable agent deployments.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Detecting and preventing distillation attacks
Source: https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 46/100
Claim: We have identified industrial-scale campaigns by three AI laboratories—DeepSeek, Moonshot, and MiniMax—to illicitly extract Claude’s capabilities to improve their own models. These labs generated over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts, in violation of our terms of service and regional access restrictions.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Our First Proof submissions
Source: https://openai.com/index/first-proof-submissions
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: We share our AI model’s proof attempts for the First Proof math challenge, testing research-grade reasoning on expert-level problems.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Making frontier cybersecurity capabilities available to defenders
Source: https://www.anthropic.com/news/claude-code-security
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Claude Code Security, a new capability built into Claude Code on the web, is now available in a limited research preview. It scans codebases for security vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix security issues that traditional methods often miss. Security teams face a common challenge: too.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Advancing independent research on AI alignment
Source: https://openai.com/index/advancing-independent-research-ai-alignment
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 36/100
Claim: OpenAI commits $7.5M to The Alignment Project to fund independent AI alignment research, strengthening global efforts to address AGI safety and security risks.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing OpenAI for India
Source: https://openai.com/index/openai-for-india
Publisher: OpenAI
Category: Labour market
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 79/100
Claim: OpenAI for India expands AI access across the country—building local infrastructure, powering enterprises, and advancing workforce skills.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## Introducing EVMbench
Source: https://openai.com/index/introducing-evmbench
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Introducing Claude Sonnet 4.6
Source: https://www.anthropic.com/news/claude-sonnet-4-6
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta. For those on our Free and Pro plans, Claude Sonnet 4.6 is now the default model in claude.ai and.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Anthropic and the Government of Rwanda sign MOU for AI in health and education
Source: https://www.anthropic.com/news/anthropic-rwanda-mou
Publisher: Anthropic
Category: Deployments
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 85/100
Claim: The Government of Rwanda and Anthropic have signed a three-year Memorandum of Understanding to formalize and expand our partnership, bringing AI to Rwanda’s education, health, and public sector systems. This agreement builds on the ALX education partnership we announced in November 2025 and marks the first time Anthropic has formalized a multi-sector.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Anthropic and Infosys collaborate to build AI agents for telecommunications and other regulated industries
Source: https://www.anthropic.com/news/anthropic-infosys
Publisher: Anthropic
Category: Deployments
Sector: Software engineering
Capability: Enterprise workflow automation
Score: 95/100
Claim: Anthropic and Infosys, a global leader in next-generation digital services and consulting founded and headquartered in Bengaluru, today announced a collaboration to develop and deliver enterprise AI solutions across telecommunications, financial services, manufacturing, and software development. The collaboration integrates Anthropic’s Claude models and.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Anthropic opens Bengaluru office and announces new partnerships across India
Source: https://www.anthropic.com/news/bengaluru-office-partnerships-across-india
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Enterprise workflow automation
Score: 61/100
Claim: India is the second-largest market for Claude.ai, home to a developer community doing some of the most technically intense AI work we see anywhere. Nearly half of Claude usage in India comprises computer and mathematical tasks: building applications, modernizing systems, and shipping production software. Today, as we officially open our Bengaluru office.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## GPT-5.2 derives a new result in theoretical physics
Source: https://openai.com/index/new-result-theoretical-physics
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: A new preprint shows GPT-5.2 proposing a new formula for a gluon amplitude, later formally proved and verified by OpenAI and academic collaborators.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Introducing Lockdown Mode and Elevated Risk labels in ChatGPT
Source: https://openai.com/index/introducing-lockdown-mode-and-elevated-risk-labels-in-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: Introducing Lockdown Mode and Elevated Risk labels in ChatGPT to help organizations defend against prompt injection and AI-driven data exfiltration.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Beyond rate limits: scaling access to Codex and Sora
Source: https://openai.com/index/beyond-rate-limits
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: How OpenAI built a real-time access system combining rate limits, usage tracking, and credits to power continuous access to Sora and Codex.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Scaling social science research
Source: https://openai.com/index/scaling-social-science-research
Publisher: OpenAI
Category: Benchmarks
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 76/100
Claim: GABRIEL is a new open-source toolkit from OpenAI that uses GPT to turn qualitative text and images into quantitative data, helping social scientists analyze research at scale.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Chris Liddell appointed to Anthropic’s board of directors
Source: https://www.anthropic.com/news/chris-liddell-appointed-anthropic-board
Publisher: Anthropic
Category: Vendor framing
Sector: Financial services
Capability: Enterprise workflow automation
Score: 42/100
Claim: Chris Liddell has been appointed to Anthropic’s Board of Directors. He brings over 30 years of senior leadership experience across some of the world's largest and most complex organizations to the role. He previously served as Chief Financial Officer of Microsoft, General Motors, and International Paper, as well as the Deputy White House Chief of Staff.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic partners with CodePath to bring Claude to the US’s largest collegiate computer science program
Source: https://www.anthropic.com/news/anthropic-codepath-partnership
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 76/100
Claim: Anthropic is partnering with CodePath, the nation’s largest provider of collegiate computer science education, to redesign its coding curriculum as AI reshapes the field of software development. CodePath will put Claude and Claude Code at the center of its courses and career programs, giving more than 20,000 students at community colleges, state schools.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Introducing GPT-5.3-Codex-Spark
Source: https://openai.com/index/introducing-gpt-5-3-codex-spark
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Introducing GPT-5.3-Codex-Spark—our first real-time coding model. 15x faster generation, 128k context, now in research preview for ChatGPT Pro users.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Anthropic is donating $20 million to Public First Action
Source: https://www.anthropic.com/news/donate-public-first-action
Publisher: Anthropic
Category: Benchmarks
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 90/100
Claim: AI will bring enormous benefits for science, technology, medicine, economic growth, and much more. But a technology this powerful also comes with considerable risks. Those risks might come from the misuse of the models: AI is already being exploited to automate cyberattacks; in the future it might assist in the production of dangerous weapons. Risks.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation
Source: https://www.anthropic.com/news/anthropic-raises-30-billion-series-g-funding-380-billion-post-money-valuation
Publisher: Anthropic
Category: Vendor framing
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 56/100
Claim: We have raised $30 billion in Series G funding led by GIC and Coatue, valuing Anthropic at $380 billion post-money. The round was co-led by D. E. Shaw Ventures, Dragoneer, Founders Fund, ICONIQ, and MGX. The investment will fuel the frontier research, product development, and infrastructure expansions that have made Anthropic the market leader in.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Harness engineering: leveraging Codex in an agent-first world
Source: https://openai.com/index/harness-engineering
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: By Ryan Lopopolo, Member of the Technical Staff.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Covering electricity price increases from our data centers
Source: https://www.anthropic.com/news/covering-electricity-price-increases
Publisher: Anthropic
Category: Vendor framing
Sector: AI infrastructure
Capability: Frontier model release and benchmark movement
Score: 64/100
Claim: As we continue to invest in American AI infrastructure, Anthropic will cover electricity price increases that consumers face from our data centers. Training a single frontier AI model will soon require gigawatts of power, and the US AI sector will need at least 50 gigawatts of capacity over the next several years. The country needs to build new data.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Bringing ChatGPT to GenAI.mil
Source: https://openai.com/index/bringing-chatgpt-to-genaimil
Publisher: OpenAI
Category: Vendor framing
Sector: Public sector
Capability: Vendor platform capability signal
Score: 43/100
Claim: OpenAI for Government announces the deployment of a custom ChatGPT on GenAI.mil, bringing secure, safety-forward AI to U.S. defense teams.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Making AI work for everyone, everywhere: our approach to localization
Source: https://openai.com/index/our-approach-to-localization
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 36/100
Claim: OpenAI shares its approach to AI localization, showing how globally shared frontier models can be adapted to local languages, laws, and cultures without compromising safety.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## GPT-5 lowers the cost of cell-free protein synthesis
Source: https://openai.com/index/gpt-5-lowers-protein-synthesis-cost
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: An autonomous lab combining OpenAI’s GPT-5 with Ginkgo Bioworks’ cloud automation cut cell-free protein synthesis costs by 40% through closed-loop experimentation.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Introducing Trusted Access for Cyber
Source: https://openai.com/index/trusted-access-for-cyber
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 36/100
Claim: OpenAI introduces Trusted Access for Cyber, a trust-based framework that expands access to frontier cyber capabilities while strengthening safeguards against misuse.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing OpenAI Frontier
Source: https://openai.com/index/introducing-openai-frontier
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 73/100
Claim: OpenAI Frontier is an enterprise platform for building, deploying, and managing AI agents with shared context, onboarding, permissions, and governance.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing GPT-5.3-Codex
Source: https://openai.com/index/introducing-gpt-5-3-codex
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5.3-Codex is a Codex-native agent that pairs frontier coding performance with general reasoning to support long-horizon, real-world technical work.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## GPT-5.3-Codex System Card
Source: https://openai.com/index/gpt-5-3-codex-system-card
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 58/100
Claim: GPT-5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing Claude Opus 4.6
Source: https://www.anthropic.com/news/claude-opus-4-6
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: We’re upgrading our smartest model. The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, can operate more reliably in larger codebases, and has better code review and debugging skills to catch its own mistakes. And, in a first for our Opus-class models, Opus 4.6 features a 1M token.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Unlocking the Codex harness: how we built the App Server
Source: https://openai.com/index/unlocking-the-codex-harness
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: Learn how to embed the Codex agent using the Codex App Server, a bidirectional JSON-RPC API powering streaming progress, tool use, approvals, and diffs.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Claude is a space to think
Source: https://www.anthropic.com/news/claude-is-a-space-to-think
Publisher: Anthropic
Category: Deployments
Sector: Media and content
Capability: Production AI deployment signal
Score: 85/100
Claim: There are many good places for advertising. A conversation with Claude is not one of them. Advertising drives competition, helps people discover new products, and allows services like email and social media to be offered for free. We’ve run our own ad campaigns, and our AI models have, in turn, helped many of our customers in the advertising industry.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## The Sora feed philosophy
Source: https://openai.com/index/sora-feed-philosophy
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 54/100
Claim: Discover the Sora feed philosophy—built to spark creativity, foster connections, and keep experiences safe with personalized recommendations, parental controls, and strong guardrails.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Apple’s Xcode now supports the Claude Agent SDK
Source: https://www.anthropic.com/news/apple-xcode-claude-agent-sdk
Publisher: Anthropic
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: Apple's Xcode is where developers build, test, and distribute apps for Apple platforms, including iPhone, iPad, Mac, Apple Watch, Apple Vision Pro, and Apple TV. In September, we announced that developers would have access to Claude Sonnet 4 in Xcode 26. Claude could be used to write code, debug, and generate documentation—but it was limited to helping.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Snowflake and OpenAI partner to bring frontier intelligence to enterprise data
Source: https://openai.com/index/snowflake-partnership
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 95/100
Claim: OpenAI and Snowflake partner in a $200M agreement to bring frontier intelligence into enterprise data, enabling AI agents and insights directly in Snowflake.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing the Codex app
Source: https://openai.com/index/introducing-the-codex-app
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: Introducing the Codex app for macOS—a command center for AI coding and software development with multiple agents, parallel workflows, and long-running tasks.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Anthropic partners with Allen Institute and Howard Hughes Medical Institute to accelerate scientific discovery
Source: https://www.anthropic.com/news/anthropic-partners-with-allen-institute-and-howard-hughes-medical-institute
Publisher: Anthropic
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 76/100
Claim: Modern biological research generates data at unprecedented scale—from single-cell sequencing to whole-brain connectomics—yet transforming that data into validated biological insights remains a fundamental bottleneck. Knowledge synthesis, hypothesis generation, and experimental interpretation still depend on manual processes that can't keep pace with the.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Inside OpenAI’s in-house data agent
Source: https://openai.com/index/inside-our-in-house-data-agent
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: How OpenAI built an in-house AI data agent that uses GPT-5, Codex, and memory to reason over massive datasets and deliver reliable insights in minutes.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT
Source: https://openai.com/index/retiring-gpt-4o-and-older-models
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: On February 13, 2026, alongside the previously announced retirement of GPT-5 (Instant, Thinking, and Pro), we will retire GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini from ChatGPT. In the API, there are no changes at this time.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Taisei Corporation shapes the next generation of talent with AI
Source: https://openai.com/index/taisei
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: Taisei Corporation’s HR team is leading the rollout of ChatGPT Enterprise to drive AI-powered talent development across the organization.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## EMEA Youth & Wellbeing Grant
Source: https://openai.com/index/emea-youth-and-wellbeing-grant
Publisher: OpenAI
Category: Vendor framing
Sector: Scientific research
Capability: Vendor platform capability signal
Score: 26/100
Claim: Apply for the EMEA Youth & Wellbeing Grant, a €500,000 program funding NGOs and researchers advancing youth safety and wellbeing in the age of AI.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## The next chapter for AI in the EU
Source: https://openai.com/index/the-next-chapter-for-ai-in-the-eu
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Education and workforce adoption
Score: 85/100
Claim: OpenAI launches the EU Economic Blueprint 2.0 with new data, partnerships, and initiatives to accelerate AI adoption, skills, and growth across Europe.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Keeping your data safe when an AI agent clicks a link
Source: https://openai.com/index/ai-agent-link-safety
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Agent platform and API infrastructure
Score: 46/100
Claim: Learn how OpenAI protects user data when AI agents open links, preventing URL-based data exfiltration and prompt injection with built-in safeguards.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## ServiceNow chooses Claude to power customer apps and increase internal productivity
Source: https://www.anthropic.com/news/servicenow-anthropic-claude
Publisher: Anthropic
Category: Deployments
Sector: Cybersecurity
Capability: Enterprise workflow automation
Score: 96/100
Claim: As enterprises move beyond experimenting with AI and start putting it into production across their core business operations, scale and security matter just as much as capabilities. With this in mind, ServiceNow, which helps large companies manage and automate everything from IT support to HR to customer service on a single platform, has chosen Claude as.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## PVH reimagines the future of fashion with OpenAI
Source: https://openai.com/index/pvh-future-of-fashion
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: PVH Corp., parent company of Calvin Klein and Tommy Hilfiger, is adopting ChatGPT Enterprise to bring AI into fashion design, supply chain, and consumer engagement.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing Prism
Source: https://openai.com/index/introducing-prism
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: Prism is a free LaTeX-native workspace with GPT-5.2 built in, helping researchers write, collaborate, and reason in one place.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## TRUSTBANK uses AI agents to personalize Furusato Nozei gifts
Source: https://openai.com/index/trustbank
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Financial workflow automation
Score: 88/100
Claim: TRUSTBANK partnered with Recursive to build Choice AI using OpenAI models, enabling personalized conversational recommendations that simplify Furusato Nozei gift discovery.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Anthropic partners with the UK Government to bring AI assistance to GOV.UK services
Source: https://www.anthropic.com/news/gov-UK-partnership
Publisher: Anthropic
Category: Labour market
Sector: Public sector
Capability: Labour-market adoption signal
Score: 72/100
Claim: Anthropic has been selected by the UK's Department for Science, Innovation and Technology (DSIT) to help build and pilot a dedicated AI-powered assistant for GOV.UK. The AI assistant will help people navigate government services and give tailored advice. The initial use case is employment: helping people find work, access training, understand the support.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## How Indeed uses AI to help evolve the job search
Source: https://openai.com/index/indeed-maggie-hulce
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Production AI deployment signal
Score: 78/100
Claim: Indeed’s CRO Maggie Hulce shares how AI is transforming job search, recruiting, and talent acquisition for employers and job seekers.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Unrolling the Codex agent loop
Source: https://openai.com/index/unrolling-the-codex-agent-loop
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 74/100
Claim: A technical deep dive into the Codex agent loop, explaining how Codex CLI orchestrates models, tools, prompts, and performance using the Responses API.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Scaling PostgreSQL to power 800 million ChatGPT users
Source: https://openai.com/index/scaling-postgresql
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 68/100
Claim: An inside look at how OpenAI scaled PostgreSQL to millions of queries per second using replicas, caching, rate limiting, and workload isolation.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Inside Praktika's conversational approach to language learning
Source: https://openai.com/index/praktika
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 90/100
Claim: How Praktika uses GPT-4.1 and GPT-5.2 to build adaptive AI tutors that personalize lessons, track progress, and help learners achieve real-world language fluency.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Claude's new constitution
Source: https://www.anthropic.com/news/claude-new-constitution
Publisher: Anthropic
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 38/100
Claim: We’re publishing a new constitution for our AI model, Claude. It’s a detailed description of Anthropic’s vision for Claude’s values and behavior; a holistic document that explains the context in which Claude operates and the kind of entity we would like Claude to be. The constitution is a crucial part of our model training process, and its content directly...
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## How Higgsfield turns simple ideas into cinematic social videos
Source: https://openai.com/index/higgsfield
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Discover how Higgsfield gives creators cinematic, social-first video output from simple inputs using OpenAI GPT-4.1, GPT-5, and Sora 2.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing Edu for Countries
Source: https://openai.com/index/edu-for-countries
Publisher: OpenAI
Category: Labour market
Sector: Education
Capability: Education and workforce adoption
Score: 72/100
Claim: Edu for Countries is a new OpenAI initiative helping governments use AI to modernize education systems and build future-ready workforces.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## How countries can end the capability overhang
Source: https://openai.com/index/how-countries-can-end-the-capability-overhang
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 68/100
Claim: Our latest report reveals stark differences in advanced AI adoption across countries and outlines new initiatives to help nations capture productivity gains from AI.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Mariano-Florentino Cuéllar appointed to Anthropic’s Long-Term Benefit Trust
Source: https://www.anthropic.com/news/mariano-florentino-long-term-benefit-trust
Publisher: Anthropic
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 42/100
Claim: Anthropic’s Long-Term Benefit Trust announced the appointment of Mariano-Florentino (Tino) Cuéllar as a new member of the Trust. The Long-Term Benefit Trust is an independent body designed to help Anthropic achieve its public benefit mission. Cuéllar brings extensive experience in law, governance, and international affairs, including service as a Justice...
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic and Teach For All launch global AI training initiative for educators
Source: https://www.anthropic.com/news/anthropic-teach-for-all
Publisher: Anthropic
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 68/100
Claim: Anthropic is partnering with Teach For All to bring AI tools and training to educators in 63 countries. Through the AI Literacy & Creator Collective (LCC), more than 100,000 teachers and alumni across Teach For All's network, which serves more than 1.5 million students, will have the opportunity to develop AI fluency and adapt Claude to serve real classroom...
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Horizon 1000: Advancing AI for primary healthcare
Source: https://openai.com/index/horizon-1000
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 54/100
Claim: OpenAI and the Gates Foundation launch Horizon 1000, a $50M pilot advancing AI capabilities for healthcare in Africa. The initiative aims to reach 1,000 clinics by 2028.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Stargate Community
Source: https://openai.com/index/stargate-community
Publisher: OpenAI
Category: Labour market
Sector: Enterprise operations
Capability: Education and workforce adoption
Score: 72/100
Claim: Stargate Community plans detail a community-first approach to AI infrastructure, using locally tailored plans shaped by community input, energy needs, and workforce priorities.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## Cisco and OpenAI redefine enterprise engineering with AI agents
Source: https://openai.com/index/cisco
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: Cisco and OpenAI redefine enterprise engineering with Codex, an AI software agent embedded in workflows to speed builds, automate defect fixes, and enable AI-native development.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## ServiceNow powers actionable enterprise AI with OpenAI
Source: https://openai.com/index/servicenow-powers-actionable-enterprise-ai-with-openai
Publisher: OpenAI
Category: Deployments
Sector: Media and content
Capability: Frontier model release and benchmark movement
Score: 95/100
Claim: ServiceNow expands access to OpenAI frontier models to power AI-driven enterprise workflows, summarization, search, and voice across the ServiceNow Platform.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Our approach to age prediction
Source: https://openai.com/index/our-approach-to-age-prediction
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 36/100
Claim: ChatGPT is rolling out age prediction to estimate if accounts are under or over 18, applying safeguards for teens and refining accuracy over time.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## A business that scales with the value of intelligence
Source: https://openai.com/index/a-business-that-scales-with-the-value-of-intelligence
Publisher: OpenAI
Category: Vendor framing
Sector: Commerce and marketplace
Capability: Enterprise workflow automation
Score: 64/100
Claim: OpenAI’s business model scales with intelligence—spanning subscriptions, API, ads, commerce, and compute—driven by deepening ChatGPT adoption.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Our approach to advertising and expanding access to ChatGPT
Source: https://openai.com/index/our-approach-to-advertising-and-expanding-access
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 38/100
Claim: OpenAI plans to test advertising in the U.S. for ChatGPT’s free and Go tiers to expand affordable access to AI worldwide, while protecting privacy, trust, and answer quality.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing ChatGPT Go, now available worldwide
Source: https://openai.com/index/introducing-chatgpt-go
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: ChatGPT Go is now available worldwide, offering expanded access to GPT-5.2 Instant, higher usage limits, and longer memory—making advanced AI more affordable globally.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic appoints Irina Ghose as Managing Director of India ahead of Bengaluru office opening
Source: https://www.anthropic.com/news/anthropic-appoints-irina-ghose-as-managing-director-of-india
Publisher: Anthropic
Category: Deployments
Sector: Financial services
Capability: Enterprise workflow automation
Score: 63/100
Claim: Irina Ghose is joining Anthropic as Managing Director of India as we prepare to open our first office in the country. Irina brings more than three decades of experience in scaling technology businesses. She most recently served as Managing Director, Microsoft India, where she led enterprise AI adoption across major Indian industries including banking and...
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Investing in Merge Labs
Source: https://openai.com/index/investing-in-merge-labs
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI is investing in Merge Labs to support new brain computer interfaces that bridge biological and artificial intelligence to maximize human ability, agency, and experience.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Strengthening the U.S. AI supply chain through domestic manufacturing
Source: https://openai.com/index/strengthening-the-us-ai-supply-chain
Publisher: OpenAI
Category: Labour market
Sector: AI infrastructure
Capability: Education and workforce adoption
Score: 72/100
Claim: OpenAI launches a new RFP to strengthen the U.S. AI supply chain by accelerating domestic manufacturing, creating jobs, and scaling AI infrastructure.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## How scientists are using Claude to accelerate research and discovery
Source: https://www.anthropic.com/news/accelerating-scientific-research
Publisher: Anthropic
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 76/100
Claim: Last October we launched Claude for Life Sciences, a suite of connectors and skills that made Claude a better scientific collaborator. Since then, we've invested heavily in making Claude the most capable model for scientific work, with Opus 4.5 showing significant improvements in figure interpretation, computational biology, and protein understanding.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## OpenAI partners with Cerebras
Source: https://openai.com/index/cerebras-partnership
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 68/100
Claim: OpenAI partners with Cerebras to add 750MW of high-speed AI compute, reducing inference latency and making ChatGPT faster for real-time AI workloads.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Zenken boosts a lean sales team with ChatGPT Enterprise
Source: https://openai.com/index/zenken
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 95/100
Claim: By rolling out ChatGPT Enterprise company-wide, Zenken has boosted sales performance, cut preparation time, and increased proposal success rates. AI-supported workflows are helping a lean team deliver more personalized, effective customer engagement.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing Labs
Source: https://www.anthropic.com/news/introducing-anthropic-labs
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Agent platform and API infrastructure
Score: 85/100
Claim: Our models are evolving at a rapid clip, and each new release brings another leap in capabilities. Building product experiences around these emerging capabilities requires different motions working in partnership: tinkering and experimenting at the edge of what Claude can do, testing unpolished versions with early users to find what works, and taking what...
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Advancing Claude in healthcare and the life sciences
Source: https://www.anthropic.com/news/healthcare-life-sciences
Publisher: Anthropic
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 76/100
Claim: In October, we announced Claude for Life Sciences, our latest step in making Claude a productive research partner for scientists and clinicians, and in helping Claude to support those in industry bringing new scientific advancements to the public. Now, we’re expanding that feature set in two ways. First, we’re introducing Claude for Healthcare, a...
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## OpenAI and SoftBank Group partner with SB Energy
Source: https://openai.com/index/stargate-sb-energy-partnership
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Financial workflow automation
Score: 68/100
Claim: OpenAI and SoftBank Group partner with SB Energy to develop multi-gigawatt AI data center campuses, including a 1.2 GW Texas facility supporting the Stargate initiative.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Datadog uses Codex for system-level code review
Source: https://openai.com/index/datadog
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: Datadog uses Codex for system-level code review.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## OpenAI for Healthcare
Source: https://openai.com/index/openai-for-healthcare
Publisher: OpenAI
Category: Deployments
Sector: Healthcare and life sciences
Capability: Enterprise workflow automation
Score: 95/100
Claim: OpenAI for Healthcare enables secure, enterprise-grade AI that supports HIPAA compliance—reducing administrative burden and supporting clinical workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Netomi’s lessons for scaling agentic systems into the enterprise
Source: https://openai.com/index/netomi
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: How Netomi scales enterprise AI agents using GPT-4.1 and GPT-5.2—combining concurrency, governance, and multi-step reasoning for reliable production workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## How Tolan builds voice-first AI with GPT-5.1
Source: https://openai.com/index/tolan
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Tolan built a voice-first AI companion with GPT-5.1, combining low-latency responses, real-time context reconstruction, and memory-driven personalities for natural conversations.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing ChatGPT Health
Source: https://openai.com/index/introducing-chatgpt-health
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 38/100
Claim: ChatGPT Health is a dedicated experience that securely connects your health data and apps, with privacy protections and a physician-informed design.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Announcing OpenAI Grove Cohort 2
Source: https://openai.com/index/openai-grove
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: Applications are now open for OpenAI Grove Cohort 2, a 5-week founder program designed for individuals at any stage, from pre-idea to product. Participants receive $50K in API credits, early access to AI tools, and hands-on mentorship from the OpenAI team.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Continuously hardening ChatGPT Atlas against prompt injection
Source: https://openai.com/index/hardening-atlas-against-prompt-injection
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 74/100
Claim: OpenAI is strengthening ChatGPT Atlas against prompt injection attacks using automated red teaming trained with reinforcement learning. This proactive discover-and-patch loop helps identify novel exploits early and harden the browser agent’s defenses as AI becomes more agentic.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## One in a million: celebrating the customers shaping AI’s future
Source: https://openai.com/index/one-in-a-million-customers
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Agent platform and API infrastructure
Score: 89/100
Claim: More than one million customers around the world now use OpenAI to empower their teams and unlock new opportunities. This post highlights how companies like PayPal, Virgin Atlantic, BBVA, Cisco, Moderna, and Canva are transforming the way work gets done with AI.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Sharing our compliance framework for California's Transparency in Frontier AI Act
Source: https://www.anthropic.com/news/compliance-framework-SB53
Publisher: Anthropic
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 36/100
Claim: On January 1, California's Transparency in Frontier AI Act (SB 53) will go into effect. It establishes the nation’s first frontier AI safety and transparency requirements for catastrophic risks. While we have long advocated for a federal framework, Anthropic endorsed SB 53 because we believe frontier AI developers like ourselves should be transparent.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Evaluating chain-of-thought monitorability
Source: https://openai.com/index/evaluating-chain-of-thought-monitorability
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 48/100
Claim: OpenAI introduces a new framework and evaluation suite for chain-of-thought monitorability, covering 13 evaluations across 24 environments. Our findings show that monitoring a model’s internal reasoning is far more effective than monitoring outputs alone, offering a promising path toward scalable control as AI systems grow more capable.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Deepening our collaboration with the U.S. Department of Energy
Source: https://openai.com/index/us-department-of-energy-collaboration
Publisher: OpenAI
Category: Benchmarks
Sector: Public sector
Capability: Model and benchmark capability movement
Score: 76/100
Claim: OpenAI and the U.S. Department of Energy have signed a memorandum of understanding to deepen collaboration on AI and advanced computing in support of scientific discovery. The agreement builds on ongoing work with national laboratories and helps establish a framework for applying AI to high-impact research across the DOE ecosystem.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Updating our Model Spec with teen protections
Source: https://openai.com/index/updating-model-spec-with-teen-protections
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 26/100
Claim: OpenAI is updating its Model Spec with new Under-18 Principles that define how ChatGPT should support teens with safe, age-appropriate guidance grounded in developmental science. The update strengthens guardrails, clarifies expected model behavior in higher-risk situations, and builds on our broader work to improve teen safety across ChatGPT.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## AI literacy resources for teens and parents
Source: https://openai.com/index/ai-literacy-resources-for-teens-and-parents
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 64/100
Claim: OpenAI shares new AI literacy resources to help teens and parents use ChatGPT thoughtfully, safely, and with confidence. The guides include expert-vetted tips for responsible use, critical thinking, healthy boundaries, and supporting teens through emotional or sensitive topics.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Addendum to GPT-5.2 System Card: GPT-5.2-Codex
Source: https://openai.com/index/gpt-5-2-codex-system-card
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 58/100
Claim: Official OpenAI release: Addendum to GPT-5.2 System Card: GPT-5.2-Codex.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing GPT-5.2-Codex
Source: https://openai.com/index/introducing-gpt-5-2-codex
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5.2-Codex is OpenAI’s most advanced coding model, offering long-horizon reasoning, large-scale code transformations, and enhanced cybersecurity capabilities.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Protecting the wellbeing of our users
Source: https://www.anthropic.com/news/protecting-well-being-of-users
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 26/100
Claim: People use AI for a wide variety of reasons, and for some that may include emotional support. Our Safeguards team leads our efforts to ensure that Claude handles these conversations appropriately: responding with empathy, being honest about its limitations as an AI, and being considerate of our users' wellbeing. When chatbots handle these questions without…
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Working with the US Department of Energy to unlock the next era of scientific discovery
Source: https://www.anthropic.com/news/genesis-mission-partnership
Publisher: Anthropic
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Enterprise workflow automation
Score: 87/100
Claim: Anthropic and the US Department of Energy (DOE) are announcing a multi-year partnership as part of the Genesis Mission, the Department’s initiative to use AI to cement America’s leadership in science. Our partnership focuses on three domains (American energy dominance, the biological and life sciences, and scientific productivity) and has the potential to…
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Introducing OpenAI Academy for News Organizations
Source: https://openai.com/index/openai-academy-for-news-organizations
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI is launching the OpenAI Academy for News Organizations, a new learning hub built with the American Journalism Project and The Lenfest Institute to help newsrooms use AI effectively. The Academy offers training, practical use cases, and responsible-use guidance to support journalists, editors, and publishers as they adopt AI in their reporting and…
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Developers can now submit apps to ChatGPT
Source: https://openai.com/index/developers-can-now-submit-apps-to-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: Developers can now submit apps for review and publication in ChatGPT, with approved apps appearing in a new in-product directory for easy discovery. Updated tools, guidelines, and the Apps SDK help developers build powerful chat-native experiences that bring real-world actions into ChatGPT.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Evaluating AI’s ability to perform scientific research tasks
Source: https://openai.com/index/frontierscience
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: OpenAI introduces FrontierScience, a benchmark testing AI reasoning in physics, chemistry, and biology to measure progress toward real scientific research.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Measuring AI’s capability to accelerate biological research
Source: https://openai.com/index/accelerating-biological-research-in-the-wet-lab
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: OpenAI introduces a real-world evaluation framework to measure how AI can accelerate biological research in the wet lab. Using GPT-5 to optimize a molecular cloning protocol, the work explores both the promise and risks of AI-assisted experimentation.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## The new ChatGPT Images is here
Source: https://openai.com/index/new-chatgpt-images-is-here
Publisher: OpenAI
Category: Deployments
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 82/100
Claim: The new ChatGPT Images is powered by our flagship image generation model, delivering more precise edits, consistent details, and image generation up to 4× faster. The upgraded model is rolling out to all ChatGPT users today and is also available in the API as GPT-Image-1.5.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## BBVA and OpenAI collaborate to transform global banking
Source: https://openai.com/index/bbva-collaboration-expansion
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Enterprise workflow automation
Score: 85/100
Claim: BBVA is expanding its work with OpenAI through a multi-year AI transformation program, rolling out ChatGPT Enterprise to all 120,000 employees. Together, the companies will develop AI solutions that enhance customer interactions, streamline operations, and help build an AI-native banking experience.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## BNY builds “AI for everyone, everywhere” with OpenAI
Source: https://openai.com/index/bny
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 95/100
Claim: BNY uses OpenAI to expand AI adoption enterprise-wide through Eliza, where 20,000+ employees build AI agents that improve efficiency and client outcomes.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## How We Used Codex to Ship Sora for Android in 28 Days
Source: https://openai.com/index/shipping-sora-for-android-with-codex
Publisher: OpenAI
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: OpenAI shipped Sora for Android in 28 days using Codex. AI-assisted planning, translation, and parallel coding workflows helped a nimble team deliver rapid, reliable development.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Advancing science and math with GPT-5.2
Source: https://openai.com/index/gpt-5-2-for-science-and-math
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: GPT-5.2 is OpenAI’s strongest model yet for math and science, setting new state-of-the-art results on benchmarks like GPQA Diamond and FrontierMath. This post shows how those gains translate into real research progress, including solving an open theoretical problem and generating reliable mathematical proofs.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## How Podium is arming 10,000+ SMBs with AI agents
Source: https://openai.com/index/podium
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Podium used OpenAI’s GPT-5 to build “Jerry,” an AI teammate that the post credits with driving 300% growth and transforming how Main Street businesses serve customers.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## The Walt Disney Company and OpenAI reach landmark agreement to bring beloved characters to Sora
Source: https://openai.com/index/disney-sora-agreement
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Enterprise workflow automation
Score: 55/100
Claim: Disney and OpenAI have reached an agreement to bring more than 200 Disney, Marvel, Pixar and Star Wars characters to Sora for fan-inspired short videos. The agreement emphasizes responsible AI in entertainment and includes Disney’s company-wide use of ChatGPT Enterprise and the OpenAI API.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing GPT-5.2
Source: https://openai.com/index/introducing-gpt-5-2
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5.2 is our most advanced frontier model for everyday professional work, with state-of-the-art reasoning, long-context understanding, coding, and vision. Use it in ChatGPT and the OpenAI API to power faster, more reliable agentic workflows.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Update to GPT-5 System Card: GPT-5.2
Source: https://openai.com/index/gpt-5-system-card-update-gpt-5-2
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 48/100
Claim: GPT-5.2 is the latest model family in the GPT-5 series. The comprehensive safety mitigation approach for these models is largely the same as that described in the GPT-5 System Card and GPT-5.1 System Card. Like OpenAI’s other models, the GPT-5.2 models were trained on diverse datasets, including information that is publicly available on the internet.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Ten years
Source: https://openai.com/index/ten-years
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: OpenAI reflects on ten years of progress, from early research breakthroughs to widely used AI systems that reshaped what’s possible. We share lessons from the past decade and why we remain optimistic about building AGI that benefits all of humanity.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Strengthening cyber resilience as AI capabilities advance
Source: https://openai.com/index/strengthening-cyber-resilience
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 36/100
Claim: OpenAI is investing in stronger safeguards and defensive capabilities as AI models become more powerful in cybersecurity. We explain how we assess risk, limit misuse, and work with the security community to strengthen cyber resilience.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## How Scout24 is building the next generation of real-estate search with AI
Source: https://openai.com/index/scout24
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Scout24 has created a GPT-5 powered conversational assistant that reimagines real-estate search, guiding users with clarifying questions, summaries, and tailored listing recommendations.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## OpenAI co-founds Agentic AI Foundation, donates AGENTS.md
Source: https://openai.com/index/agentic-ai-foundation
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: OpenAI co-founds the Agentic AI Foundation under the Linux Foundation and donates AGENTS.md to support open, interoperable standards for safe agentic AI.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Launching our first OpenAI Certifications courses
Source: https://openai.com/index/openai-certificate-courses
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 54/100
Claim: OpenAI’s new certifications and AI Foundations courses are pitched as helping people build real-world AI skills, boost career opportunities, and prepare for the future of work.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Bringing powerful AI to millions across Europe with Deutsche Telekom
Source: https://openai.com/index/deutsche-telekom-collaboration
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 96/100
Claim: OpenAI is collaborating with Deutsche Telekom to bring advanced, multilingual AI experiences to millions of people across Europe. ChatGPT Enterprise will also be deployed to help employees at Deutsche Telekom improve workflows and accelerate innovation.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Commonwealth Bank of Australia builds AI fluency at scale
Source: https://openai.com/index/commonwealth-bank-of-australia
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Enterprise workflow automation
Score: 85/100
Claim: Commonwealth Bank of Australia partners with OpenAI to roll out ChatGPT Enterprise to 50,000 employees, building AI fluency at scale to improve customer service and fraud response.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## OpenAI appoints Denise Dresser as Chief Revenue Officer
Source: https://openai.com/index/openai-appoints-denise-dresser
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 63/100
Claim: Denise Dresser is joining as Chief Revenue Officer, overseeing OpenAI’s global revenue strategy across enterprise and customer success. She will help more businesses put AI to work in their day-to-day operations as OpenAI continues to scale.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Donating the Model Context Protocol and establishing the Agentic AI Foundation
Source: https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation
Publisher: Anthropic
Category: Vendor framing
Sector: Customer operations
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: Today, we’re donating the Model Context Protocol (MCP) to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation, co-founded by Anthropic, Block and OpenAI, with support from Google, Microsoft, Amazon Web Services (AWS), Cloudflare, and Bloomberg. One year ago, we introduced MCP as a universal, open standard for connecting AI…
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Accenture and Anthropic launch multi-year partnership to move enterprises from AI pilots to production
Source: https://www.anthropic.com/news/anthropic-accenture-partnership
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: Anthropic and Accenture today announced a major expansion of their partnership to help enterprises move from AI pilots to full-scale deployment. The announcement comes as Anthropic's enterprise market share has grown from 24% to 40%.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Instacart and OpenAI partner on AI shopping experiences
Source: https://openai.com/index/instacart-partnership
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Financial workflow automation
Score: 85/100
Claim: OpenAI and Instacart are deepening their longstanding partnership by bringing the first fully integrated grocery shopping and Instant Checkout payment app to ChatGPT.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## The state of enterprise AI
Source: https://openai.com/index/the-state-of-enterprise-ai-2025-report
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 89/100
Claim: Key findings from OpenAI’s enterprise data show accelerating AI adoption, deeper integration, and measurable productivity gains across industries in 2025.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## How Virgin Atlantic uses AI to enhance every step of travel
Source: https://openai.com/index/virgin-atlantic-oliver-byers
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Financial workflow automation
Score: 85/100
Claim: Virgin Atlantic CFO Oliver Byers shares how the airline is using AI to speed up development, improve decision-making, and elevate customer experience.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## OpenAI to acquire Neptune
Source: https://openai.com/index/openai-to-acquire-neptune
Publisher: OpenAI
Category: Vendor framing
Sector: Scientific research
Capability: Vendor platform capability signal
Score: 48/100
Claim: OpenAI is acquiring Neptune to deepen visibility into model behavior and strengthen the tools researchers use to track experiments and monitor training.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## How confessions can keep language models honest
Source: https://openai.com/index/how-confessions-can-keep-language-models-honest
Publisher: OpenAI
Category: Vendor framing
Sector: Scientific research
Capability: Vendor platform capability signal
Score: 48/100
Claim: OpenAI researchers are testing “confessions,” a method that trains models to admit when they make mistakes or act undesirably, helping improve AI honesty, transparency, and trust in model outputs.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Announcing the initial People-First AI Fund grantees
Source: https://openai.com/index/people-first-ai-fund-grantees
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 54/100
Claim: The OpenAI Foundation announces the initial recipients of the People-First AI Fund, awarding $40.5M in unrestricted grants to 208 nonprofits supporting community innovation and opportunity.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Snowflake and Anthropic announce $200 million partnership to bring agentic AI to global enterprises
Source: https://www.anthropic.com/news/snowflake-anthropic-expanded-partnership
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 96/100
Claim: Today, we announce a significant expansion of our strategic partnership with Snowflake. The multi-year, $200 million agreement will not only make Anthropic’s Claude models available in the Snowflake platform to more than 12,600 global customers across Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure, but also establish a joint go-to-market…
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Anthropic acquires Bun as Claude Code reaches $1B milestone
Source: https://www.anthropic.com/news/anthropic-acquires-bun-as-claude-code-reaches-usd1b-milestone
Publisher: Anthropic
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 96/100
Claim: Claude is the world’s smartest and most capable AI model for developers, startups, and enterprises. Claude Code represents a new era of agentic coding, fundamentally changing how teams build software. In November, Claude Code achieved a significant milestone: just six months after becoming available to the public, it reached $1 billion in run-rate revenue.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Claude for Nonprofits
Source: https://www.anthropic.com/news/claude-for-nonprofits
Publisher: Anthropic
Category: Deployments
Sector: General AI capability
Capability: Production AI deployment signal
Score: 75/100
Claim: Nonprofits tackle some of society’s most difficult problems, often with limited resources. In partnership with the global generosity movement GivingTuesday, we’re launching Claude for Nonprofits to help organizations across the world maximize their impact. Many nonprofits already use Claude to meet their goals. The Epilepsy Foundation is providing 24/7…
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Inside Mirakl's agentic commerce vision
Source: https://openai.com/index/mirakl
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 96/100
Claim: Mirakl is redefining commerce through AI agents and ChatGPT Enterprise—achieving faster documentation, smarter customer support, and building toward agent-native commerce with Mirakl Nexus.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Funding grants for new research into AI and mental health
Source: https://openai.com/index/ai-mental-health-research-grants
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 30/100
Claim: OpenAI is awarding up to $2 million in grants for research at the intersection of AI and mental health. The program supports projects that study real-world risks, benefits, and applications to improve safety and well-being.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## OpenAI and NORAD team up to bring new magic to “NORAD Tracks Santa”
Source: https://openai.com/index/norad-holiday-collaboration
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI and NORAD are bringing new magic to “NORAD Tracks Santa” with three ChatGPT holiday tools that let families create festive elves, toy coloring pages, and custom Christmas stories.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## OpenAI takes an ownership stake in Thrive Holdings to accelerate enterprise AI adoption
Source: https://openai.com/index/thrive-holdings
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 83/100
Claim: OpenAI takes an ownership stake in Thrive Holdings to accelerate enterprise AI adoption, embedding frontier research and engineering directly into accounting and IT services to boost speed, accuracy, and efficiency while creating a scalable model for industry-wide transformation.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Accenture and OpenAI accelerate enterprise AI success
Source: https://openai.com/index/accenture-partnership
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 95/100
Claim: Accenture and OpenAI are collaborating to help enterprises bring agentic AI capabilities into the core of their business and unlock new levels of growth.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Mixpanel security incident: what OpenAI users need to know
Source: https://openai.com/index/mixpanel-incident
Publisher: OpenAI
Category: Vendor framing
Sector: Financial services
Capability: Financial workflow automation
Score: 64/100
Claim: OpenAI shares details about a Mixpanel security incident involving limited API analytics data. No API content, credentials, or payment details were exposed. Learn what happened and how we’re protecting users.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Expanding data residency access to business customers worldwide
Source: https://openai.com/index/expanding-data-residency-access-to-business-customers-worldwide
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: OpenAI expands data residency for ChatGPT Enterprise, ChatGPT Edu, and the API Platform, enabling eligible customers to store data at rest in-region.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Our approach to mental health-related litigation
Source: https://openai.com/index/mental-health-litigation-approach
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 36/100
Claim: We’re sharing our approach to mental health-related litigation. We handle sensitive cases with care, transparency, and respect while continuing to strengthen safety and support in ChatGPT.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Inside JetBrains—the company reshaping how the world writes code
Source: https://openai.com/index/jetbrains-2025
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 80/100
Claim: JetBrains is integrating GPT-5 across its coding tools, helping millions of developers design, reason, and build software faster.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing shopping research in ChatGPT
Source: https://openai.com/index/chatgpt-shopping-research
Publisher: OpenAI
Category: Benchmarks
Sector: Commerce and marketplace
Capability: Model and benchmark capability movement
Score: 76/100
Claim: Shopping research in ChatGPT helps you explore, compare, and discover products with personalized buyer’s guides that simplify decision-making.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## GPT-5 and the future of mathematical discovery
Source: https://openai.com/index/gpt-5-mathematical-discovery
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: UCLA Professor Ernest Ryu and GPT-5 solved a key question in optimization theory, showcasing AI’s role in accelerating mathematical discovery.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Introducing Claude Opus 4.5
Source: https://www.anthropic.com/news/claude-opus-4-5
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Our newest model, Claude Opus 4.5, is available today. It’s intelligent, efficient, and the best model in the world for coding, agents, and computer use. It’s also meaningfully better at everyday tasks like deep research and working with slides and spreadsheets. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## OpenAI and Foxconn collaborate to strengthen U.S. manufacturing across the AI supply chain
Source: https://openai.com/index/openai-and-foxconn-collaborate
Publisher: OpenAI
Category: Deployments
Sector: AI infrastructure
Capability: Production AI deployment signal
Score: 85/100
Claim: OpenAI and Foxconn are collaborating to design and manufacture next-generation AI infrastructure hardware in the U.S. The partnership will develop multiple generations of data-center systems, strengthen U.S. supply chains, and build key components domestically to accelerate advanced AI infrastructure.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Helping 1,000 small businesses build with AI
Source: https://openai.com/index/small-business-ai-jam
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 64/100
Claim: OpenAI is partnering with DoorDash, SCORE, and local organizations to help 1,000 small businesses build with AI. The Small Business AI Jam gives Main Street business owners hands-on tools and training to compete and grow.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Early experiments in accelerating science with GPT-5
Source: https://openai.com/index/accelerating-science-gpt-5
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: OpenAI introduces the first research cases showing how GPT-5 accelerates scientific progress across math, physics, biology, and computer science. Explore how AI and researchers collaborate to generate proofs, uncover new insights, and reshape the pace of discovery.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Strengthening our safety ecosystem with external testing
Source: https://openai.com/index/strengthening-safety-with-external-testing
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 36/100
Claim: OpenAI works with independent experts to evaluate frontier AI systems. Third-party testing strengthens safety, validates safeguards, and increases transparency in how we assess model capabilities and risks.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## How evals drive the next chapter in AI for businesses
Source: https://openai.com/index/evals-drive-next-chapter-of-ai
Publisher: OpenAI
Category: Benchmarks
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 80/100
Claim: Learn how evals help businesses define, measure, and improve AI performance—reducing risk, boosting productivity, and driving strategic advantage.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## OpenAI and Target team up on new AI-powered experiences
Source: https://openai.com/index/target-partnership
Publisher: OpenAI
Category: Deployments
Sector: Commerce and marketplace
Capability: Enterprise workflow automation
Score: 89/100
Claim: OpenAI and Target are partnering to bring a new Target app to ChatGPT, offering personalized shopping and faster checkout. Target will also expand its use of ChatGPT Enterprise to boost productivity and guest experiences.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## How Scania accelerates work with AI across its global workforce
Source: https://openai.com/index/scania
Publisher: OpenAI
Category: Labour market
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 61/100
Claim: Global manufacturer Scania is scaling AI with ChatGPT Enterprise. With team-based onboarding and strong guardrails, AI is boosting productivity, quality, and innovation.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## Building more with GPT-5.1-Codex-Max
Source: https://openai.com/index/gpt-5-1-codex-max
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 90/100
Claim: Introducing GPT-5.1-Codex-Max, a faster, more intelligent agentic coding model for Codex. The model is designed for long-running, project-scale work with enhanced reasoning and token efficiency.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## GPT-5.1-Codex-Max System Card
Source: https://openai.com/index/gpt-5-1-codex-max-system-card
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 58/100
Claim: This system card outlines the comprehensive safety measures implemented for GPT-5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## A free version of ChatGPT built for teachers
Source: https://openai.com/index/chatgpt-for-teachers
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 38/100
Claim: ChatGPT for Teachers is a secure workspace with education‑grade privacy and admin controls. Free for verified U.S. K–12 educators through June 2027.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Intuit and OpenAI join forces on new AI-powered experiences
Source: https://openai.com/index/intuit-partnership
Publisher: OpenAI
Category: Deployments
Sector: Financial services
Capability: Frontier model release and benchmark movement
Score: 85/100
Claim: OpenAI and Intuit have entered a $100M+ multi-year partnership to launch Intuit app experiences in ChatGPT and expand Intuit’s use of OpenAI’s frontier models to power personalized financial tools.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Anthropic partners with Rwandan Government and ALX to bring AI education to hundreds of thousands of learners across Africa
Source: https://www.anthropic.com/news/rwandan-government-partnership-ai-education
Publisher: Anthropic
Category: Deployments
Sector: Education
Capability: Education and workforce adoption
Score: 85/100
Claim: Anthropic is announcing a new partnership with the Government of Rwanda and African tech training provider ALX to bring Chidi—a learning companion built on Claude—to hundreds of thousands of learners across Africa. Rwanda's ICT & Innovation and Education ministries are deploying Chidi within their national education system, while ALX will bring the tool to.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Microsoft, NVIDIA, and Anthropic announce strategic partnerships
Source: https://www.anthropic.com/news/microsoft-nvidia-anthropic-announce-strategic-partnerships
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 89/100
Claim: Today Microsoft, NVIDIA, and Anthropic announced new strategic partnerships. Anthropic is scaling its rapidly-growing Claude AI model on Microsoft Azure, powered by NVIDIA, which will broaden access to Claude and provide Azure enterprise customers with expanded model choice and new capabilities. Anthropic has committed to purchase $30 billion of Azure.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Claude now available in Microsoft Foundry and Microsoft 365 Copilot
Source: https://www.anthropic.com/news/claude-in-microsoft-foundry
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Today we announced that Microsoft and Anthropic are expanding our partnership. As part of the partnership, Claude Sonnet 4.5, Haiku 4.5, and Opus 4.1 models are now available in public preview in Microsoft Foundry, where Azure customers can build production applications and enterprise agents. This enables companies to build with Claude, the world's best.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## OpenAI named Emerging Leader in Generative AI
Source: https://openai.com/index/gartner-2025-emerging-leader
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 89/100
Claim: OpenAI has been named an Emerging Leader in Gartner’s 2025 Innovation Guide for Generative AI Model Providers. The recognition reflects our enterprise momentum, with over 1 million companies building with ChatGPT.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing OpenAI for Ireland
Source: https://openai.com/index/openai-for-ireland
Publisher: OpenAI
Category: Vendor framing
Sector: Public sector
Capability: Enterprise workflow automation
Score: 68/100
Claim: OpenAI launches OpenAI for Ireland, partnering with the Irish Government, Dogpatch Labs and Patch to help SMEs, founders and young builders use AI to innovate, boost productivity and build the next generation of Irish tech startups.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Understanding neural networks through sparse circuits
Source: https://openai.com/index/understanding-neural-networks-through-sparse-circuits
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI is exploring mechanistic interpretability to understand how neural networks reason. Our new sparse model approach could make AI systems more transparent and support safer, more reliable behavior.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing GPT-5.1 for developers
Source: https://openai.com/index/gpt-5-1-for-developers
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5.1 is now available in the API, bringing faster adaptive reasoning, extended prompt caching, improved coding performance, and new apply_patch and shell tools.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## How Philips is scaling AI literacy across 70,000 employees
Source: https://openai.com/index/philips
Publisher: OpenAI
Category: Deployments
Sector: Healthcare and life sciences
Capability: Enterprise workflow automation
Score: 85/100
Claim: Philips is scaling AI literacy with ChatGPT Enterprise, training 70,000 employees to use AI responsibly and improve healthcare outcomes worldwide.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing group chats in ChatGPT
Source: https://openai.com/index/group-chats-in-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: Collaborate with others, and ChatGPT, in the same conversation.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Measuring political bias in Claude
Source: https://www.anthropic.com/news/political-even-handedness
Publisher: Anthropic
Category: Benchmarks
Sector: General AI capability
Capability: Model and benchmark capability movement
Score: 86/100
Claim: We want Claude to be seen as fair and trustworthy by people across the political spectrum, and to be unbiased and even-handed in its approach to political topics. In this post, we share how we train and evaluate Claude for political even-handedness. We also report the results of a new, automated, open-source evaluation for political neutrality that we’ve.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## The state of Maryland partners with Anthropic to better serve residents
Source: https://www.anthropic.com/news/maryland-partnership
Publisher: Anthropic
Category: Deployments
Sector: Public sector
Capability: Enterprise workflow automation
Score: 89/100
Claim: The state of Maryland has announced it will use Anthropic's advanced AI models to improve government operations and better serve its more than six million residents. Under the new partnership, the state will deploy Claude across multiple state agencies to address several priorities: The partnership builds on Maryland’s existing use of Claude to improve its.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Disrupting the first reported AI-orchestrated cyber espionage campaign
Source: https://www.anthropic.com/news/disrupting-AI-espionage
Publisher: Anthropic
Category: Benchmarks
Sector: Cybersecurity
Capability: Enterprise workflow automation
Score: 76/100
Claim: We recently argued that an inflection point had been reached in cybersecurity: a point at which AI models had become genuinely useful for cybersecurity operations, both for good and for ill. This was based on systematic evaluations showing cyber capabilities doubling in six months; we’d also been tracking real-world cyberattacks, observing how malicious.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Neuro drives national retail wins with ChatGPT Business
Source: https://openai.com/index/neurogum
Publisher: OpenAI
Category: Deployments
Sector: Commerce and marketplace
Capability: Enterprise workflow automation
Score: 82/100
Claim: Neuro uses ChatGPT Business to scale nationwide with fewer than 70 employees, saving time, reducing costs, and turning faster execution across sales and operations into growth.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Fighting the New York Times’ invasion of user privacy
Source: https://openai.com/index/fighting-nyt-user-privacy-invasion
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 42/100
Claim: OpenAI is fighting the New York Times’ demand for 20 million private ChatGPT conversations and accelerating new security and privacy protections to protect your data.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## GPT-5.1: A smarter, more conversational ChatGPT
Source: https://openai.com/index/gpt-5-1
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: We’re upgrading the GPT-5 series with warmer, more capable models and new ways to customize ChatGPT’s tone and style. GPT-5.1 starts rolling out today to paid users.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum
Source: https://openai.com/index/gpt-5-system-card-addendum-gpt-5-1
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Frontier model release and benchmark movement
Score: 48/100
Claim: This GPT-5 system card addendum provides updated safety metrics for GPT-5.1 Instant and Thinking, including new evaluations for mental health and emotional reliance.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic invests $50 billion in American AI infrastructure
Source: https://www.anthropic.com/news/anthropic-invests-50-billion-in-american-ai-infrastructure
Publisher: Anthropic
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 80/100
Claim: Today, we are announcing a $50 billion investment in American computing infrastructure, building data centers with Fluidstack in Texas and New York, with more sites to come. These facilities are custom built for Anthropic with a focus on maximizing efficiency for our workloads, enabling continued research and development at the frontier. The project will.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Free ChatGPT for transitioning U.S. servicemembers and veterans
Source: https://openai.com/index/chatgpt-for-veterans
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 64/100
Claim: OpenAI is offering U.S. servicemembers and veterans within 12 months of retirement or separation a free year of ChatGPT Plus to support their transition to civilian life. The tools can help with resumes, interviews, education, and planning for what’s next.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Understanding prompt injections: a frontier security challenge
Source: https://openai.com/index/prompt-injections
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 36/100
Claim: Prompt injections are a frontier security challenge for AI systems. Learn how these attacks work and how OpenAI is advancing research, training models, and building safeguards for users.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Notion’s GPT‑5 rebuild unlocks autonomous AI workflows
Source: https://openai.com/index/notion
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Notion rebuilt its AI architecture with GPT-5 to create agents that reason, act, and adapt across workflows, unlocking faster and more flexible productivity in Notion 3.0.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## New offices in Paris and Munich expand Anthropic’s European presence
Source: https://www.anthropic.com/news/new-offices-in-paris-and-munich-expand-european-presence
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 42/100
Claim: Today, we're announcing plans to open offices in Paris and Munich as our global operations expand across Europe. These new hubs follow recent office openings in Tokyo, Seoul, and Bengaluru and will further grow our European footprint alongside our offices in London, Dublin, and Zurich. They’re the latest example of Anthropic’s extraordinary momentum in.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## AI progress and recommendations
Source: https://openai.com/index/ai-progress-and-recommendations
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 36/100
Claim: AI is advancing fast. We have the chance to shape its progress—toward discovery, safety, and a better future for everyone.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing the Teen Safety Blueprint
Source: https://openai.com/index/introducing-the-teen-safety-blueprint
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 36/100
Claim: Discover OpenAI’s Teen Safety Blueprint—a roadmap for building AI responsibly with safeguards, age-appropriate design, and collaboration to protect and empower young people online.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## How CRED is tapping AI to deliver premium customer experiences
Source: https://openai.com/index/cred-swamy-seetharaman
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 71/100
Claim: CRED is improving premium customer experiences in India with OpenAI, using GPT-powered tools to boost support accuracy, cut response times, and raise customer satisfaction.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## How Chime is redefining marketing through AI
Source: https://openai.com/index/chime-vineet-mehra
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Agent platform and API infrastructure
Score: 74/100
Claim: Chime CMO Vineet Mehra shares how AI is reshaping marketing into an agent-driven model and why leaders who prioritize AI literacy and thoughtful adoption will drive growth.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## 1 million business customers putting AI to work
Source: https://openai.com/index/1-million-businesses-putting-ai-to-work
Publisher: OpenAI
Category: Benchmarks
Sector: Financial services
Capability: Enterprise workflow automation
Score: 87/100
Claim: More than 1 million business customers around the world now use OpenAI. Across healthcare, life sciences, financial services, and more, ChatGPT and our APIs are driving a new era of intelligent, AI-powered work.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Launching the Anthropic Economic Futures Programme in the UK and Europe
Source: https://www.anthropic.com/news/economic-futures-uk-europe
Publisher: Anthropic
Category: Vendor framing
Sector: Scientific research
Capability: Enterprise workflow automation
Score: 40/100
Claim: AI adoption is increasing rapidly in Europe and the UK, but the conversation about how to manage its effects on labor and the economy is still at a very early stage. This matters: the decisions politicians make today will affect the continent’s labor force, productivity, and growth for years to come. We want to help researchers, policymakers, and.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Cognizant will make Claude available to 350,000 employees, accelerating enterprise AI adoption and internal transformation
Source: https://www.anthropic.com/news/cognizant-partnership
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 95/100
Claim: Cognizant, a leading information technology consulting company, announced today that it will use Claude to help its enterprise customers and internal teams move from AI experimentation to production outcomes. Cognizant will deploy Claude to up to 350,000 employees globally, combining Claude with agentic tooling, Cognizant's engineering platforms, and.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Anthropic and Iceland announce one of the world’s first national AI education pilots
Source: https://www.anthropic.com/news/anthropic-and-iceland-announce-one-of-the-world-s-first-national-ai-education-pilots
Publisher: Anthropic
Category: Deployments
Sector: Education
Capability: Education and workforce adoption
Score: 85/100
Claim: Today, Anthropic and Iceland's Ministry of Education and Children are announcing a partnership to bring Claude to teachers across the nation, launching one of the world's first comprehensive national AI education pilots. This initiative will give teachers from every region of Iceland—from Reykjavik to the most remote villages—access to advanced AI tools as.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing IndQA
Source: https://openai.com/index/introducing-indqa
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: OpenAI introduces IndQA, a new benchmark for evaluating AI systems in Indian languages. Built with domain experts, IndQA tests cultural understanding and reasoning across 12 languages and 10 knowledge areas.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## AWS and OpenAI announce multi-year strategic partnership
Source: https://openai.com/index/aws-and-openai-partnership
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Production AI deployment signal
Score: 89/100
Claim: OpenAI and AWS have entered a multi-year, $38 billion partnership to scale advanced AI workloads. AWS will provide world-class infrastructure and compute capacity to power OpenAI’s next generation of models.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Expanding Stargate to Michigan
Source: https://openai.com/index/expanding-stargate-to-michigan
Publisher: OpenAI
Category: Labour market
Sector: Education
Capability: Education and workforce adoption
Score: 62/100
Claim: OpenAI is expanding Stargate to Michigan with a new one-gigawatt campus that strengthens America’s AI infrastructure. The project will create jobs, drive investment, and support economic growth across the Midwest.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## Introducing Aardvark: OpenAI’s agentic security researcher
Source: https://openai.com/index/introducing-aardvark
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Cyber defence and misuse monitoring
Score: 86/100
Claim: OpenAI introduces Aardvark, an AI-powered security researcher that autonomously finds, validates, and helps fix software vulnerabilities at scale. The system is in private beta—sign up to join early testing.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## How we built OWL, the new architecture behind our ChatGPT-based browser, Atlas
Source: https://openai.com/index/building-chatgpt-atlas
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 74/100
Claim: A deep dive into OWL, the new architecture powering ChatGPT Atlas—decoupling Chromium, enabling fast startup, rich UI, and agentic browsing with ChatGPT.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## gpt-oss-safeguard technical report
Source: https://openai.com/index/gpt-oss-safeguard-technical-report
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 36/100
Claim: gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are two open-weight reasoning models post-trained from the gpt-oss models and trained to reason from a provided policy in order to label content under that policy. In this report, we describe gpt-oss-safeguard’s capabilities and provide our baseline safety evaluations on the gpt-oss-safeguard models, using.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing gpt-oss-safeguard
Source: https://openai.com/index/introducing-gpt-oss-safeguard
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Vendor platform capability signal
Score: 36/100
Claim: OpenAI introduces gpt-oss-safeguard—open-weight reasoning models for safety classification that let developers apply and iterate on custom policies.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic officially opens Tokyo office, signs Memorandum of Cooperation with the Japan AI Safety Institute
Source: https://www.anthropic.com/news/opening-our-tokyo-office
Publisher: Anthropic
Category: Vendor framing
Sector: General AI capability
Capability: Enterprise workflow automation
Score: 33/100
Claim: This week, we opened our first Asia-Pacific office in Tokyo, a milestone in Anthropic's international expansion. Our CEO and co-founder Dario Amodei traveled to Tokyo to meet with Prime Minister Takaichi, address members of the LDP Digitization Headquarters Committee, meet customers and sign a Memorandum of Cooperation with the Japan AI Safety Institute.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Advancing organizational transformation for business innovation
Source: https://openai.com/index/dai-nippon-printing
Publisher: OpenAI
Category: Benchmarks
Sector: Public sector
Capability: Enterprise workflow automation
Score: 96/100
Claim: DNP rolled out ChatGPT Enterprise across ten core departments, achieving 95% faster patent research, 10x processing volume, 87% automation, and 70% knowledge reuse in three months.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Doppel’s AI defense system stops attacks before they spread
Source: https://openai.com/index/doppel
Publisher: OpenAI
Category: Deployments
Sector: Cybersecurity
Capability: Frontier model release and benchmark movement
Score: 94/100
Claim: Doppel uses GPT-5 and reinforcement fine-tuning to stop deepfake and impersonation attacks, cutting analyst workloads by 80% and reducing response times from hours to minutes.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## The next chapter of the Microsoft–OpenAI partnership
Source: https://openai.com/index/next-chapter-of-microsoft-openai-partnership
Publisher: OpenAI
Category: Vendor framing
Sector: AI infrastructure
Capability: Vendor platform capability signal
Score: 55/100
Claim: Microsoft and OpenAI sign a new agreement that strengthens their long-term partnership, expands innovation, and ensures responsible AI progress.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Built to benefit everyone
Source: https://openai.com/index/built-to-benefit-everyone
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: OpenAI’s recapitalization strengthens mission-focused governance, expanding resources to ensure AI benefits everyone while advancing innovation responsibly.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Strengthening ChatGPT’s responses in sensitive conversations
Source: https://openai.com/index/strengthening-chatgpt-responses-in-sensitive-conversations
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 64/100
Claim: OpenAI collaborated with 170+ mental health experts to improve ChatGPT’s ability to recognize distress, respond empathetically, and guide users toward real-world support—reducing unsafe responses by up to 80%. Learn how we’re making ChatGPT safer and more supportive in sensitive moments.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Addendum to GPT-5 System Card: Sensitive conversations
Source: https://openai.com/index/gpt-5-system-card-sensitive-conversations
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Frontier model release and benchmark movement
Score: 48/100
Claim: This system card details GPT-5’s improvements in handling sensitive conversations, including new benchmarks for emotional reliance, mental health, and jailbreak resistance.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Steuerrecht.com delivers client-ready legal analysis with ChatGPT
Source: https://openai.com/index/steuerrecht
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Enterprise workflow automation
Score: 90/100
Claim: Steuerrecht.com uses ChatGPT Business to streamline legal workflows, automate tax research, and deliver faster, client-ready analysis for law firms.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Advancing Claude for Financial Services
Source: https://www.anthropic.com/news/advancing-claude-for-financial-services
Publisher: Anthropic
Category: Benchmarks
Sector: Financial services
Capability: Financial workflow automation
Score: 86/100
Claim: We're expanding Claude for Financial Services with an Excel add-in, additional connectors to real-time market data and portfolio analytics, and new pre-built Agent Skills, like building discounted cash flow models and initiating coverage reports. These updates build on Sonnet 4.5’s state-of-the-art performance on financial tasks, topping the Finance Agent.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## OpenAI acquires Software Applications Incorporated, maker of Sky
Source: https://openai.com/index/openai-acquires-software-applications-incorporated
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI has acquired Software Applications Incorporated, maker of Sky—a natural language interface for Mac that brings AI directly into your desktop experience. Together, we’re integrating Sky’s deep macOS capabilities into ChatGPT to make AI more intuitive, contextual, and action-oriented.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Consensus accelerates research with GPT-5 and Responses API
Source: https://openai.com/index/consensus
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Consensus uses GPT-5 and OpenAI’s Responses API to power a multi-agent research assistant that reads, analyzes, and synthesizes evidence in minutes—helping over 8 million researchers accelerate scientific discovery.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Work smarter with your company knowledge in ChatGPT
Source: https://openai.com/index/introducing-company-knowledge
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Enterprise workflow automation
Score: 45/100
Claim: Company knowledge brings context from your apps into ChatGPT for answers specific to your business, with clear citations, security, privacy, and admin controls. Available now for Business, Enterprise, and Edu users.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## AI in South Korea—OpenAI’s Economic Blueprint
Source: https://openai.com/index/south-korea-economic-blueprint
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Education and workforce adoption
Score: 85/100
Claim: OpenAI's Korea Economic Blueprint outlines how South Korea can scale trusted AI through sovereign capabilities and strategic partnerships to drive growth.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Seoul becomes Anthropic’s third office in Asia-Pacific as we continue our international growth
Source: https://www.anthropic.com/news/seoul-becomes-third-anthropic-office-in-asia-pacific
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 42/100
Claim: Today we're announcing plans to open an office in Seoul in early 2026 as our global operations expand into Korea. Seoul comes on the heels of new offices in Tokyo and Bengaluru, and together this expansion reflects the extraordinary momentum we're seeing across Asia-Pacific—our run rate revenue in the region has grown over 10x in the past year. The Korean.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Expanding our use of Google Cloud TPUs and Services
Source: https://www.anthropic.com/news/expanding-our-use-of-google-cloud-tpus-and-services
Publisher: Anthropic
Category: Benchmarks
Sector: Scientific research
Capability: Agent platform and API infrastructure
Score: 80/100
Claim: Today, we are announcing that we plan to expand our use of Google Cloud technologies, including up to one million TPUs, dramatically increasing our compute resources as we continue to push the boundaries of AI research and product development. The expansion is worth tens of billions of dollars and is expected to bring well over a gigawatt of capacity.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## The next chapter for UK sovereign AI
Source: https://openai.com/index/the-next-chapter-for-uk-sovereign-ai
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Enterprise workflow automation
Score: 85/100
Claim: OpenAI expands its UK partnership with a new Ministry of Justice agreement, bringing ChatGPT to civil servants. It also introduces UK data residency for ChatGPT Enterprise, ChatGPT Edu, and the API Platform to support trusted and secure AI adoption.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## AI in Japan—OpenAI’s Japan Economic Blueprint
Source: https://openai.com/index/japan-economic-blueprint
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Education and workforce adoption
Score: 64/100
Claim: OpenAI’s Japan Economic Blueprint outlines how Japan can harness AI to boost innovation, strengthen competitiveness, and enable sustainable, inclusive growth.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Continue your ChatGPT experience beyond WhatsApp
Source: https://openai.com/index/chatgpt-whatsapp-transition
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: ChatGPT will no longer be available on WhatsApp after January 15, 2026. Learn how to link your ChatGPT account and continue your conversations across devices.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing ChatGPT Atlas, the browser with ChatGPT built in
Source: https://openai.com/index/introducing-chatgpt-atlas
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 38/100
Claim: ChatGPT Atlas, the browser with ChatGPT built in. Get instant answers, summaries, and smart web help—right from any page. With privacy settings you can control. Available now for macOS.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## A statement from Dario Amodei on Anthropic's commitment to American AI leadership
Source: https://www.anthropic.com/news/statement-dario-amodei-american-ai-leadership
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: A statement from Anthropic CEO Dario Amodei on Anthropic’s commitment to advancing America's leadership in building powerful and beneficial AI. Anthropic is built on a simple principle: AI should be a force for human progress, not peril. That means making products that are genuinely useful, speaking honestly about risks and benefits, and working with.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Claude for Life Sciences
Source: https://www.anthropic.com/news/claude-for-life-sciences
Publisher: Anthropic
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 86/100
Claim: Increasing the rate of scientific progress is a core part of Anthropic’s public benefit mission. We are focused on building the tools to allow researchers to make new discoveries – and eventually, to allow AI models to make these discoveries autonomously.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Plex Coffee delivers fast, personal service with ChatGPT
Source: https://openai.com/index/plex-coffee
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 82/100
Claim: Learn how Plex Coffee uses ChatGPT Business to centralize knowledge, train staff faster, and preserve personal connections while expanding.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing Claude Haiku 4.5
Source: https://www.anthropic.com/news/claude-haiku-4-5
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Claude Haiku 4.5, our latest small model, is available today to all users. What was recently at the frontier is now cheaper and faster. Five months ago, Claude Sonnet 4 was a state-of-the-art model. Today, Claude Haiku 4.5 gives you similar levels of coding performance but at one-third the cost and more than twice the speed.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Expert Council on Well-Being and AI
Source: https://openai.com/index/expert-council-on-well-being-and-ai
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 76/100
Claim: OpenAI’s new Expert Council on Well-Being and AI brings together leading psychologists, clinicians, and researchers to guide how ChatGPT supports emotional health, especially for teens. Learn how their insights are shaping safer, more caring AI experiences.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Anthropic and Salesforce expand partnership to bring Claude to regulated industries
Source: https://www.anthropic.com/news/salesforce-anthropic-expanded-partnership
Publisher: Anthropic
Category: Benchmarks
Sector: Financial services
Capability: Financial workflow automation
Score: 93/100
Claim: Anthropic and Salesforce today announced an expanded partnership to make Claude a preferred model for Salesforce's Agentforce platform, enabling Salesforce customers in financial services, healthcare, cybersecurity, and life sciences to use trusted AI while keeping sensitive data secure. Additionally, Salesforce is deploying Claude Code across its global.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## OpenAI and Broadcom announce strategic collaboration to deploy 10 gigawatts of OpenAI-designed AI accelerators
Source: https://openai.com/index/openai-and-broadcom-announce-strategic-collaboration
Publisher: OpenAI
Category: Deployments
Sector: AI infrastructure
Capability: Production AI deployment signal
Score: 85/100
Claim: OpenAI and Broadcom announce a multi-year partnership to deploy 10 gigawatts of OpenAI-designed AI accelerators, co-developing next-generation systems and Ethernet solutions to power scalable, energy-efficient AI infrastructure by 2029.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## HYGH speeds development and campaigns with ChatGPT Business
Source: https://openai.com/index/hygh
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Enterprise workflow automation
Score: 64/100
Claim: HYGH speeds up software development and campaign delivery with ChatGPT Business, cutting turnaround times, scaling output, and driving revenue growth.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Defining and evaluating political bias in LLMs
Source: https://openai.com/index/defining-and-evaluating-political-bias-in-llms
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Model and benchmark capability movement
Score: 76/100
Claim: Learn how OpenAI evaluates political bias in ChatGPT through new real-world testing methods that improve objectivity and reduce bias.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## HiBob turns 2,500 GPTs into product and team growth
Source: https://openai.com/index/hibob
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 95/100
Claim: Discover how HiBob uses ChatGPT Enterprise and custom GPTs to scale AI adoption, boost revenue, streamline HR workflows, and deliver AI-powered features in the Bob platform.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Rahul Patil joins Anthropic as Chief Technology Officer
Source: https://www.anthropic.com/news/rahul-patil-joins-anthropic
Publisher: Anthropic
Category: Benchmarks
Sector: Cybersecurity
Capability: Enterprise workflow automation
Score: 61/100
Claim: We're excited to announce that Rahul Patil has joined Anthropic as our Chief Technology Officer. Rahul will oversee our engineering organization across product, compute, infrastructure, inference, data science, and security as we scale Claude to meet growing enterprise demand worldwide. Rahul brings over 20 years of experience building and maintaining.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Expanding our global operations to India with our second Asia Pacific office
Source: https://www.anthropic.com/news/expanding-global-operations-to-india
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 42/100
Claim: Today we’re announcing that we’re expanding our global operations to India, with plans to open an office in Bengaluru in early 2026. Bengaluru will serve as our second office in Asia Pacific after Tokyo, which will open in the coming months. This expansion will help us serve India’s rapidly growing AI ecosystem and reflects the increasing international.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Codex is now generally available
Source: https://openai.com/index/codex-now-generally-available
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 52/100
Claim: OpenAI Codex is now generally available with powerful new features for developers: a Slack integration, Codex SDK, and admin tools like usage dashboards and workspace management—making Codex easier to use and manage at scale.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing apps in ChatGPT and the new Apps SDK
Source: https://openai.com/index/introducing-apps-in-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: We’re introducing a new generation of apps you can chat with, right inside ChatGPT. Developers can start building them today with the new Apps SDK, available in preview.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## AMD and OpenAI announce strategic partnership to deploy 6 gigawatts of AMD GPUs
Source: https://openai.com/index/openai-amd-strategic-partnership
Publisher: OpenAI
Category: Deployments
Sector: AI infrastructure
Capability: Production AI deployment signal
Score: 85/100
Claim: AMD and OpenAI have announced a multi-year partnership to deploy 6 gigawatts of AMD Instinct GPUs, beginning with 1 gigawatt in 2026, to power OpenAI’s next-generation AI infrastructure and accelerate global AI innovation.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing AgentKit, new Evals, and RFT for agents
Source: https://openai.com/index/introducing-agentkit
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Agent platform and API infrastructure
Score: 90/100
Claim: Today, we’re releasing new tools to help developers go from prototype to production faster: AgentKit, expanded evals capabilities, and reinforcement fine-tuning for agents.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Deloitte will make Claude available to 470,000 people across its global network
Source: https://www.anthropic.com/news/deloitte-anthropic-partnership
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Production AI deployment signal
Score: 78/100
Claim: Anthropic and Deloitte today announced an expanded alliance that will make Claude available to Deloitte people across its global network and develop new industry-specific solutions powered by Claude. As part of the collaboration, Deloitte will establish a Claude Center of Excellence with trained specialists who will develop implementation frameworks, share.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## With GPT-5, Wrtn builds lifestyle AI for millions in Korea
Source: https://openai.com/index/wrtn
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 80/100
Claim: Wrtn scaled AI apps to 6.5M users in Korea with GPT-5, creating ‘Lifestyle AI’ that blends productivity, creativity, and learning—now expanding across East Asia.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Samsung and SK join OpenAI’s Stargate initiative to advance global AI infrastructure
Source: https://openai.com/index/samsung-and-sk-join-stargate
Publisher: OpenAI
Category: Vendor framing
Sector: AI infrastructure
Capability: Vendor platform capability signal
Score: 64/100
Claim: Samsung and SK join OpenAI’s Stargate initiative to expand global AI infrastructure, scaling advanced memory chip production and building next-gen data centers in Korea.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Sora 2 System Card
Source: https://openai.com/index/sora-2-system-card
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 26/100
Claim: Sora 2 is our new state-of-the-art video and audio generation model. Building on the foundation of Sora, this new model introduces capabilities that have been difficult for prior video models to achieve, such as more accurate physics, sharper realism, synchronized audio, enhanced steerability, and an expanded stylistic range.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Launching Sora responsibly
Source: https://openai.com/index/launching-sora-responsibly
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 26/100
Claim: To address the novel safety challenges posed by a state-of-the-art video model as well as a new social creation platform, we’ve built Sora 2 and the Sora app with safety at the foundation. Our approach is anchored in concrete protections.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Sora 2 is here
Source: https://openai.com/index/sora-2
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Multimodal content generation and media workflows
Score: 64/100
Claim: Our latest video generation model is more physically accurate, realistic, and controllable than prior systems. It also features synchronized dialogue and sound effects. Create with it in the new Sora app.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Building OpenAI with OpenAI
Source: https://openai.com/index/building-openai-with-openai
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: At OpenAI, we rely on our own technology to help streamline work, scale expertise, and drive outcomes. In our new series, OpenAI on OpenAI, we share lessons to help other organizations do the same.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Driving sales productivity and customer success at OpenAI
Source: https://openai.com/index/openai-gtm-assistant
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 89/100
Claim: Learn how OpenAI boosts sales productivity by automating prep, centralizing knowledge, and scaling top-selling practices.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Converting inbound leads into customers at OpenAI
Source: https://openai.com/index/openai-inbound-sales-assistant
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Production AI deployment signal
Score: 85/100
Claim: Learn how OpenAI used AI to deliver personalized answers at scale, converting inbound leads into customers.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Improving support with every interaction at OpenAI
Source: https://openai.com/index/openai-support-model
Publisher: OpenAI
Category: Deployments
Sector: Customer operations
Capability: Production AI deployment signal
Score: 78/100
Claim: Learn how OpenAI uses AI to enhance support, cutting response times, improving quality, and scaling to meet hypergrowth.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Turning contracts into searchable data at OpenAI
Source: https://openai.com/index/openai-contract-data-agent
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI built a system to extract contract data quickly, cutting turnaround times and making it easier for teams to access the details they need.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Empowering teams to unlock insights faster at OpenAI
Source: https://openai.com/index/openai-research-assistant
Publisher: OpenAI
Category: Benchmarks
Sector: Customer operations
Capability: Model and benchmark capability movement
Score: 80/100
Claim: OpenAI’s research assistant helps teams analyze millions of support tickets, surface insights faster, and scale curiosity across the company.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Combating online child sexual exploitation & abuse
Source: https://openai.com/index/combating-online-child-sexual-exploitation-abuse
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: Discover how OpenAI combats online child sexual exploitation and abuse with strict usage policies, advanced detection tools, and industry collaboration to block, report, and prevent AI misuse.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing parental controls
Source: https://openai.com/index/introducing-parental-controls
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: We’re rolling out parental controls and a new parent resource page to help families guide how ChatGPT works in their homes.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Buy it in ChatGPT: Instant Checkout and the Agentic Commerce Protocol
Source: https://openai.com/index/buy-it-in-chatgpt
Publisher: OpenAI
Category: Vendor framing
Sector: Commerce and marketplace
Capability: Enterprise workflow automation
Score: 74/100
Claim: We’re taking first steps toward agentic commerce in ChatGPT with new ways for people, AI agents, and businesses to shop together.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Enabling Claude Code to work more autonomously
Source: https://www.anthropic.com/news/enabling-claude-code-to-work-more-autonomously
Publisher: Anthropic
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 88/100
Claim: We’re introducing several upgrades to Claude Code: a native VS Code extension, version 2.0 of our terminal interface, and checkpoints for autonomous operation. Powered by Sonnet 4.5, Claude Code now handles longer, more complex development tasks in your terminal and IDE.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing Claude Sonnet 4.5
Source: https://www.anthropic.com/news/claude-sonnet-4-5
Publisher: Anthropic
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Claude Sonnet 4.5 is the best coding model in the world. It's the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math. Code is everywhere. It runs every application, spreadsheet, and software tool you use. Being able to use those tools and reason through hard problems is.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Partnering with AARP to help keep older adults safe online
Source: https://openai.com/index/aarp-partnership-older-adults-online-safety
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: OpenAI and AARP are partnering to help older adults stay safe online with new AI training, scam-spotting tools, and nationwide programs through OpenAI Academy and OATS’s Senior Planet initiative.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic expands global leadership in enterprise AI, naming Chris Ciauri as Managing Director of International
Source: https://www.anthropic.com/news/anthropic-expands-global-leadership-in-enterprise-ai-naming-chris-ciauri-as-managing-director-of
Publisher: Anthropic
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 63/100
Claim: Today we're announcing Anthropic's expanded global presence with key leadership appointments, enterprise customer momentum, and new international offices across multiple continents. This expansion reflects Anthropic's growth trajectory and increasing international demand for Claude. Anthropic has the top market share in enterprise AI*, and our run-rate.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## More ways to work with your team and tools in ChatGPT
Source: https://openai.com/index/more-ways-to-work-with-your-team
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 64/100
Claim: New shared projects, smarter connectors, and compliance and security updates help teams get more done.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Measuring the performance of our models on real-world tasks
Source: https://openai.com/index/gdpval
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Education and workforce adoption
Score: 76/100
Claim: OpenAI introduces GDPval, a new evaluation that measures model performance on real-world economically valuable tasks across 44 occupations.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Introducing ChatGPT Pulse
Source: https://openai.com/index/introducing-chatgpt-pulse
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: Today we're releasing a preview of ChatGPT Pulse to Pro users on mobile. Pulse is a new experience where ChatGPT proactively does research to deliver personalized updates based on your chats, feedback, and connected apps like your calendar.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## ENEOS Materials brings ChatGPT Enterprise to manufacturing
Source: https://openai.com/index/eneos-materials
Publisher: OpenAI
Category: Vendor framing
Sector: Scientific research
Capability: Enterprise workflow automation
Score: 53/100
Claim: ENEOS Materials uses ChatGPT Enterprise to speed research, improve plant design safety, and cut HR analysis time by 90%, with 80% reporting better workflows.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## OpenAI, Oracle, and SoftBank expand Stargate with five new AI datacenter sites
Source: https://openai.com/index/five-new-stargate-sites
Publisher: OpenAI
Category: Labour market
Sector: Financial services
Capability: Financial workflow automation
Score: 72/100
Claim: OpenAI, Oracle, and SoftBank announce five new Stargate AI datacenter sites, accelerating a $500B, 10-gigawatt U.S. infrastructure buildout to power next-generation AI and create tens of thousands of jobs.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## CNA is transforming its newsroom with AI
Source: https://openai.com/index/cna-walter-fernandez
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Vendor platform capability signal
Score: 64/100
Claim: In this Executive Function series from OpenAI, discover how CNA is transforming its newsroom with AI. Editor-in-Chief Walter Fernandez shares insights on AI adoption, culture, and the future of journalism.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## SchoolAI builds an AI platform that empowers teachers
Source: https://openai.com/index/schoolai
Publisher: OpenAI
Category: Deployments
Sector: Education
Capability: Education and workforce adoption
Score: 82/100
Claim: SchoolAI uses GPT-4.1, image generation, and TTS to power safe, teacher-guided AI tools for over 1 million classrooms, improving engagement, oversight, and personalized learning.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## OpenAI and NVIDIA announce strategic partnership to deploy 10 gigawatts of NVIDIA systems
Source: https://openai.com/index/openai-nvidia-systems-partnership
Publisher: OpenAI
Category: Deployments
Sector: AI infrastructure
Capability: Production AI deployment signal
Score: 85/100
Claim: OpenAI and NVIDIA announce a strategic partnership to deploy 10 gigawatts of AI datacenters powered by NVIDIA systems, with the first phase launching in 2026.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Detecting and reducing scheming in AI models
Source: https://openai.com/index/detecting-and-reducing-scheming-in-ai-models
Publisher: OpenAI
Category: Vendor framing
Sector: Scientific research
Capability: Frontier model release and benchmark movement
Score: 48/100
Claim: Apollo Research and OpenAI developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. The team shared concrete examples and stress tests of an early method to reduce scheming.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing Stargate UK
Source: https://openai.com/index/introducing-stargate-uk
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: Official OpenAI release: Introducing Stargate UK.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Building towards age prediction
Source: https://openai.com/index/building-towards-age-prediction
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Vendor platform capability signal
Score: 64/100
Claim: Learn how OpenAI is building age prediction and parental controls in ChatGPT to create safer, age-appropriate experiences for teens while supporting families with new tools.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Teen safety, freedom, and privacy
Source: https://openai.com/index/teen-safety-freedom-and-privacy
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 26/100
Claim: Explore OpenAI’s approach to balancing teen safety, freedom, and privacy in AI use.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing upgrades to Codex
Source: https://openai.com/index/introducing-upgrades-to-codex
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 78/100
Claim: Codex just got faster, more reliable, and better at real-time collaboration and tackling tasks independently anywhere you develop—whether via the terminal, IDE, web, or even your phone.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## How people are using ChatGPT
Source: https://openai.com/index/how-people-are-using-chatgpt
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Education and workforce adoption
Score: 76/100
Claim: New research from the largest study of ChatGPT use shows how the tool creates economic value through both personal and professional use. Adoption is broadening beyond early users, closing gaps and making AI a part of everyday life.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Addendum to GPT-5 system card: GPT-5-Codex
Source: https://openai.com/index/gpt-5-system-card-addendum-gpt-5-codex
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 58/100
Claim: This addendum to the GPT-5 system card shares a new model: GPT-5-Codex, a version of GPT-5 further optimized for agentic coding in Codex. GPT-5-Codex adjusts its thinking effort more dynamically based on task complexity, responding quickly to simple conversational queries or small tasks, while independently working for longer on more complex tasks.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Claude is now generally available in Xcode
Source: https://www.anthropic.com/news/claude-in-xcode
Publisher: Anthropic
Category: Deployments
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 90/100
Claim: Developers can now connect their Claude account to Xcode 26 to power coding intelligence features with Claude Sonnet 4. Xcode is Apple's integrated development environment (IDE) and offers the tools you need to develop, test, and distribute apps for Apple platforms. This integration lets developers use Claude's coding capabilities directly in their.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Working with US CAISI and UK AISI to build more secure AI systems
Source: https://openai.com/index/us-caisi-uk-aisi-ai-update
Publisher: OpenAI
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 43/100
Claim: OpenAI shares progress on the partnership with the US CAISI and UK AISI to strengthen AI safety and security.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Strengthening our safeguards through collaboration with US CAISI and UK AISI
Source: https://www.anthropic.com/news/strengthening-our-safeguards-through-collaboration-with-us-caisi-and-uk-aisi
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 43/100
Claim: Over the past year, we've collaborated with the US Center for AI Standards and Innovation (CAISI) and UK AI Security Institute (AISI), government bodies established to measure and improve the security of AI systems. Our voluntary work together began as initial consultations, but over time evolved to an ongoing partnership where CAISI and AISI teams were.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## A joint statement from OpenAI and Microsoft
Source: https://openai.com/index/joint-statement-from-openai-and-microsoft
Publisher: OpenAI
Category: Vendor framing
Sector: AI infrastructure
Capability: Vendor platform capability signal
Score: 43/100
Claim: OpenAI and Microsoft sign a new MOU, reinforcing their partnership and shared commitment to AI safety and innovation.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Statement on OpenAI’s Nonprofit and PBC
Source: https://openai.com/index/statement-on-openai-nonprofit-and-pbc
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 54/100
Claim: OpenAI reaffirms its nonprofit leadership with a new structure granting equity in its PBC, enabling over $100B in resources to advance safe, beneficial AI for humanity.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## SafetyKit scales risk agents with OpenAI’s most capable models
Source: https://openai.com/index/safetykit
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 58/100
Claim: Discover how SafetyKit leverages OpenAI GPT-5 to enhance content moderation, enforce compliance, and outpace legacy safety systems with greater accuracy.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## A People-First AI Fund: $50M to support nonprofits
Source: https://openai.com/index/people-first-ai-fund
Publisher: OpenAI
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 54/100
Claim: Applications are now open for OpenAI’s People-First AI Fund, a $50M initiative supporting U.S. nonprofits advancing education, community innovation, and economic opportunity. Apply by October 8, 2025, for unrestricted grants that help communities shape AI for the public good.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic is endorsing SB 53
Source: https://www.anthropic.com/news/anthropic-is-endorsing-sb-53
Publisher: Anthropic
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 64/100
Claim: Anthropic is endorsing SB 53, the California bill that governs powerful AI systems built by frontier AI developers like Anthropic. We’ve long advocated for thoughtful AI regulation and our support for this bill comes after careful consideration of the lessons learned from California's previous attempt at AI regulation (SB 1047). While we believe that.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Why language models hallucinate
Source: https://openai.com/index/why-language-models-hallucinate
Publisher: OpenAI
Category: Vendor framing
Sector: Scientific research
Capability: Vendor platform capability signal
Score: 36/100
Claim: OpenAI’s new research explains why language models hallucinate. The findings show how improved evaluations can enhance AI reliability, honesty, and safety.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Expanding economic opportunity with AI
Source: https://openai.com/index/expanding-economic-opportunity-with-ai
Publisher: OpenAI
Category: Labour market
Sector: Enterprise operations
Capability: Education and workforce adoption
Score: 72/100
Claim: OpenAI is launching a Jobs Platform and new Certifications to connect workers with jobs, training, and certifications. Learn how we’re expanding economic opportunity and making AI skills more accessible.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## Updating restrictions of sales to unsupported regions
Source: https://www.anthropic.com/news/updating-restrictions-of-sales-to-unsupported-regions
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 42/100
Claim: Anthropic's Terms of Service prohibit use of our services in certain regions due to legal, regulatory, and security risks. However, companies from these restricted regions—including adversarial nations like China—continue accessing our services in various ways, such as through subsidiaries incorporated in other countries. Companies subject to control from.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic Signs White House Pledge to America's Youth: Investing in AI Education
Source: https://www.anthropic.com/news/anthropic-signs-pledge-to-americas-youth-investing-in-ai-education
Publisher: Anthropic
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 54/100
Claim: Following our August signing of the White House's 'Pledge to America's Youth: Investing in AI Education', today we joined companies across the country at the White House's AI Education Taskforce event, deepening our commitment to helping America's students build essential skills to excel and lead with AI. Anthropic has made three concrete commitments.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Vijaye Raji to become CTO of Applications with acquisition of Statsig
Source: https://openai.com/index/vijaye-raji-to-become-cto-of-applications-with-acquisition-of-statsig
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 64/100
Claim: Vijaye Raji will step into a new role as CTO of Applications, reporting to CEO of Applications, Fidji Simo, following the acquisition of Statsig.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Building more helpful ChatGPT experiences for everyone
Source: https://openai.com/index/building-more-helpful-chatgpt-experiences-for-everyone
Publisher: OpenAI
Category: Benchmarks
Sector: General AI capability
Capability: Model and benchmark capability movement
Score: 76/100
Claim: We’re partnering with experts, strengthening protections for teens with parental controls, and routing sensitive conversations to reasoning models in ChatGPT.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Anthropic raises $13B Series F at $183B post-money valuation
Source: https://www.anthropic.com/news/anthropic-raises-series-f-at-usd183b-post-money-valuation
Publisher: Anthropic
Category: Vendor framing
Sector: Scientific research
Capability: Vendor platform capability signal
Score: 56/100
Claim: Anthropic has completed a Series F fundraising of $13 billion led by ICONIQ. This financing values Anthropic at $183 billion post-money. Along with ICONIQ, the round was co-led by Fidelity Management & Research Company and Lightspeed Venture Partners. The investment reflects Anthropic’s continued momentum and reinforces our position as the leading.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing gpt-realtime and Realtime API updates
Source: https://openai.com/index/introducing-gpt-realtime
Publisher: OpenAI
Category: Vendor framing
Sector: Customer operations
Capability: Multimodal content generation and media workflows
Score: 64/100
Claim: We’re releasing a more advanced speech-to-speech model and new API capabilities including MCP server support, image input, and SIP phone calling support.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Supporting nonprofit and community innovation
Source: https://openai.com/index/supporting-nonprofit-and-community-innovation
Publisher: OpenAI
Category: Vendor framing
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 54/100
Claim: OpenAI launches a $50M People-First AI Fund to help U.S. nonprofits scale impact with AI. Applications open Sept 8–Oct 8, 2025 for grants in education, healthcare, research, and more.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Updates to Consumer Terms and Privacy Policy
Source: https://www.anthropic.com/news/updates-to-our-consumer-terms
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Vendor platform capability signal
Score: 26/100
Claim: Today, we're rolling out updates to our Consumer Terms and Privacy Policy that will help us deliver even more capable, useful AI models. We're now giving users the choice to allow their data to be used to improve Claude and strengthen our safeguards against harmful usage like scams and abuse. Adjusting your preferences is easy and can be done at any time.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Collective alignment: public input on our Model Spec
Source: https://openai.com/index/collective-alignment-aug-2025-updates
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Agent platform and API infrastructure
Score: 64/100
Claim: OpenAI surveyed over 1,000 people worldwide on how AI should behave and compared their views to our Model Spec. Learn how collective alignment is shaping AI defaults to better reflect diverse human values and perspectives.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## OpenAI and Anthropic share findings from a joint safety evaluation
Source: https://openai.com/index/openai-anthropic-safety-evaluation
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Vendor platform capability signal
Score: 36/100
Claim: OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Introducing the Anthropic National Security and Public Sector Advisory Council
Source: https://www.anthropic.com/news/introducing-the-anthropic-national-security-and-public-sector-advisory-council
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 36/100
Claim: Today, we are announcing the formation of the Anthropic National Security and Public Sector Advisory Council, a group of leading bipartisan national security and public policy practitioners who will help Anthropic support the U.S. government and closely allied democracies in building and maintaining enduring technological advantages in an era of strategic.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Detecting and countering misuse of AI: August 2025
Source: https://www.anthropic.com/news/detecting-countering-misuse-aug-2025
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 36/100
Claim: We’ve developed sophisticated safety and security measures to prevent the misuse of our AI models. But cybercriminals and other malicious actors are actively attempting to find ways around them. Today, we’re releasing a report that details how. Our Threat Intelligence report discusses several recent examples of Claude being misused, including a large-scale.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic Education Report: How educators use Claude
Source: https://www.anthropic.com/news/anthropic-education-report-how-educators-use-claude
Publisher: Anthropic
Category: Labour market
Sector: Education
Capability: Education and workforce adoption
Score: 76/100
Claim: Understandably, much of the conversation of AI in education focuses on how students are using large language models to help them study and write. But educators use AI too. In a recent Gallup survey, teachers reported that AI tools saved them an average of 5.9 hours per week. And in an inversion of the usual discussion, students have begun expressing.
Oracle verdict: This is a labour-market context signal rather than a single workflow proof point. It helps the thesis track whether adoption, education, wages, and institutional behaviour are moving in the same direction as the capability curve.
Thesis relevance: Appendix III, section five: labour-market and adoption evidence

## Helping people when they need it most
Source: https://openai.com/index/helping-people-when-they-need-it-most
Publisher: OpenAI
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 36/100
Claim: How we think about safety for users experiencing mental or emotional distress, the limits of today’s systems, and the work underway to refine them.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Accelerating life sciences research
Source: https://openai.com/index/accelerating-life-sciences-research-with-retro-biosciences
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Healthcare and life-sciences reasoning
Score: 76/100
Claim: Discover how a specialized AI model, GPT-4b micro, helped OpenAI and Retro Bio engineer more effective proteins for stem cell therapy and longevity research.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Scaling domain expertise in complex, regulated domains
Source: https://openai.com/index/blue-j
Publisher: OpenAI
Category: Benchmarks
Sector: Scientific research
Capability: Model and benchmark capability movement
Score: 76/100
Claim: Discover how Blue J is transforming tax research with AI-powered tools built on GPT-4.1. By combining domain expertise with Retrieval-Augmented Generation, Blue J delivers fast, accurate, and fully-cited tax answers—trusted by professionals across the US, Canada, and the UK.
Oracle verdict: This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Developing nuclear safeguards for AI through public-private partnership
Source: https://www.anthropic.com/news/developing-nuclear-safeguards-for-ai-through-public-private-partnership
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 33/100
Claim: Nuclear technology is inherently dual-use: the same physics principles that power nuclear reactors can be misused for weapons development. As AI models become more capable, we need to keep a close eye on whether they can provide users with dangerous technical knowledge in ways that could threaten national security. Information relating to nuclear weapons.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Anthropic launches higher education advisory board and AI Fluency courses
Source: https://www.anthropic.com/news/anthropic-higher-education-initiatives
Publisher: Anthropic
Category: Vendor framing
Sector: Education
Capability: Education and workforce adoption
Score: 42/100
Claim: The choices made in the next few years about how AI enters the classroom will shape a generation's relationship with both technology and learning. Today, we're announcing two initiatives for AI in education to help navigate these critical decisions: a Higher Education Advisory Board to guide Claude's development for education, and three AI Fluency courses.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Mixi reimagines communication with ChatGPT
Source: https://openai.com/index/mixi
Publisher: OpenAI
Category: Deployments
Sector: Enterprise operations
Capability: Enterprise workflow automation
Score: 89/100
Claim: Discover how MIXI, a leader in digital entertainment and lifestyle services in Japan, uses ChatGPT Enterprise to transform productivity, boost AI adoption across teams, and create a secure environment for innovation.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Claude Code and new admin controls for business plans
Source: https://www.anthropic.com/news/claude-code-on-team-and-enterprise
Publisher: Anthropic
Category: Deployments
Sector: Software engineering
Capability: Autonomous software engineering and computer-use agents
Score: 95/100
Claim: Enterprise and Team customers can now upgrade to premium seats that include more usage and Claude Code—bringing our app and powerful coding agent together under one subscription. Users can move seamlessly between ideation and implementation, while admins get the visibility and controls they need to scale Claude across their organization. We are also.
Oracle verdict: Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Q&A with DoorDash’s CPO, Mariana Garavaglia
Source: https://openai.com/index/doordash-mariana-garavaglia
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Enterprise workflow automation
Score: 46/100
Claim: Learn how DoorDash is scaling AI adoption to empower employees to build, learn, and innovate faster in a conversation with Chief People Officer Mariana Garavaglia.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Usage policy update
Source: https://www.anthropic.com/news/usage-policy-update
Publisher: Anthropic
Category: Vendor framing
Sector: Enterprise operations
Capability: Vendor platform capability signal
Score: 26/100
Claim: Today, we’re sharing some updates to our Usage Policy that reflect the growing capabilities and evolving usage of our products. Our Usage Policy serves as a framework for how Claude should and shouldn’t be used, providing clear guidance for everyone who uses Anthropic’s products. In this update, our goal is to provide greater clarity and detail on our.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Scaling accounting capacity with OpenAI
Source: https://openai.com/index/basis
Publisher: OpenAI
Category: Vendor framing
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 86/100
Claim: Built with OpenAI o3, o3-Pro, GPT-4.1, and GPT-5, Basis’ AI agents help accounting firms save up to 30% of their time and expand capacity for advisory and growth.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Offering expanded Claude access across all three branches of the U.S. government
Source: https://www.anthropic.com/news/offering-expanded-claude-access-across-all-three-branches-of-government
Publisher: Anthropic
Category: Deployments
Sector: Public sector
Capability: Enterprise workflow automation
Score: 85/100
Claim: Today we are removing barriers to government AI adoption by offering Claude for Enterprise and Claude for Government to all three branches of government, including federal civilian executive branch agencies, as well as legislative and judiciary branches of government, for $1. As AI adoption leads to transformation across industries, we want to ensure that.
Oracle verdict: This is useful evidence because it moves AI from demo space into an actual organisational workflow. Treat it as a displacement-pressure signal where the near-term effect is task compression, supervision thinning, and fewer handoffs.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Building safeguards for Claude
Source: https://www.anthropic.com/news/building-safeguards-for-claude
Publisher: Anthropic
Category: Vendor framing
Sector: Cybersecurity
Capability: Cyber defence and misuse monitoring
Score: 40/100
Claim: Claude empowers millions of users to tackle complex challenges, spark creativity, and deepen their understanding of the world. We want to amplify human potential while ensuring our models’ capabilities are channeled toward beneficial outcomes. This means continuously refining how we support our users’ learning and problem-solving, while preventing misuse.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## GPT-5 and the new era of work
Source: https://openai.com/index/gpt-5-new-era-of-work
Publisher: OpenAI
Category: Benchmarks
Sector: Enterprise operations
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: GPT-5 is OpenAI’s most advanced model—transforming enterprise AI, automation, and workforce productivity in the new era of intelligent work.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Introducing GPT-5 for developers
Source: https://openai.com/index/introducing-gpt-5-for-developers
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: Introducing GPT-5 in our API platform—offering high reasoning performance, new controls for devs, and best-in-class results on real coding tasks.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## Coding and design with GPT-5
Source: https://openai.com/index/gpt-5-coding-design
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Learn how GPT-5 unlocks new possibilities in coding and design.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Creative writing with GPT-5
Source: https://openai.com/index/gpt-5-creative-writing
Publisher: OpenAI
Category: Vendor framing
Sector: Media and content
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: Learn how GPT-5 assists with creative writing.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## Medical research with GPT-5
Source: https://openai.com/index/gpt-5-medical-research
Publisher: OpenAI
Category: Benchmarks
Sector: Healthcare and life sciences
Capability: Frontier model release and benchmark movement
Score: 88/100
Claim: Learn how GPT-5 is used for medical research.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence

## First look at GPT-5
Source: https://openai.com/index/gpt-5-first-look
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 76/100
Claim: See how a group of leading developers use GPT-5 for the first time.
Oracle verdict: This is a lower-to-mid strength vendor signal for the capability register. It does not prove displacement on its own, but it records another platform step that can later show up as workflow automation, procurement change, or organisational dependency.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## GPT-5 System Card
Source: https://openai.com/index/gpt-5-system-card
Publisher: OpenAI
Category: Vendor framing
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 48/100
Claim: This GPT-5 system card explains how a unified model routing system powers fast and smart responses using gpt-5-main, gpt-5-thinking, and lightweight versions like gpt-5-thinking-nano, optimized for different tasks and developer use.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## From hard refusals to safe-completions: toward output-centric safety training
Source: https://openai.com/index/gpt-5-safe-completions
Publisher: OpenAI
Category: Vendor framing
Sector: AI infrastructure
Capability: Frontier model release and benchmark movement
Score: 48/100
Claim: Discover how OpenAI's new safe-completions approach in GPT-5 improves both safety and helpfulness in AI responses—moving beyond hard refusals to nuanced, output-centric safety training for handling dual-use prompts.
Oracle verdict: This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Thesis relevance: Appendix III, section two: vendor threshold and platform capability evidence

## How Cursor uses GPT-5
Source: https://openai.com/index/gpt-5-cursor
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 90/100
Claim: Learn how Cursor uses GPT-5.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## How Amgen uses GPT-5
Source: https://openai.com/index/gpt-5-amgen
Publisher: OpenAI
Category: Deployments
Sector: General AI capability
Capability: Frontier model release and benchmark movement
Score: 90/100
Claim: Learn how Amgen uses GPT-5.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section four: enterprise deployment evidence

## Introducing GPT-5
Source: https://openai.com/index/introducing-gpt-5
Publisher: OpenAI
Category: Benchmarks
Sector: Software engineering
Capability: Frontier model release and benchmark movement
Score: 96/100
Claim: We are introducing GPT‑5, our best AI system yet. GPT‑5 is a significant leap in intelligence over all our previous models, featuring state-of-the-art performance across coding, math, writing, health, visual perception, and more.
Oracle verdict: OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Thesis relevance: Appendix III, section one: model and benchmark capability evidence
Claude Fable 5 and Claude Mythos 5

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
Today we’re launching Claude Fable 5: a Mythos-class 1 model that we’ve made safe for general use. Fable 5’s capabilities exceed those of any model we’ve ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific.
Anthropic is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Dreaming: Better memory for a more helpful ChatGPT

Benchmarks General AI capability Model and benchmark capability movement [CONFIRMING]
ChatGPT introduces a new memory system to better remember preferences, keeping context fresh and relevant across conversations.
This belongs in the register because benchmark and model-release claims set the ceiling for the next wave of deployment stories. The labour-market effect is indirect today, but it becomes direct when these gains are packaged into agents, APIs, and enterprise tools.
Introducing new capabilities to GPT-Rosalind

Benchmarks Healthcare and life sciences Enterprise workflow automation [CONFIRMING]
GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities.
OpenAI is describing a frontier or production capability that pushes directly on the thesis. The important signal is not the marketing language; it is the widening set of tasks now being routed through model-driven execution rather than ordinary software or headcount.
Introducing the Services Track and Partner Hub of the Claude Partner Network

Benchmarks Enterprise operations Enterprise workflow automation [CONFIRMING]
Almost every large enterprise is moving AI into production, and many have discovered something important: a successful pilot is not the same as a system a business can run on. The real work—and the real opportunity—is in the integration, the evaluation, and the way people's work evolves. That's why the companies getting AI integration right are doing it.
Codex is becoming a productivity tool for everyone

Benchmarks Software engineering Autonomous software engineering and computer-use agents [CONFIRMING]
The Next Era of Knowledge Work report explores how Codex is transforming productivity through AI-powered research, data analysis, workflow automation, and content creation.
OpenAI frontier models and Codex are now available on AWS

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new path to build with OpenAI through the AWS environments, controls, and procurement workflows they already use. Customers can get started with OpenAI on AWS and move faster from evaluation to production.
Introducing Claude Opus 4.8

Benchmarks General AI capability Frontier model release and benchmark movement [CONFIRMING]
We’re upgrading Claude Opus to a new version: Claude Opus 4.8. It builds on Opus 4.7 with improvements across benchmarks, and is a more effective collaborator. It’s available today for the same price. Opus 4.8 launches alongside several new features. Users on claude.ai now have control over the amount of effort Claude puts into a task. Claude Code has a.
Anthropic opens Milan office to support Italian enterprise, research, and developers

Benchmarks Software engineering Enterprise workflow automation [CONFIRMING]
Anthropic will open a new office in Milan, our sixth in Europe alongside London, Dublin, Paris, Zurich, and Munich. The Milan team will work with Italian companies and the country's developer community on building and scaling with Claude responsibly, and contribute to a conversation about AI that is already underway across Italian industry and public life.
An OpenAI model has disproved a central conjecture in discrete geometry

Benchmarks Scientific research Model and benchmark capability movement [CONFIRMING]
An OpenAI model solved the 80-year-old unit distance problem, disproving a major conjecture in discrete geometry and marking a milestone in AI-driven mathematics.
Databricks brings GPT-5.5 to enterprise agent workflows

Benchmarks Enterprise operations Frontier model release and benchmark movement [CONFIRMING]
Databricks uses GPT-5.5 for enterprise agent workflows after the model set a new state of the art on the OfficeQA Pro benchmark.
What Parameter Golf taught us about AI-assisted research

Benchmarks Software engineering Autonomous software engineering and computer-use agents [CONFIRMING]
Parameter Golf brought together 1,000+ participants and 2,000+ submissions to explore AI-assisted machine learning research, coding agents, quantization, and novel model design under strict constraints.
How NVIDIA engineers and researchers build with Codex

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
Teams use Codex with GPT-5.5 to ship production systems and turn research ideas into runnable experiments.
Scaling Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber

Benchmarks Cybersecurity Frontier model release and benchmark movement [CONFIRMING]
OpenAI expands Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber, helping verified defenders accelerate vulnerability research and protect critical infrastructure.
Introducing ChatGPT Futures: Class of 2026

Benchmarks Education Education and workforce adoption [CONFIRMING]
Meet the ChatGPT Futures Class of 2026—26 student innovators using AI to build, research, and drive real-world impact. Discover how this generation is redefining learning, creativity, and opportunity with ChatGPT.
How frontier firms are pulling ahead

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
OpenAI’s B2B Signals research shows how frontier enterprises deepen AI adoption, scale Codex-powered agentic workflows, and build durable competitive advantage.
GPT-5.5 Instant: smarter, clearer, and more personalized

Benchmarks General AI capability Frontier model release and benchmark movement [CONFIRMING]
GPT-5.5 Instant updates ChatGPT’s default model with smarter, more accurate answers, reduced hallucinations, and improved personalization controls.
Introducing GPT-5.5

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
Introducing GPT-5.5, our smartest model yet—faster, more capable, and built for complex tasks like coding, research, and data analysis across tools.
Making ChatGPT better for clinicians

Benchmarks Healthcare and life sciences Healthcare and life-sciences reasoning [CONFIRMING]
OpenAI makes ChatGPT for Clinicians free for verified U.S. physicians, nurse practitioners, and pharmacists, supporting clinical care, documentation, and research.
Introducing Claude Design by Anthropic Labs

Benchmarks Scientific research Frontier model release and benchmark movement [CONFIRMING]
Today, we’re launching Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create polished visual work like designs, prototypes, slides, one-pagers, and more. Claude Design is powered by our most capable vision model, Claude Opus 4.7, and is available in research preview for Claude Pro, Max, Team, and Enterprise.
Introducing GPT-Rosalind for life sciences research

Benchmarks Healthcare and life sciences Frontier model release and benchmark movement [CONFIRMING]
OpenAI introduces GPT-Rosalind, a frontier reasoning model built to accelerate drug discovery, genomics analysis, protein reasoning, and scientific research workflows.
Introducing Claude Opus 4.7

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work—the kind that previously needed close supervision—to Opus 4.7 with confidence. Opus 4.7 handles.
Anthropic’s Long-Term Benefit Trust appoints Vas Narasimhan to Board of Directors

Benchmarks Healthcare and life sciences Enterprise workflow automation [CONFIRMING]
Vas Narasimhan has been appointed to Anthropic's Board of Directors by the Anthropic Long-Term Benefit Trust. He is a physician-scientist and the Chief Executive Officer of Novartis—one of the world's leading innovative medicines companies—and shares Anthropic’s conviction that healthcare and life sciences are among the areas where AI has the greatest.
This is a low-signal vendor radar item. Keep it as context only unless a later benchmark, deployment, procurement change, or labour-market datapoint turns it into direct Appendix III evidence.
Introducing GPT-5.4 mini and nano

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.
Introducing The Anthropic Institute

Benchmarks Scientific research Model and benchmark capability movement [CONFIRMING]
We’re launching The Anthropic Institute, a new effort to confront the most significant challenges that powerful AI will pose to our societies. The Anthropic Institute will draw on research from across Anthropic to provide information that other researchers and the public can use during our transition to a world containing much more powerful AI systems. In.
New ways to learn math and science in ChatGPT

Benchmarks Education Education and workforce adoption [CONFIRMING]
ChatGPT introduces interactive visual explanations for math and science, helping students explore formulas, variables, and concepts in real time.
Codex Security: now in research preview

Benchmarks Software engineering Autonomous software engineering and computer-use agents [CONFIRMING]
Codex Security is an AI application security agent that analyzes project context to detect, validate, and patch complex vulnerabilities with higher confidence and less noise.
How Balyasny Asset Management built an AI research engine

Benchmarks Scientific research Enterprise workflow automation [CONFIRMING]
By combining rigorous model evaluation, full-platform use of OpenAI, and agent workflows, Balyasny is reinventing investment research.
How Descript engineers multilingual video dubbing at scale

Benchmarks Media and content Multimodal content generation and media workflows [CONFIRMING]
Using OpenAI reasoning models, Descript unlocked automatic localization of large content libraries without losing timing or meaning.
Partnering with Mozilla to improve Firefox’s security

Benchmarks Software engineering Cyber defence and misuse monitoring [CONFIRMING]
AI models can now independently identify high-severity vulnerabilities in complex software. As we recently documented, Claude found more than 500 zero-day vulnerabilities (security flaws that are unknown to the software’s maintainers) in well-tested open-source software. In this post, we share details of a collaboration with researchers at Mozilla in which.
Introducing GPT-5.4

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
Introducing GPT-5.4, OpenAI’s most capable and efficient frontier model for professional work, with state-of-the-art coding, computer use, tool search, and 1M-token context.
Introducing ChatGPT for Excel and new financial data integrations

Benchmarks Financial services Frontier model release and benchmark movement [CONFIRMING]
OpenAI introduces ChatGPT for Excel and new financial app integrations, powered by GPT-5.4 to accelerate modeling, research, and analysis in regulated environments.
GPT-5.3 Instant: Smoother, more useful everyday conversations

Benchmarks General AI capability Frontier model release and benchmark movement [CONFIRMING]
Official OpenAI release: GPT-5.3 Instant: Smoother, more useful everyday conversations.
Joint Statement from OpenAI and Microsoft

Benchmarks Scientific research Model and benchmark capability movement [CONFIRMING]
Microsoft and OpenAI continue to work closely across research, engineering, and product development, building on years of deep collaboration and shared success.
Pacific Northwest National Laboratory and OpenAI partner to accelerate federal permitting

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
OpenAI and Pacific Northwest National Laboratory introduce DraftNEPABench, a new benchmark evaluating how AI coding agents can accelerate federal permitting—showing potential to reduce NEPA drafting time by up to 15% and modernize infrastructure reviews.
Anthropic acquires Vercept to advance Claude's computer use capabilities

Benchmarks Media and content Autonomous software engineering and computer-use agents [CONFIRMING]
People are using Claude for increasingly complex work—writing and running code across entire repositories, synthesizing research from dozens of sources, and managing workflows that span multiple tools and teams. Computer use enables Claude to do all of that inside live applications, the way a person at a keyboard would. That means Claude can take on.
Why we no longer evaluate SWE-bench Verified

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.
Our First Proof submissions

Benchmarks Scientific research Model and benchmark capability movement [CONFIRMING]
We share our AI model’s proof attempts for the First Proof math challenge, testing research-grade reasoning on expert-level problems.
Making frontier cybersecurity capabilities available to defenders

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
Claude Code Security, a new capability built into Claude Code on the web, is now available in a limited research preview. It scans codebases for security vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix security issues that traditional methods often miss. Security teams face a common challenge: too.
Introducing EVMbench

Benchmarks General AI capability Frontier model release and benchmark movement [CONFIRMING]
OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities.
Introducing Claude Sonnet 4.6

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta. For those on our Free and Pro plans, Claude Sonnet 4.6 is now the default model in claude.ai and.
Anthropic opens Bengaluru office and announces new partnerships across India

Benchmarks Software engineering Enterprise workflow automation [CONFIRMING]
India is the second-largest market for Claude.ai, home to a developer community doing some of the most technically intense AI work we see anywhere. Nearly half of Claude usage in India comprises computer and mathematical tasks: building applications, modernizing systems, and shipping production software. Today, as we officially open our Bengaluru office.
GPT-5.2 derives a new result in theoretical physics

Benchmarks Scientific research Frontier model release and benchmark movement [CONFIRMING]
A new preprint shows GPT-5.2 proposing a new formula for a gluon amplitude, later formally proved and verified by OpenAI and academic collaborators.
Scaling social science research

Benchmarks Media and content Multimodal content generation and media workflows [CONFIRMING]
GABRIEL is a new open-source toolkit from OpenAI that uses GPT to turn qualitative text and images into quantitative data, helping social scientists analyze research at scale.
Read verdict Source
Anthropic partners with CodePath to bring Claude to the US’s largest collegiate computer science program

Benchmarks Software engineering Autonomous software engineering and computer-use agents [CONFIRMING]
Anthropic is partnering with CodePath, the nation’s largest provider of collegiate computer science education, to redesign its coding curriculum as AI reshapes the field of software development. CodePath will put Claude and Claude Code at the center of its courses and career programs, giving more than 20,000 students at community colleges, state schools.
Read verdict Source
Introducing GPT-5.3-Codex-Spark

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
Introducing GPT-5.3-Codex-Spark—our first real-time coding model. 15x faster generation, 128k context, now in research preview for ChatGPT Pro users.
Read verdict Source
Anthropic is donating $20 million to Public First Action

Benchmarks Cybersecurity Cyber defence and misuse monitoring [CONFIRMING]
AI will bring enormous benefits for science, technology, medicine, economic growth, and much more. But a technology this powerful also comes with considerable risks. Those risks might come from the misuse of the models: AI is already being exploited to automate cyberattacks; in the future it might assist in the production of dangerous weapons. Risks.
Read verdict Source
GPT-5 lowers the cost of cell-free protein synthesis

Benchmarks Healthcare and life sciences Frontier model release and benchmark movement [CONFIRMING]
An autonomous lab combining OpenAI’s GPT-5 with Ginkgo Bioworks’ cloud automation cut cell-free protein synthesis costs by 40% through closed-loop experimentation.
Read verdict Source
Introducing GPT-5.3-Codex

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
GPT-5.3-Codex is a Codex-native agent that pairs frontier coding performance with general reasoning to support long-horizon, real-world technical work.
Read verdict Source
Introducing Claude Opus 4.6

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
We’re upgrading our smartest model. The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, can operate more reliably in larger codebases, and has better code review and debugging skills to catch its own mistakes. And, in a first for our Opus-class models, Opus 4.6 features a 1M token.
Read verdict Source
Anthropic partners with Allen Institute and Howard Hughes Medical Institute to accelerate scientific discovery

Benchmarks Healthcare and life sciences Healthcare and life-sciences reasoning [CONFIRMING]
Modern biological research generates data at unprecedented scale—from single-cell sequencing to whole-brain connectomics—yet transforming that data into validated biological insights remains a fundamental bottleneck. Knowledge synthesis, hypothesis generation, and experimental interpretation still depend on manual processes that can't keep pace with the.
Read verdict Source
Introducing Prism

Benchmarks Scientific research Frontier model release and benchmark movement [CONFIRMING]
Prism is a free LaTeX-native workspace with GPT-5.2 built in, helping researchers write, collaborate, and reason in one place.
Read verdict Source
How scientists are using Claude to accelerate research and discovery

Benchmarks Healthcare and life sciences Healthcare and life-sciences reasoning [CONFIRMING]
Last October we launched Claude for Life Sciences, a suite of connectors and skills that made Claude a better scientific collaborator. Since then, we've invested heavily in making Claude the most capable model for scientific work, with Opus 4.5 showing significant improvements in figure interpretation, computational biology, and protein understanding.
Read verdict Source
Advancing Claude in healthcare and the life sciences

Benchmarks Healthcare and life sciences Healthcare and life-sciences reasoning [CONFIRMING]
In October, we announced Claude for Life Sciences, our latest step in making Claude a productive research partner for scientists and clinicians, and in helping Claude to support those in industry bringing new scientific advancements to the public. Now, we’re expanding that feature set in two ways. First, we’re introducing Claude for Healthcare, a.
Read verdict Source
Deepening our collaboration with the U.S. Department of Energy

Benchmarks Public sector Model and benchmark capability movement [CONFIRMING]
OpenAI and the U.S. Department of Energy have signed a memorandum of understanding to deepen collaboration on AI and advanced computing in support of scientific discovery. The agreement builds on ongoing work with national laboratories and helps establish a framework for applying AI to high-impact research across the DOE ecosystem.
Read verdict Source
Introducing GPT-5.2-Codex

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
GPT-5.2-Codex is OpenAI’s most advanced coding model, offering long-horizon reasoning, large-scale code transformations, and enhanced cybersecurity capabilities.
Read verdict Source
Working with the US Department of Energy to unlock the next era of scientific discovery

Benchmarks Healthcare and life sciences Enterprise workflow automation [CONFIRMING]
Anthropic and the US Department of Energy (DOE) are announcing a multi-year partnership as part of the Genesis Mission, the Department’s initiative to use AI to cement America’s leadership in science. Our partnership focuses on three domains (American energy dominance, the biological and life sciences, and scientific productivity) and has the potential to.
Read verdict Source
Evaluating AI’s ability to perform scientific research tasks

Benchmarks Scientific research Frontier model release and benchmark movement [CONFIRMING]
OpenAI introduces FrontierScience, a benchmark testing AI reasoning in physics, chemistry, and biology to measure progress toward real scientific research.
Read verdict Source
Measuring AI’s capability to accelerate biological research

Benchmarks Scientific research Frontier model release and benchmark movement [CONFIRMING]
OpenAI introduces a real-world evaluation framework to measure how AI can accelerate biological research in the wet lab. Using GPT-5 to optimize a molecular cloning protocol, the work explores both the promise and risks of AI-assisted experimentation.
Read verdict Source
Advancing science and math with GPT-5.2

Benchmarks Scientific research Frontier model release and benchmark movement [CONFIRMING]
GPT-5.2 is OpenAI’s strongest model yet for math and science, setting new state-of-the-art results on benchmarks like GPQA Diamond and FrontierMath. This post shows how those gains translate into real research progress, including solving an open theoretical problem and generating reliable mathematical proofs.
Read verdict Source
Introducing GPT-5.2

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
GPT-5.2 is our most advanced frontier model for everyday professional work, with state-of-the-art reasoning, long-context understanding, coding, and vision. Use it in ChatGPT and the OpenAI API to power faster, more reliable agentic workflows.
Read verdict Source
Ten years

Benchmarks Scientific research Model and benchmark capability movement [CONFIRMING]
OpenAI reflects on ten years of progress, from early research breakthroughs to widely used AI systems that reshaped what’s possible. We share lessons from the past decade and why we remain optimistic about building AGI that benefits all of humanity.
Read verdict Source
OpenAI takes an ownership stake in Thrive Holdings to accelerate enterprise AI adoption

Benchmarks Scientific research Frontier model release and benchmark movement [CONFIRMING]
OpenAI takes an ownership stake in Thrive Holdings to accelerate enterprise AI adoption, embedding frontier research and engineering directly into accounting and IT services to boost speed, accuracy, and efficiency while creating a scalable model for industry-wide transformation.
Read verdict Source
Introducing shopping research in ChatGPT

Benchmarks Commerce and marketplace Model and benchmark capability movement [CONFIRMING]
Shopping research in ChatGPT helps you explore, compare, and discover products with personalized buyer’s guides that simplify decision-making.
Read verdict Source
GPT-5 and the future of mathematical discovery

Benchmarks Scientific research Frontier model release and benchmark movement [CONFIRMING]
UCLA Professor Ernest Ryu and GPT-5 solved a key question in optimization theory, showcasing AI’s role in accelerating mathematical discovery.
Read verdict Source
Introducing Claude Opus 4.5

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
Our newest model, Claude Opus 4.5, is available today. It’s intelligent, efficient, and the best model in the world for coding, agents, and computer use. It’s also meaningfully better at everyday tasks like deep research and working with slides and spreadsheets. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how.
Read verdict Source
Early experiments in accelerating science with GPT-5

Benchmarks Scientific research Frontier model release and benchmark movement [CONFIRMING]
OpenAI introduces the first research cases showing how GPT-5 accelerates scientific progress across math, physics, biology, and computer science. Explore how AI and researchers collaborate to generate proofs, uncover new insights, and reshape the pace of discovery.
Read verdict Source
How evals drive the next chapter in AI for businesses

Benchmarks Enterprise operations Enterprise workflow automation [CONFIRMING]
Learn how evals help businesses define, measure, and improve AI performance—reducing risk, boosting productivity, and driving strategic advantage.
Read verdict Source
Introducing GPT-5.1 for developers

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
GPT-5.1 is now available in the API, bringing faster adaptive reasoning, extended prompt caching, improved coding performance, and new apply_patch and shell tools.
Read verdict Source
Measuring political bias in Claude

Benchmarks General AI capability Model and benchmark capability movement [CONFIRMING]
We want Claude to be seen as fair and trustworthy by people across the political spectrum, and to be unbiased and even-handed in its approach to political topics. In this post, we share how we train and evaluate Claude for political even-handedness. We also report the results of a new, automated, open-source evaluation for political neutrality that we’ve.
Read verdict Source
Disrupting the first reported AI-orchestrated cyber espionage campaign

Benchmarks Cybersecurity Enterprise workflow automation [CONFIRMING]
We recently argued that an inflection point had been reached in cybersecurity: a point at which AI models had become genuinely useful for cybersecurity operations, both for good and for ill. This was based on systematic evaluations showing cyber capabilities doubling in six months; we’d also been tracking real-world cyberattacks, observing how malicious.
Read verdict Source
GPT-5.1: A smarter, more conversational ChatGPT

Benchmarks General AI capability Frontier model release and benchmark movement [CONFIRMING]
We’re upgrading the GPT-5 series with warmer, more capable models and new ways to customize ChatGPT’s tone and style. GPT-5.1 starts rolling out today to paid users.
Read verdict Source
Anthropic invests $50 billion in American AI infrastructure

Benchmarks Scientific research Frontier model release and benchmark movement [CONFIRMING]
Today, we are announcing a $50 billion investment in American computing infrastructure, building data centers with Fluidstack in Texas and New York, with more sites to come. These facilities are custom built for Anthropic with a focus on maximizing efficiency for our workloads, enabling continued research and development at the frontier. The project will.
Read verdict Source
1 million business customers putting AI to work

Benchmarks Financial services Enterprise workflow automation [CONFIRMING]
More than 1 million business customers around the world now use OpenAI. Across healthcare, life sciences, financial services, and more, ChatGPT and our APIs are driving a new era of intelligent, AI-powered work.
Read verdict Source
Introducing IndQA

Benchmarks General AI capability Frontier model release and benchmark movement [CONFIRMING]
OpenAI introduces IndQA, a new benchmark for evaluating AI systems in Indian languages. Built with domain experts, IndQA tests cultural understanding and reasoning across 12 languages and 10 knowledge areas.
Read verdict Source
Introducing Aardvark: OpenAI’s agentic security researcher

Benchmarks Software engineering Cyber defence and misuse monitoring [CONFIRMING]
OpenAI introduces Aardvark, an AI-powered security researcher that autonomously finds, validates, and helps fix software vulnerabilities at scale. The system is in private beta—sign up to join early testing.
Read verdict Source
Advancing organizational transformation for business innovation

Benchmarks Public sector Enterprise workflow automation [CONFIRMING]
DNP rolled out ChatGPT Enterprise across ten core departments, achieving 95% faster patent research, 10x processing volume, 87% automation, and 70% knowledge reuse in three months.
Read verdict Source
Steuerrecht.com delivers client-ready legal analysis with ChatGPT

Benchmarks Scientific research Enterprise workflow automation [CONFIRMING]
Steuerrecht.com uses ChatGPT Business to streamline legal workflows, automate tax research, and deliver faster, client-ready analysis for law firms.
Read verdict Source
Advancing Claude for Financial Services

Benchmarks Financial services Financial workflow automation [CONFIRMING]
We're expanding Claude for Financial Services with an Excel add-in, additional connectors to real-time market data and portfolio analytics, and new pre-built Agent Skills, like building discounted cash flow models and initiating coverage reports. These updates build on Sonnet 4.5’s state of the art performance on financial tasks, topping the Finance Agent.
Read verdict Source
Consensus accelerates research with GPT-5 and Responses API

Benchmarks Scientific research Frontier model release and benchmark movement [CONFIRMING]
Consensus uses GPT-5 and OpenAI’s Responses API to power a multi-agent research assistant that reads, analyzes, and synthesizes evidence in minutes—helping over 8 million researchers accelerate scientific discovery.
Read verdict Source
Expanding our use of Google Cloud TPUs and Services

Benchmarks Scientific research Agent platform and API infrastructure [CONFIRMING]
Today, we are announcing that we plan to expand our use of Google Cloud technologies, including up to one million TPUs, dramatically increasing our compute resources as we continue to push the boundaries of AI research and product development. The expansion is worth tens of billions of dollars and is expected to bring well over a gigawatt of capacity.
Read verdict Source
Claude for Life Sciences

Benchmarks Healthcare and life sciences Healthcare and life-sciences reasoning [CONFIRMING]
Increasing the rate of scientific progress is a core part of Anthropic’s public benefit mission. We are focused on building the tools to allow researchers to make new discoveries – and eventually, to allow AI models to make these discoveries autonomously.
Read verdict Source
Introducing Claude Haiku 4.5

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
Claude Haiku 4.5, our latest small model, is available today to all users. What was recently at the frontier is now cheaper and faster. Five months ago, Claude Sonnet 4 was a state-of-the-art model. Today, Claude Haiku 4.5 gives you similar levels of coding performance but at one-third the cost and more than twice the speed.
Read verdict Source
Expert Council on Well-Being and AI

Benchmarks Healthcare and life sciences Healthcare and life-sciences reasoning [CONFIRMING]
OpenAI’s new Expert Council on Well-Being and AI brings together leading psychologists, clinicians, and researchers to guide how ChatGPT supports emotional health, especially for teens. Learn how their insights are shaping safer, more caring AI experiences.
Read verdict Source
Anthropic and Salesforce expand partnership to bring Claude to regulated industries

Benchmarks Financial services Financial workflow automation [CONFIRMING]
Anthropic and Salesforce today announced an expanded partnership to make Claude a preferred model for Salesforce's Agentforce platform, enabling Salesforce customers in financial services, healthcare, cybersecurity, and life sciences to use trusted AI while keeping sensitive data secure. Additionally, Salesforce is deploying Claude Code across its global.
Read verdict Source
Defining and evaluating political bias in LLMs

Benchmarks General AI capability Model and benchmark capability movement [CONFIRMING]
Learn how OpenAI evaluates political bias in ChatGPT through new real-world testing methods that improve objectivity and reduce bias.
Read verdict Source
Rahul Patil joins Anthropic as Chief Technology Officer

Benchmarks Cybersecurity Enterprise workflow automation [CONFIRMING]
We're excited to announce that Rahul Patil has joined Anthropic as our Chief Technology Officer. Rahul will oversee our engineering organization across product, compute, infrastructure, inference, data science, and security as we scale Claude to meet growing enterprise demand worldwide. Rahul brings over 20 years of experience building and maintaining.
Read verdict Source
Introducing AgentKit, new Evals, and RFT for agents

Benchmarks Software engineering Agent platform and API infrastructure [CONFIRMING]
Today, we’re releasing new tools to help developers go from prototype to production faster: AgentKit, expanded evals capabilities, and reinforcement fine-tuning for agents.
Read verdict Source
Empowering teams to unlock insights faster at OpenAI

Benchmarks Customer operations Model and benchmark capability movement [CONFIRMING]
OpenAI’s research assistant helps teams analyze millions of support tickets, surface insights faster, and scale curiosity across the company.
Read verdict Source
Introducing Claude Sonnet 4.5

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
Claude Sonnet 4.5 is the best coding model in the world. It's the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math. Code is everywhere. It runs every application, spreadsheet, and software tool you use. Being able to use those tools and reason through hard problems is.
Read verdict Source
Measuring the performance of our models on real-world tasks

Benchmarks General AI capability Education and workforce adoption [CONFIRMING]
OpenAI introduces GDPval, a new evaluation that measures model performance on real-world economically valuable tasks across 44 occupations.
Read verdict Source
Introducing ChatGPT Pulse

Benchmarks Scientific research Model and benchmark capability movement [CONFIRMING]
Today we're releasing a preview of ChatGPT Pulse to Pro users on mobile. Pulse is a new experience where ChatGPT proactively does research to deliver personalized updates based on your chats, feedback, and connected apps like your calendar.
Read verdict Source
How people are using ChatGPT

Benchmarks Scientific research Education and workforce adoption [CONFIRMING]
New research from the largest study of ChatGPT use shows how the tool creates economic value through both personal and professional use. Adoption is broadening beyond early users, closing gaps and making AI a part of everyday life.
Read verdict Source
Building more helpful ChatGPT experiences for everyone

Benchmarks General AI capability Model and benchmark capability movement [CONFIRMING]
We’re partnering with experts, strengthening protections for teens with parental controls, and routing sensitive conversations to reasoning models in ChatGPT.
Read verdict Source
Accelerating life sciences research

Benchmarks Healthcare and life sciences Healthcare and life-sciences reasoning [CONFIRMING]
Discover how a specialized AI model, GPT-4b micro, helped OpenAI and Retro Bio engineer more effective proteins for stem cell therapy and longevity research.
Read verdict Source
Scaling domain expertise in complex, regulated domains

Benchmarks Scientific research Model and benchmark capability movement [CONFIRMING]
Discover how Blue J is transforming tax research with AI-powered tools built on GPT-4.1. By combining domain expertise with Retrieval-Augmented Generation, Blue J delivers fast, accurate, and fully-cited tax answers—trusted by professionals across the US, Canada, and the UK.
Read verdict Source
GPT-5 and the new era of work

Benchmarks Enterprise operations Frontier model release and benchmark movement [CONFIRMING]
GPT-5 is OpenAI’s most advanced model—transforming enterprise AI, automation, and workforce productivity in the new era of intelligent work.
Read verdict Source
Introducing GPT-5 for developers

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
Introducing GPT-5 in our API platform—offering high reasoning performance, new controls for devs, and best-in-class results on real coding tasks.
Read verdict Source
Medical research with GPT-5

Benchmarks Healthcare and life sciences Frontier model release and benchmark movement [CONFIRMING]
Learn how GPT-5 is used for medical research.
Read verdict Source
Introducing GPT-5

Benchmarks Software engineering Frontier model release and benchmark movement [CONFIRMING]
We are introducing GPT‑5, our best AI system yet. GPT‑5 is a significant leap in intelligence over all our previous models, featuring state-of-the-art performance across coding, math, writing, health, visual perception, and more.
Read verdict Source