Anthropic and OpenAI have officially entered a new phase of direct competition. On February 5, 2026, the two AI heavyweights each released a major new model: Claude Opus 4.6 on one side, GPT-5.3-Codex on the other.
Two launches, two ambitious roadmaps, and a flood of benchmarks designed to prove technical superiority—especially in software development, agentic workflows, and professional use cases.
Beyond the marketing noise, what do these models actually bring to the table? And more importantly, which one truly pulls ahead when the numbers are examined closely?
Let’s break it down.
Claude Opus 4.6: One Million Tokens and Coordinated AI Agents
Anthropic is moving fast. Just three months after Claude Opus 4.5, the company has released Claude Opus 4.6, and the headline feature is hard to miss: a 1-million-token context window, currently available in beta.
This massive context size allows the model to ingest entire codebases, large documentation sets, or long-running conversations without losing coherence. In practical terms, it dramatically reduces “context rot”—the performance degradation that occurs when models struggle to retain early information in long prompts.
On the MRCR v2 benchmark, which measures the ability to retrieve buried information from extremely large inputs, Opus 4.6 scores 76%, compared to just 18.5% for Sonnet 4.5. That gap alone highlights a major leap in long-context reasoning.
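To get a sense of what that looks like in practice, here is a minimal sketch of feeding an entire repository to the model through the Anthropic Python SDK. The model id and the long-context beta flag used below are assumptions based on how Anthropic has exposed extended context on earlier models, not confirmed values for Opus 4.6.

```python
# Minimal sketch (not official sample code): sending a whole repository to
# Claude through the Anthropic Python SDK. The model id and the long-context
# beta flag below are assumptions, not confirmed values for Opus 4.6.
from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Concatenate every Python file in the project into a single prompt.
repo_files = sorted(Path("my_project").rglob("*.py"))
codebase = "\n\n".join(
    f"# File: {path}\n{path.read_text(encoding='utf-8')}" for path in repo_files
)

response = client.beta.messages.create(
    model="claude-opus-4-6",           # assumed model id
    betas=["context-1m-2025-08-07"],   # assumed beta flag for the 1M-token window
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": f"{codebase}\n\nSummarize the architecture and flag dead code.",
        }
    ],
)
print(response.content[0].text)
```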
Agent Teams: Parallel AI Workflows
Another major addition is the introduction of Agent Teams in Claude Code. Instead of relying on a single sequential agent, Opus 4.6 can now coordinate multiple agents working in parallel.
For example:
- One agent handles frontend logic
- Another manages APIs
- A third focuses on migrations or refactoring
These agents automatically communicate and synchronize their progress, enabling faster and more structured execution of complex engineering tasks.
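To make the pattern concrete, here is a small, purely illustrative asyncio sketch of several specialized agents running in parallel and reporting back to a coordinating step. This is not Anthropic's Agent Teams implementation; the agent names and the `run_agent` helper are hypothetical.

```python
# Illustrative only: a minimal asyncio pattern for running several specialized
# "agents" in parallel and collecting their results. Not Anthropic's Agent
# Teams implementation; run_agent() and the agent names are hypothetical.
import asyncio

async def run_agent(name: str, task: str) -> str:
    # In a real system this would call an LLM; here we just simulate work.
    await asyncio.sleep(0.1)
    return f"[{name}] completed: {task}"

async def main() -> None:
    # Each agent owns one slice of the engineering task, as in the list above.
    assignments = {
        "frontend-agent": "refactor component state handling",
        "api-agent": "update REST endpoints and schemas",
        "migration-agent": "write and verify database migrations",
    }
    results = await asyncio.gather(
        *(run_agent(name, task) for name, task in assignments.items())
    )
    # A coordinating step would reconcile the agents' outputs here.
    for line in results:
        print(line)

asyncio.run(main())
```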
Real-World Engineering Results
Anthropic backed up its claims with real-world use cases:
- SentinelOne reported that Opus 4.6 completed a multi-million-line codebase migration “like a senior engineer,” cutting execution time in half.
- Rakuten stated that the model autonomously closed 13 issues and assigned 12 more in a single day across six repositories.
- In cybersecurity testing, Opus 4.6 reportedly identified over 500 zero-day vulnerabilities in open-source projects during preliminary evaluations.
- Norway’s sovereign wealth fund (NBIM) tested the model across 40 cybersecurity investigations, where Opus 4.6 outperformed version 4.5 in 38 out of 40 blind comparisons.
Enterprise Productivity Features
Anthropic also unveiled a new product integration: Claude for PowerPoint, currently available as a research preview. Combined with recent Excel improvements, users can structure data in spreadsheets and generate fully branded presentations directly—aligned with existing templates and corporate styles.
GPT-5.3-Codex: Faster Execution and Self-Improving AI

OpenAI launched GPT-5.3-Codex on the same day, positioning it as a major evolution over GPT-5.2-Codex. The model merges advanced coding capabilities with stronger reasoning and professional knowledge, while delivering a claimed 25% performance boost.
This gain comes from infrastructure optimizations and improved token efficiency, allowing the model to do more work with fewer tokens.
The First Self-Improving OpenAI Model
The most notable innovation is self-improvement. GPT-5.3-Codex is the first OpenAI model to actively assist in its own development.
According to OpenAI, early versions of the model:
- Helped debug training runs
- Assisted with deployment workflows
- Analyzed evaluation results
- Helped refine testing frameworks
Engineers reportedly saw meaningful acceleration across multiple stages of the development pipeline—a milestone for AI-assisted AI research.
Benchmark Performance
GPT-5.3-Codex shows clear improvements across multiple coding benchmarks:
- Terminal-Bench 2.0: 77.3% (up from 64%)
- SWE-Bench Pro: 56.8% (slightly up from 56.4%)
- OSWorld-Verified: 64.7% (up from 38.2%)
OpenAI also highlights improved efficiency, with fewer tokens consumed for equivalent tasks.
Interactive Agentic Coding
Collaboration is another focus area. GPT-5.3-Codex allows users to interact with the model while it’s working, without losing context. Inside the Codex app, the model provides live progress updates, enabling real-time discussion, clarification, and course correction during execution.
Security and Access Limitations
On the cybersecurity front, GPT-5.3-Codex is rated “High Capability” under OpenAI’s Preparedness Framework. While OpenAI states there is no definitive proof that the model can fully automate cyberattacks, it is taking a precautionary approach.
As a result:
- API access is temporarily delayed
- The model is available via the Codex app, CLI, IDE extensions, and the web
- Access is limited to paid ChatGPT tiers (Plus, Pro, Business, Enterprise, Edu), with temporary availability for Free and Go users
Benchmark Face-Off: Claude Opus 4.6 vs GPT-5.3-Codex
Direct comparisons are difficult due to different evaluation choices, but a few benchmarks allow partial alignment:
| Benchmark | Claude Opus 4.6 | GPT-5.3-Codex | Winner |
|---|---|---|---|
| Terminal-Bench 2.0 (Agentic coding) | 65.4% | 77.3% | 🏆 OpenAI |
| OSWorld-Verified (Computer use agents) | 72.7% | 64.7% | 🏆 Anthropic |
| SWE-Bench (Real-world software) | 80.8% (Verified) | 56.8% (Pro) | Hard to compare |
| GDPval (High-value work tasks) | 1606 Elo | 70.9% wins/ties | Different metrics |
The data shows specialization rather than domination.


Pricing and Availability
Both models have identical base API pricing:
- $5 per million input tokens
- $25 per million output tokens
Anthropic applies a premium tier for requests exceeding 200,000 tokens to support the 1-million-token context window. OpenAI has not yet announced special pricing for GPT-5.3-Codex.
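As a quick sanity check on those base rates, the snippet below estimates the cost of a single request. It ignores the premium long-context tier, whose rates are not listed here.

```python
# Cost estimate at the shared base API rates quoted above:
# $5 per million input tokens, $25 per million output tokens.
# The >200K-token premium tier is ignored because its rates are not listed here.
INPUT_RATE = 5.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 25.00 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the base-rate cost of one request in USD."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 100K-token prompt with a 4K-token answer.
print(f"${request_cost(100_000, 4_000):.2f}")  # -> $0.60
```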
Claude Opus 4.6 is available via:
- claude.ai (Pro, Max, Team, Enterprise)
- API
- Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry
GPT-5.3-Codex is available via:
- Codex app
- CLI and IDE extensions
- Web interface for ChatGPT subscribers
(API access coming later)
So… Who Actually Wins?
There is no single winner—only different strengths.
- GPT-5.3-Codex excels in speed, execution efficiency, and interactive agentic coding.
- Claude Opus 4.6 dominates long-context reasoning, large codebase analysis, and coordinated multi-agent workflows.
This simultaneous release highlights just how fierce the competition between Anthropic and OpenAI has become. Each model pushes the other forward—and for developers and enterprises, that’s ultimately the real win.