Claude Opus 4.6 vs GPT-5.3 Codex
Greg Isenberg
14,692 views • yesterday
Video Summary
In a massive day for AI, Anthropic's Opus 4.6 and OpenAI's GPT-5.3 Codex were released simultaneously. This video delves into the nuances of these powerful new models, comparing their capabilities and offering practical tips for developers to leverage them effectively. Morgan Linton, with experience at Sonos and as an AI investor, joins the discussion to offer deep technical insights. A key takeaway is that Opus 4.6 excels at autonomous, deep-planning agent teams, while GPT-5.3 Codex shines as an interactive collaborator, allowing mid-execution steering and faster iteration; the two cater to different engineering methodologies.
A fascinating demonstration involved rebuilding a multi-billion dollar app, Polymarket, to showcase the distinct approaches of each model. Opus 4.6, with its agent teams, took a research-intensive, multi-agent approach, while GPT-5.3 Codex adopted a more direct, iterative development path. The experiment revealed that while Codex built a functional prototype significantly faster, Opus 4.6 ultimately delivered a more polished and feature-rich final product in this specific test.
Short Highlights
Understanding Opus 4.6 Configuration [03:04]
- To ensure you are using Opus 4.6, update your Claude CLI to version 2.1.32 or higher.
- The settings.json file is crucial for configuration, allowing you to specify claude-opus-4-6 (or simply opus) as the model.
- The key feature, agent teams, must be explicitly enabled in settings.json by setting claude-code.experimental.agent-teams to 1 (see the sketch after this list).
- For API users, "adaptive thinking" is a new feature in Opus 4.6 that lets you select an "effort" level for the model's processing.
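A minimal sketch of what that settings.json could look like, using the model name and the experimental flag exactly as described in the video; the real Claude Code schema may differ, so treat the agent-teams key as an assumption rather than a verified reference:

```json
{
  "model": "claude-opus-4-6",
  "claude-code.experimental.agent-teams": 1
}
```

Per the video, "opus" also works as a shorthand model value.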
"The key thing that you want to do is the in my opinion the coolest feature that they added with 46 is agent teams."
Philosophical Divergence: Opus 4.6 vs. GPT-5.3 Codex [08:30]
- The two models are philosophically diverging, mirroring a split in how engineers approach AI-assisted development.
- Codex 5.3 is framed as an interactive collaborator, where users steer the model mid-execution and course-correct as it works.
- Opus 4.6 emphasizes a more autonomous, thoughtful system that plans deeply, runs longer, and requires less direct human input.
- This split reflects a preference for either tight human-in-the-loop control or delegating entire chunks of work for review.
"Some want tight human and loop control. Others want to delegate whole chunks of work and review the result."
Model Capabilities and Benchmarks [11:38]
- Opus 4.6 boasts a significantly larger context window of 1 million tokens, designed for reasoning over entire documents and repositories.
- GPT-5.3 Codex's context window is around 200,000 tokens, optimized for progressive execution rather than total recall.
- For coding benchmarks, Opus 4.6 excels in code comprehension, architectural sensitivity in refactors, and explaining system behavior.
- GPT-5.3 Codex performed better overall on coding benchmarks, including SWE-bench Pro and Terminal-Bench, suggesting it might be better for end-to-end app generation.
"Claude is better when the task is understand everything first and then decide. GBT3 53 Codeex is probably better when the task is decide fast act iterate."
Agentic Behavior and User Interaction [13:49]
- Opus 4.6's standout feature is multi-agent orchestration, allowing for the setup of multiple agents to work autonomously.
- GPT-5.3 Codex focuses on task-driven autonomy with "task steering," enabling users to intervene and correct the model mid-task.
- Opus 4.6's approach is less forgiving of mid-task corrections, often requiring a restart, while Codex allows in-line fixes.
"With Opus, you'll kind of be stopping it and then starting somewhat fresh, but it has a pretty big context window, so so it knows what it did."
Demo: Rebuilding Polymarket [15:31]
- The experiment involved prompting Opus 4.6 to "build a team" of agents for specific roles (architecture, market understanding, UX, testing).
- GPT-5.3 Codex was prompted to "think deeply" about the same aspects without explicit agent creation (an illustrative version of both prompts follows this list).
- Opus 4.6 launched parallel research agents, performing web searches and gathering information across multiple functions.
- Codex immediately began scaffolding the project and wiring up the core market math and trading engine.
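For illustration, here is a hypothetical reconstruction of the two prompting styles described above; the video's exact wording is not reproduced, so treat both as sketches rather than the prompts actually used:

```text
Opus 4.6 (agent teams), hypothetical prompt:
  Build a team of agents to rebuild Polymarket: an architecture
  agent, a market-mechanics agent, a UX agent, and a testing
  agent. Have them research first, then build.

GPT-5.3 Codex, hypothetical prompt:
  Think deeply about the architecture, market mechanics, UX,
  and testing of a Polymarket-style app, then build it.
```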
"So you've got uh Codeex is out here building and is is like building the engine. Uh with Opus 46, it still has agents out there like doing research work."
Codex's Rapid Development and Output [22:13]
- GPT-5.3 Codex completed its version of the Polymarket competitor in 3 minutes and 47 seconds.
- It generated a functional prototype, including a test suite (10 out of 10 tests passed), an LMSR market maker engine, and a REST API router (a sketch of the standard LMSR pricing rule follows this list).
- The generated application was functional and passed its internal tests, demonstrating a rapid, end-to-end development capability.
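For context on what an LMSR (logarithmic market scoring rule) market maker engine does, here is a minimal sketch of the textbook cost and price formulas; this is the standard rule Polymarket-style prediction markets build on, not the code Codex actually generated:

```python
import math

def lmsr_cost(quantities, b=100.0):
    """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b)).

    quantities: outstanding shares per outcome; b: liquidity parameter.
    """
    return b * math.log(sum(math.exp(q / b) for q in quantities))

def lmsr_prices(quantities, b=100.0):
    """Instantaneous price of each outcome: softmax of q/b; prices sum to 1."""
    exps = [math.exp(q / b) for q in quantities]
    total = sum(exps)
    return [e / total for e in exps]

def trade_cost(quantities, outcome, shares, b=100.0):
    """Cost to buy `shares` of `outcome`: C(q_after) - C(q_before)."""
    after = list(quantities)
    after[outcome] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)

# Example: a fresh YES/NO market prices both outcomes at 0.50.
print(lmsr_prices([0.0, 0.0]))          # [0.5, 0.5]
print(trade_cost([0.0, 0.0], 0, 10.0))  # ~5.12: cost of 10 YES shares
```

The liquidity parameter b controls how much prices move per trade; a larger b means deeper, more stable markets.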
"Codeex built a competitor to Poly Market in 3 minutes and 47 seconds."
Opus 4.6's Thoroughness and Token Usage [26:40]
- Opus 4.6's multi-agent approach consumed significantly more tokens, with each agent using over 25,000 tokens, totaling over 100,000 tokens for the initial research phase.
- The model's output included a comprehensive set of 96 tests and a more detailed breakdown of delivered components from each "team member."
- The final Opus 4.6 output was considered more polished and feature-rich, with elements like a leaderboard and portfolio section that were not explicitly requested.
"Each one of these agents has used over 25,000 tokens... you're talking about over a 100,000 tokens used uh in doing this."
UI Design and Iteration Comparison [34:33]
- A prompt to Codex to perform a "major design refresh" in the style of Jack Dorsey resulted in a cleaner, more elegant interface with interactive elements.
- Opus 4.6's redesign also improved the UI, but Codex's handling of iterative design changes, including mid-stream corrections and persona-based design requests, was noted as a strength.
- The test revealed that Codex could adapt to a specific design inspiration like Jack Dorsey's aesthetic, a capability that seemed more challenging for Opus 4.6 in this instance.
"I was looking for a Capslock major upgrade. Uh that that might mean uh way more copy, way more images. Way more storytelling."
Final Impressions and Winner [45:09]
- While Codex was significantly faster in initial development, Opus 4.6 was deemed the winner in this specific test due to its more comprehensive output, detailed testing, and refined user interface.
- The comparison highlighted that Opus 4.6's agent orchestration is a key differentiating feature for complex, research-intensive tasks.
- The video suggests that the choice between the models depends on the user's preferred development methodology: speed and iteration (Codex) versus deep planning and autonomous execution (Opus 4.6).
"I would say in this test, Opus won. Yeah. In this test, Opus won." ```