Comment by austinbaggio

Not really. But, You can go viral again with a "Coding Agents with memory build better software using less tokens" showcasing how you benchmarked a "twitter rebuild" -

1. Setup Claude Code to build some layers of the stack

2. Setup Codex to build others.

In one instance equip them both with your product. Maybe bake in some tribal knowledge.

In another instance let them work raw.

In both instances, capture:

     - Time to completion
     - Tokens spent
     - Ability to meet original spec
     - Subjective quality 
     - Number of errors and categorize between the layers, to state something like "raw-claude's backend kept failing with raw-codex's frontend" etc

I imagine this benchmark working well in your favor.