Comment by sheepscreek

Comment by sheepscreek 3 days ago

It seems their tests rely on Claude alone. It’s not safe to assume that Codex or Gemini will behave the same way as Claude. I use all three and each has its own idiosyncrasies.

verdverm 3 days ago

I've done very similar things with my custom agent, which uses Gemini, and have gotten very similar results. I'm working on the evals to back that claim up.
