Comment by jasonjmcghee

Comment by jasonjmcghee 2 days ago

Idk if I'm just holding it wrong, but calling Gemini 3 "the best model in the world" doesn't line up with my experience at all.

It seems to just be worse at actually doing what you ask.

cj 2 days ago

It's like saying "Star Wars is the best movie in the world" - to some people it is. To others it's terrible.

I feel like it would be advantageous to move away from a "one model fits all" mindset, and move towards a world where we have different genres of models that we use for different things.

The benchmark scores are turning into being just as useful as tomatometer movie scores. Something can score high, but if that's not the genre you like, the high score doesn't guarantee you'll like it.

Reply View 4 replies

everdrive 2 days ago

Outside of experience and experimentation, is there a good way to know what models are strong for what tasks?

Reply View | 3 replies
- grahamplace 2 days ago
  
  See: https://lmarena.ai/leaderboard
  
  Reply View | 1 reply
  
  jasonjmcghee 2 days ago
  
  Unless you overfit to benchmark style scenarios and are worse for real-world use.
  
  Reply View | 0 replies
- jpollock 2 days ago
  
  Not really, it's like asking which C compiler was best back in the 90s.
  You had Watcom, Intel, GCC, Borland, Microsoft, etc.
  They all had different optimizations and different target markets.
  Best to make your tooling model agnostic. I understand that tuned prompts are model _version_ specific, so you will need this anyways.
  
  Reply View | 0 replies

wrsh07 2 days ago

It's a good model. Zvi also thought it was the best model until Opus 4.5 was announced a few hours after he wrote his post

https://thezvi.substack.com/p/gemini-3-pro-is-a-vast-intelli...

Reply View 0 replies

matwood 2 days ago

What I like most about Gemini is it's perfectly happy to say what I asked it to proofread or improve is good as it is. Never has ChatGPT said, 'this is good to go', even its own output that it just said was good to go.

Reply View 0 replies