Comment by CamperBob2 18 hours ago
Having read the reply in 2.5 Pro, I have to agree with you there. I'm surprised it whiffed on those details. They are fairly basic and rather important. It could have provided a better answer (I fed your reply back to it at https://g.co/gemini/share/7f87b5e9d699), but it did a crappy job deciding what to include in its initial response.
I don't agree that you can pick one cherry-picked example and use it to illustrate anything general about the progress of the models, though. There are far too many counterexamples to enumerate.
(Actually I suspect what will happen is that we'll change the way we write documentation to make it easy for LLMs to assimilate. I know I'm already doing that myself.)
> I don't agree that you can pick one cherry-picked example
Benchmarks and evaluations are made of cherry-picked examples. What makes my example invalid, and benchmark prompts valid? (It's a rhetorical question; you don't need to answer.)
> write documentation to make it easy for LLMs to assimilate.
If we ever do that, it means LLMs have failed at their job. They are supposed to help and understand us, not the other way around.