Comment by melodyogonna 10 months ago
How can it be specifically trained on benchmarks when it is leading on blind chatbot tests?
The post you quoted is not a Grok problem if other LLMs are also failing. It seems to me to be a fundamental failure in the current approach to AI model development.
Any LLM that is uncensored does well on Chatbot tests because a refusal is an automatic loss.
And since 30% of people using Chatbots are Gooning it up, there's far more refusals...