Comment by melodyogonna 2 days ago
How can it be specifically trained on benchmarks when it is leading on blind chatbot tests?
The post you quoted doesn't describe a Grok problem if other LLMs are failing too. It seems to me to be a fundamental failure in the current approach to AI model development.
Any uncensored LLM does well on chatbot tests because a refusal is an automatic loss.
And since 30% of people using chatbots are gooning it up, there are far more refusals...