Comment by zozbot234
I'm aware of the issue with their purported benchmarks; in fact, some testing had Reflection 70B performing a bit worse than plain Llama-3.1 70B. Does G1 do any better?
g1 is not a model, it's a prompt, so I'm not sure what you would be comparing. Claude vs. Claude w/ the g1 prompt?