Comment by zozbot234 10 months ago

How does this benchmark against Reflection, which was fine-tuned to do the same thing: provide a detailed chain of thought with self-corrections, then write out a final answer?

kkzz99 10 months ago

Pretty sure Reflection-70B was a complete scam. They did the ol' bait-and-switch: the model they uploaded badly under-performed their own benchmarks, and the "secret API" was just a GPT-4 and Claude wrapper.

  • zozbot234 10 months ago

    I'm aware of the issue with their purported benchmarks; in fact, some testing had Reflection-70B performing a bit worse than plain Llama 3.1 70B. Does g1 do any better?

    • Yiin 10 months ago

      g1 is not a model, it's a prompt, so I'm not sure what you would be comparing. Claude vs. Claude with the g1 prompt?
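For context, a comparison along those lines would hold the underlying model fixed and vary only the system prompt. A minimal sketch of such a harness, where `ask` is a hypothetical stand-in for a real chat-completion API call (in an actual test it would call the same model twice with different system prompts):

```python
# Sketch of comparing one model with and without a g1-style reasoning prompt.
# `ask` is a hypothetical placeholder, not a real API; swap in an actual
# chat-completion call to run a genuine comparison.

G1_STYLE_SYSTEM_PROMPT = (
    "You are an expert reasoner. Think step by step. For each step, emit a "
    "title and your reasoning, decide whether another step is needed, and "
    "only then give a final answer."
)

def ask(system_prompt: str, question: str) -> str:
    # Placeholder response so the sketch is self-contained; a real harness
    # would send (system_prompt, question) to the same model in both cases.
    prefix = "[stepwise] " if "step by step" in system_prompt else "[plain] "
    return prefix + f"answer to: {question}"

def compare(question: str) -> dict:
    """Query the same (stubbed) model under two system prompts."""
    return {
        "baseline": ask("You are a helpful assistant.", question),
        "g1_style": ask(G1_STYLE_SYSTEM_PROMPT, question),
    }

result = compare("How many r's are in 'strawberry'?")
```

The point is that any benchmark of "g1" is really a benchmark of (model + prompt) against (model alone), so the base model must be identical on both sides.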

m3kw9 10 months ago

You still believe it was real? They had a model, then they said it couldn’t reproduce those results lmao

  • zozbot234 10 months ago

    They seem to have a fine-tune of Llama 3 70B that's available for download, so it's "real" in that sense. A fine-tune ought to be better behaved than a pure system-prompt approach.