Comment by zozbot234
How does this benchmark against Reflection, which was fine-tuned to do the same thing-- provide a detailed Chain of Thought with self-corrections, then write out a final answer?
How does this benchmark against Reflection, which was fine-tuned to do the same thing-- provide a detailed Chain of Thought with self-corrections, then write out a final answer?
Pretty sure Reflection-70B was a complete scam. They did the ole bait and switch. The model that they uploaded was completely under-performing compared to their own benchmarks and the "secret API" was just a GPT-4 & Claude wrapper.