sebzim4500 5 days ago

This is basically CoT, so it's already the norm for a lot of benchmarks. I think the value proposition here is that it puts a nice UX around using it in a chat interface.

  • ehsanu1 5 days ago

    That was my initial position too, but I think there is a search efficiency story here as well. CoT comes in many flavors and improves when tailored to the problem domain. If the LLM can figure out the right problem-solving strategy for a given problem on its own, this may improve performance per unit of compute versus discovering that strategy through search at inference time.

    Tailoring prompts is likely still the best way to maximize performance when you can, but in broader domains you'd work around this through strategies like asking the LLM to combine predefined reasoning modules, creating multiple reasoning chains and merging/comparing them, explicit MCTS, etc. (the multiple-chains idea is sketched below). I think those strategies will still be useful for a good while, but pieces of that search process, especially directing the search more efficiently, will move into the LLMs over time as they get trained on this kind of data.
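
    For concreteness, here's a minimal sketch of the multiple-chains strategy (often called self-consistency): sample several independent CoT completions and merge them by majority vote over the final answers. `sample_chain` is a hypothetical stand-in for an LLM call, not any particular API:

      # Self-consistency sketch: sample N independent reasoning
      # chains, then merge by majority vote over final answers.
      import random
      from collections import Counter

      def sample_chain(question: str) -> str:
          # Placeholder: a real version would call an LLM with a
          # CoT prompt at temperature > 0 so chains actually differ,
          # then extract the final answer from the completion.
          return random.choice(["4", "4", "5"])

      def self_consistency(question: str, n_chains: int = 8) -> str:
          answers = [sample_chain(question) for _ in range(n_chains)]
          # Merge step: keep the answer most chains agree on.
          return Counter(answers).most_common(1)[0][0]

      print(self_consistency("What is 2 + 2?"))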

  • Meganet 4 days ago

    It's like saying geometry is just math. Proofs are just math.

    They didn't spend millions training a model on expert data just to basically use CoT. That's a harsh simplification, probably.