Comment by dinobones
Comment by dinobones 4 days ago
OAI revealed on Twitter that there is no "system" at inference time, this is just a model.
Did they maybe expand to a tree during training to learn more robust reasoning? Maybe. But it still comes down to a regular transformer model at inference time.
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
https://arxiv.org/pdf/2403.09629
> In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting – ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions.
>[...]
>We generate thoughts, in parallel, following all tokens in the text (think). The model produces a mixture of its next-token predictions with and without a thought (talk). We apply REINFORCE, as in STaR, to increase the likelihood of thoughts that help the model predict future text while discarding thoughts that make the future text less likely (learn).