Comment by dinobones 4 days ago

OAI revealed on Twitter that there is no "system" at inference time; it is just a model.

Did they maybe expand to a tree during training to learn more robust reasoning? Maybe. But it still comes down to a regular transformer model at inference time.

ValentinA23 4 days ago

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

> In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting – ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions.


> We generate thoughts, in parallel, following all tokens in the text (think). The model produces a mixture of its next-token predictions with and without a thought (talk). We apply REINFORCE, as in STaR, to increase the likelihood of thoughts that help the model predict future text while discarding thoughts that make the future text less likely (learn).
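
A toy sketch of the "talk" and "learn" steps quoted above, in plain Python. Everything here is an illustrative assumption rather than the paper's implementation: the function names, the fixed mixing weight `w`, the mean-over-thoughts baseline, and the made-up probabilities.

```python
import math

def mix_predictions(p_base, p_thought, w):
    """'Talk': interpolate the next-token distribution without a thought
    (p_base) and with a thought (p_thought). w is a hypothetical fixed
    mixing weight in [0, 1], used here only for illustration."""
    return [(1.0 - w) * b + w * t for b, t in zip(p_base, p_thought)]

def thought_rewards(log_likelihoods):
    """'Learn': REINFORCE-style reward for each sampled thought, taken as its
    log-likelihood of the true future text minus the mean over all sampled
    thoughts (a simple baseline, assumed here)."""
    baseline = sum(log_likelihoods) / len(log_likelihoods)
    return [ll - baseline for ll in log_likelihoods]

# Toy vocabulary of three tokens; the true next token is index 1.
p_base = [0.5, 0.3, 0.2]      # prediction without a thought
p_thought = [0.1, 0.8, 0.1]   # prediction after a helpful thought
mixed = mix_predictions(p_base, p_thought, w=0.5)

# Two sampled thoughts: one makes the true token likely, one does not.
log_likelihoods = [math.log(0.8), math.log(0.2)]
rewards = thought_rewards(log_likelihoods)
```

The helpful thought gets a positive reward and the unhelpful one a negative reward, so a policy-gradient update would raise the probability of the former and lower that of the latter.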

quantadev 4 days ago

I don't think you can claim you know what's happening internally when OpenAI processes a request. They are a competitive company and will lie for competitive reasons. Most people think Q-Star is doing multiple inferences to accomplish a single task, and that's what all the evidence suggests. Whatever Sam Altman says means absolutely nothing, but I don't think he's claimed they use only a single inference either.

pizza 4 days ago


  • nell 4 days ago

    > I wouldn't call o1 a "system". It's a model, but unlike previous models, it's trained to generate a very long chain of thought before returning a final answer

    • astrange 4 days ago

      That answer seems to conflict with "in the future we'd like to give users more control over the thinking time".

      I've gotten mini to think harder by asking it to, but it didn't produce a better answer. Though now I've run out of usage limits for both of them, so I can't try any more…

      • qeternity 4 days ago

        I'm not convinced there isn't more going on behind the scenes, but influencing test-time compute via the prompt is a pretty universal capability.

        • whimsicalism 4 days ago

          Not in a way that is used effectively. In real life, all of the papers using CoT compare against a weak baseline, and the benefits level off extremely quickly.

          Nobody except recent DeepMind research has shown test-time scaling like o1.

      • bratwurst3000 4 days ago

        I tell Claude not to give me the obvious answer. That puts the thinking time up, and the quality of the answers is better. Hope it helps.