dinobones 4 days ago

OAI revealed on Twitter that there is no "system" at inference time; it's just a model.

Did they maybe expand to a tree during training to learn more robust reasoning? Maybe. But it still comes down to a regular transformer model at inference time.

ValentinA23 4 days ago

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

https://arxiv.org/pdf/2403.09629

> In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting – ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions.

> [...]

> We generate thoughts, in parallel, following all tokens in the text (think). The model produces a mixture of its next-token predictions with and without a thought (talk). We apply REINFORCE, as in STaR, to increase the likelihood of thoughts that help the model predict future text while discarding thoughts that make the future text less likely (learn).
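
To make the quoted think / talk / learn loop concrete, here is a toy PyTorch sketch of the objective at a single position. Everything in it (the GRU stand-in for the LM, the mixing head, the hyperparameters, the one-step reward) is an illustrative placeholder, not the paper's implementation:

# Toy sketch of the Quiet-STaR "think / talk / learn" objective at one position.
# ToyLM, the mixing head, and all hyperparameters are illustrative stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, THOUGHT_LEN = 100, 32, 4

class ToyLM(nn.Module):
    """Placeholder autoregressive LM (a GRU standing in for the transformer)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens):            # (batch, seq) -> (batch, seq, vocab)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)

model = ToyLM()
mixer = nn.Linear(2 * VOCAB, 1)           # "mixing head": weighs with/without-thought predictions
opt = torch.optim.Adam(list(model.parameters()) + list(mixer.parameters()), lr=1e-3)

tokens = torch.randint(0, VOCAB, (1, 16))           # stand-in for real text
prefix, next_tok = tokens[:, :-1], tokens[:, -1:]   # predict the last token from the rest

# think: sample a short rationale after the prefix
ctx = prefix
for _ in range(THOUGHT_LEN):
    step_logits = model(ctx)[:, -1]
    tok = torch.distributions.Categorical(logits=step_logits).sample().unsqueeze(1)
    ctx = torch.cat([ctx, tok], dim=1)
thought = ctx[:, prefix.size(1):]                   # the sampled rationale tokens

# talk: mix next-token predictions made with and without the thought
logits_base = model(prefix)[:, -1]
full_logits = model(ctx)
logits_thought = full_logits[:, -1]
w = torch.sigmoid(mixer(torch.cat([logits_base, logits_thought], dim=-1)))
p_mix = w * F.softmax(logits_thought, -1) + (1 - w) * F.softmax(logits_base, -1)
nll = -torch.log(p_mix.gather(1, next_tok) + 1e-9).mean()

# learn: REINFORCE on the thought tokens, rewarding thoughts that made the
# true next token more likely than the no-thought baseline
reward = (torch.log(p_mix.gather(1, next_tok) + 1e-9)
          - F.log_softmax(logits_base, -1).gather(1, next_tok)).detach()
thought_logp = (F.log_softmax(full_logits[:, -THOUGHT_LEN - 1:-1], -1)
                .gather(2, thought.unsqueeze(-1)).sum())
loss = nll - (reward * thought_logp).mean()

opt.zero_grad()
loss.backward()
opt.step()

Quiet-STaR itself runs this in parallel after every token of the text; the sketch only shows the shape of the objective.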

quantadev 4 days ago

I don't think you can claim you know what's happening internally when OpenAI processes a request. They are a competitive company and will lie for competitive reasons. Most people think Q-Star is doing multiple inferences to accomplish a single task, and that's what all the evidence suggests. Whatever Sam Altman says means absolutely nothing, but I don't think he's claimed they use only a single inference either.
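
For what it's worth, "multiple inferences for a single task" usually means something like best-of-n or self-consistency sampling: draw several independent completions and vote on the final answer. Here's a rough sketch with the OpenAI Python SDK, purely to illustrate that idea (the model name, prompt format, and voting rule are placeholders, and this is not a claim about what o1 actually does internally):

# Illustrative only: sample several independent completions for one task and
# majority-vote the final answer (self-consistency). Model name, prompt
# format, and answer parsing are placeholders, not how o1 works internally.
from collections import Counter

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def solve_with_voting(question: str, n: int = 8, model: str = "gpt-4o-mini") -> str:
    answers = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model=model,
            temperature=1.0,  # diversity across samples is the point
            messages=[{
                "role": "user",
                "content": question + "\nThink step by step, then finish with 'Answer: <answer>'.",
            }],
        )
        text = resp.choices[0].message.content or ""
        if "Answer:" in text:
            answers.append(text.rsplit("Answer:", 1)[1].strip())
    # the most common final answer across samples wins
    return Counter(answers).most_common(1)[0][0] if answers else ""

print(solve_with_voting("A train travels 120 km in 1.5 hours. What is its average speed in km/h?"))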

pizza 4 days ago

Source?

  • nell 4 days ago

    > I wouldn't call o1 a "system". It's a model, but unlike previous models, it's trained to generate a very long chain of thought before returning a final answer

    https://x.com/polynoamial/status/1834641202215297487

    • astrange 4 days ago

      That answer seems to conflict with "in the future we'd like to give users more control over the thinking time".

      I've gotten mini to think harder by asking it to, but it didn't produce a better answer. Though now I've run out of usage limits for both of them, so I can't try any more…

      • qeternity 4 days ago

        I'm not convinced there isn't more going on behind the scenes, but influencing test-time compute via the prompt is a pretty universal capability.

        • whimsicalism 4 days ago

          Not in a way that's effectively used. In real life, all of the papers using CoT compare against a weak baseline, and the benefits level off extremely quickly.

          Nobody except for recent DeepMind research has shown test-time scaling like o1.

      • bratwurst3000 4 days ago

        I've been telling Claude not to give me the obvious answer. That pushes thinking time up, and the quality of the answers is better. Hope it helps.