Comment by sippeangelo

Comment by sippeangelo 10 months ago

I can't claim that I have any idea of how this model is built, but from their shifty excuses touching on "alignment" I'm confident that o1 is actually two copies of the same model, one "raw" and unchained that is fine-tuned for CoT, and one that has been crippled for safety and human alignment to parse that and provide the actual reply. They have finally realized how detrimental the "lobotomizing" process is to the models general reasoning, and this is their solution. It makes sense that they are afraid to unleash that onto the world, but we've already seen the third "filter" model that summarizes the thoughts to slip some of that through (just yesterday it was seen to have "emotional turmoil" as one of the reasoning steps), so it's just a matter of time before it makes something crazy slip through.

staticman2 10 months ago

I'm not convinced by your argument. If this was true we would expect the unofficial "uncensored" Llama 3 finetunes to outperform the official assistant ones, which as I understand it isn't the case.

It also doesn't make sense intuitively, o1 isn't particularly good at creative tasks, and that's really the area where you'd think "censorship" would have the greatest impact, o1 is advertised as being "particularly useful if you’re tackling complex problems in science, coding, math, and similar fields."

Reply View 4 replies

amenhotep 10 months ago

Uncensored finetunes aren't the same thing, that's taking a model that's already been lobotomised and trying to teach it that wrongthink is okay - rehabilitation of the injury. OpenAI's uncensored model would be a model that had never been injured at all.
I also am not convinced by the argument but that is a poor reason against.

Reply View | 3 replies
- staticman2 10 months ago
  
  I'm talking about taking the Llama 3 base model and finetuning it with a dataset that doesn't include refusals, not whatever you mean by "taking a model that's already been lobotomized".
  It's interesting that you weren't convinced by the above argument but still repeated the edgelord term "lobotomized" in your reply.
  
  Reply View | 2 replies
  
  errantspark 10 months ago
  
  The claim is that llama is "lobotomized" because it was trained with safety in mind. You can't untrain that by finetuning. For what it's worth the non-instruct llama generally seems better at reasoning than instruct llama which i think is a point in support of OP.
  
  Reply View | 1 reply
  
  staticman2 10 months ago
  
  Better at reasoning based on benchmarks or what?
  
  Reply View | 0 replies

mensetmanusman 10 months ago

That’s one hypothesis, but the honest answer is that no one knows. This technology is too new, and the effects on the knowledge graph of censoring some sub components is too complicated to currently grasp.

Reply View 0 replies