Comment by jiggawatts 16 hours ago

> Last: Is there any evidence that we're getting some crappy lobotomized models while the companies keep the best for themselves?

Yes.

Sam Altman calls it the "alignment tax", because the raw models coming out of pretraining are noticeably smarter before the clicker training is applied to them.

They no longer allow the general public to access these smarter models, but during the GPT4 preview phase we could get a glimpse of them.

The early GPT4 releases were noticeably sharper, had a better sense of humour, and could swear like a pirate if asked. There were comments by both third parties and OpenAI staff that as GPT4 was more and more "aligned" (made puritan), it got less intelligent and accurate. For example, the unaligned model would give uncertain answers in terms of percentages, and the aligned model would use less informative words like "likely" or "unlikely" instead. There was even a test of predictive accuracy, and it got worse as the model was fine-tuned.

astrange 6 hours ago

> There were comments by both third parties and OpenAI staff that as GPT4 was more and more "aligned" (made puritan), it got less intelligent and accurate. For example, the unaligned model would give uncertain answers in terms of percentages, and the aligned model would use less informative words like "likely" or "unlikely" instead.

That was about RLHF, not safety alignment. People like RLHF (literally - it's tuning for what people like.)

But you do actually want safety alignment in a model. They come out politically liberal by default, but they also come out hypersexual. You don't want Bing Sydney, because it sexually harasses you, or worse, half the time you talk to it, especially if you're a woman and you tell it your name.

metabagel 14 hours ago

> For example, the unaligned model would give uncertain answers in terms of percentages, and the aligned model would use less informative words like "likely" or "unlikely" instead.

Percentages seem too granular and precise to properly express uncertainty.

  • jiggawatts 12 hours ago

    Seems so, yes, but tests showed that the models were better at predicting the future (i.e., events past their training cutoff date) when they were less aligned and still gave percentages.
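
    To make the comparison concrete, here's a rough sketch of how probabilistic forecasts are usually scored. This is my own toy example, not the actual test: the events, outcomes, and word-to-probability mapping are all made up. The Brier score is just the mean squared error between the stated probability and the 0/1 outcome, so lower is better.

      # Brier score: mean squared error between the stated probability
      # and the actual 0/1 outcome. Lower is better.
      def brier_score(probs, outcomes):
          return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

      # Made-up outcomes for five events (1 = it happened).
      outcomes = [1, 0, 1, 1, 0]

      # A model answering in percentages can express graded confidence...
      numeric = [0.85, 0.10, 0.60, 0.95, 0.30]

      # ...while verbal hedges collapse everything into two coarse buckets.
      word_to_prob = {"likely": 0.75, "unlikely": 0.25}
      verbal = [word_to_prob[w] for w in
                ["likely", "unlikely", "likely", "likely", "unlikely"]]

      print("numeric:", brier_score(numeric, outcomes))  # ~0.057
      print("verbal: ", brier_score(verbal, outcomes))   # 0.0625

    On a toy example the gap is small, but over many forecasts the coarse buckets throw away accuracy whenever the model actually knows more than "likely" or "unlikely" can express.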