Comment by cma

Comment by cma 4 days ago

From Anthropic a couple of days ago too, self-finetuning:

Uninen 3 days ago

This is wild!

"when assessed by Claude 3.5 Sonnet’s production-grade RM, our unsupervised assistant policy wins 60% of head-to-head comparisons against the policy trained with the human-supervised RM." So now the models can even post-train the new models better than a human can.


cma 2 days ago

Every top model in ARC-AGI used a test-time finetuning approach. They had one example pair, though, and would usually apply transformations (color, mirroring, etc.) to it for the finetuning, and those might have been coded by hand.
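
To make the augmentation idea concrete, here is a rough sketch of what hand-coded transformations of a single example pair could look like. It is only an illustration, not the code any ARC entry actually used: the names `mirror_horizontal`, `recolor`, and `augment_pair` are made up for this example, and it assumes grids are lists of lists of integer color IDs and that the task's rule still holds when the same transformation is applied to both input and output.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]   # assumed representation: rows of integer color IDs
Pair = Tuple[Grid, Grid]  # (input grid, output grid)


def mirror_horizontal(grid: Grid) -> Grid:
    """Flip a grid left-to-right."""
    return [list(reversed(row)) for row in grid]


def recolor(grid: Grid, mapping: Dict[int, int]) -> Grid:
    """Replace colors according to mapping; unmapped colors stay as they are."""
    return [[mapping.get(cell, cell) for cell in row] for row in grid]


def augment_pair(pair: Pair, color_maps: List[Dict[int, int]]) -> List[Pair]:
    """Return the original pair plus mirrored and recolored variants.

    Each transformation is applied identically to input and output, which
    only makes sense if the task's rule is invariant under it.
    """
    inp, out = pair
    base = [(inp, out), (mirror_horizontal(inp), mirror_horizontal(out))]
    variants = list(base)
    for mapping in color_maps:
        for src, dst in base:
            variants.append((recolor(src, mapping), recolor(dst, mapping)))
    return variants


# Hypothetical example pair: the output is the input mirrored left-to-right.
example = ([[1, 0], [0, 2]], [[0, 1], [2, 0]])
augmented = augment_pair(example, [{1: 3, 3: 1}])  # swap colors 1 and 3
print(len(augmented))  # 4
```

The resulting pairs would then serve as the small finetuning set at test time. Whether a given flip or recoloring is safe depends on the task, which is presumably why this part tended to be written by hand.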


dang 3 days ago

Related ongoing thread:

Unsupervised Elicitation of Language Models - https://news.ycombinator.com/item?id=44276041

