Uninen 3 days ago

This is wild!

"when assessed by Claude 3.5 Sonnet’s production-grade RM, our unsupervised assistant policy wins 60% of head-to-head comparisons against the policy trained with the human-supervised RM." So now the models can even post-train the new models better than a human can

  • cma 2 days ago

    Every top model in ARC AGI used a test-time fine-tuning approach. They had only one example pair, though, and would usually apply transformations (color, mirroring, etc.) to it for the fine-tuning, and that might have been coded by hand.