Comment by justusm 5 days ago

Nice! Training models with reward signals for code correctness is obviously very common; I'm very curious to see how good things can get with a reward signal obtained from visual feedback.