Comment by pegasus 3 days ago

Look into RLVR (Reinforcement Learning with Verifiable Rewards). It happens during model post-training.
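To make "verifiable" concrete: the reward comes from a programmatic check (an exact answer match, a passing unit test) rather than from a learned reward model. Here is a minimal sketch of such a check, assuming the model ends its output with an `Answer:` line. That marker format and the function names `extract_final_answer` and `verifiable_reward` are made up for illustration and are not taken from any particular training framework.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent.

    The 'Answer:' convention is an assumption made for this sketch.
    """
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the known answer exactly."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == ground_truth.strip() else 0.0


if __name__ == "__main__":
    print(verifiable_reward("2 + 2 is 4.\nAnswer: 4", "4"))
    print(verifiable_reward("I think it is 5.\nAnswer: 5", "4"))
```

In an actual post-training run, a score like this would be fed to an RL algorithm as the reward for each sampled completion. That part is omitted here.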