Comment by userbinator 2 months ago
Scores have skyrocketed
I suggest making the problems more unique: ones that humans would be able to solve but that would easily trip up an AI. Minor variations of existing problems seem to work well. There's some fun with that sort of idea here: https://news.ycombinator.com/item?id=38766512
It's already very difficult to write good problem material for evaluations. Finding problems whose difficulty is intermediate for the target audience (not too easy, not too hard) but that are also too hard for LLMs would be very challenging, if not impossible, for most disciplines.