Comment by mg

Comment by mg 10 months ago

Has it been publicly benchmarked yet whether this approach:

    Hello LLM, please solve this task: <task>

Can be improved by performing this afterwards?

    for iteration in range(10):
        Hello LLM, please solve this task: <task>
        Here is a possible solution: <last_reply>
        Please look at it and see if you can improve it.
        Then tell me your improved solution.
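
The loop above can be sketched as runnable Python. This is only a minimal sketch: `refine` and `ask_llm` are hypothetical names, and `ask_llm` stands for any function that takes a prompt string and returns the model's reply, so no particular provider API is assumed.

```python
from typing import Callable


def refine(task: str, ask_llm: Callable[[str], str], iterations: int = 10) -> str:
    """Ask once, then repeatedly feed the last reply back for improvement.

    `ask_llm` is a hypothetical callable (prompt -> reply); plug in any client.
    """
    # Initial single-shot attempt.
    last_reply = ask_llm(f"Hello LLM, please solve this task: {task}")

    # Each round shows the model its previous answer and asks for a better one.
    for _ in range(iterations):
        prompt = (
            f"Hello LLM, please solve this task: {task}\n"
            f"Here is a possible solution: {last_reply}\n"
            "Please look at it and see if you can improve it.\n"
            "Then tell me your improved solution."
        )
        last_reply = ask_llm(prompt)

    return last_reply
```

Note that this makes `iterations + 1` calls in total and always returns the final reply, whether or not it is actually better than an earlier one.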

lorepieri 10 months ago

Not sure if it has been benchmarked, but I've called this technique the "for-loop of thought". :)

bachback 10 months ago

For coding tasks, see

https://aider.chat/docs/leaderboards/

The question is how you would define "improve" and "solve". RLHF, in a way, delegates this to humans.

Kiro 10 months ago

Isn't that the whole reason that o1 works?

ben_w 10 months ago

I think o1 is more like "pretend you're doing a job interview, think step by step and show your working".
I tried something similar to the suggested iterative loop on a blog post I'd written and wanted help copy-editing. The first few iterations were good enough, but then it got very confused: it decided the blog post wasn't actually a blog post to be edited, and that what I really wanted to know was the implications of Florida something something Republican Party.
A benchmark would be neat, because all I have is an anecdote.
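
One way such a benchmark could be set up is to score the reply after every iteration, so that drift like the one described here shows up as a falling score. The sketch below assumes two hypothetical callables: `ask_llm` (prompt to reply) and `score` (reply to a number, higher is better, for example the fraction of unit tests passed on a coding task). Neither is a real library function.

```python
from typing import Callable, List


def score_per_iteration(
    task: str,
    ask_llm: Callable[[str], str],
    score: Callable[[str], float],
    iterations: int = 10,
) -> List[float]:
    """Return one score per reply; index 0 is the single-shot baseline."""
    reply = ask_llm(f"Hello LLM, please solve this task: {task}")
    scores = [score(reply)]
    for _ in range(iterations):
        reply = ask_llm(
            f"Hello LLM, please solve this task: {task}\n"
            f"Here is a possible solution: {reply}\n"
            "Please look at it and see if you can improve it.\n"
            "Then tell me your improved solution."
        )
        scores.append(score(reply))
    return scores


def best_iteration(scores: List[float]) -> int:
    """Index of the highest score; ties resolve to the earliest iteration."""
    return max(range(len(scores)), key=scores.__getitem__)
```

If `best_iteration` tends to land well before the last index across many tasks, that would suggest the loop degrades after a few rounds rather than improving monotonically.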

eykrehbein 10 months ago

BruteforceLLM
