Comment by the_fall
It might be an interesting LLM benchmark: how many animals can a model list without breaking the rules (no repetitions, no non-animals)? Although I bet big bucks would then be thrown at pointlessly optimizing for that benchmark, so...
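A minimal scorer is easy to sketch. Assuming the model emits one candidate per line and that some reference list decides what counts as an animal (the animals.txt below is a hypothetical stand-in):

    import sys

    # Minimal scorer sketch: count distinct animal names until a rule break.
    # Assumes one candidate per line on stdin and a hypothetical reference
    # list "animals.txt" (one animal name per line).
    with open("animals.txt") as f:
        animals = {line.strip().lower() for line in f}

    seen = set()
    score = 0
    for line in sys.stdin:
        item = line.strip().lower()
        if not item:
            continue
        if item not in animals or item in seen:
            break  # rule broken: non-animal or repetition
        seen.add(item)
        score += 1

    print(f"{score} distinct animals before the first rule break")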
I guess it could be generalized to filling up the context window with arbitrary tokens while making sure none of them repeat.
An interesting twist could be requiring a specific token to be an anagram of the token N tokens back. This could measure how far a model can actually plan ahead.
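The check itself is cheap; a sketch in Python, leaving open which positions the constraint applies to (the comment doesn't pin that down):

    # Sketch: is the token at position i an anagram of the token n back?
    # Which positions carry the constraint is up to the benchmark designer.
    def is_anagram_at(tokens: list[str], i: int, n: int) -> bool:
        return sorted(tokens[i]) == sorted(tokens[i - n])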
Even more interesting would be whether a thinking LLM comes up with tricks to mitigate its own known limits, like listing animals in alphabetical order, or launching a shell/interpreter holding a list of its previous answers (which it then checks each new answer against).
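That interpreter trick boils down to a few lines; a sketch, with the tool-calling plumbing assumed rather than shown:

    # Sketch of the self-check trick: the model keeps earlier answers in a
    # set and tests each candidate before emitting it. The tool-calling
    # machinery around this is assumed, not shown.
    previous: set[str] = set()

    def check_and_record(candidate: str) -> bool:
        """True (and remembered) if the candidate hasn't been used yet."""
        key = candidate.strip().lower()
        if key in previous:
            return False
        previous.add(key)
        return True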
You might like https://github.com/aidanmclaughlin/AidanBench
Might be an interesting problem for understanding how well various models recall prior tokens within the context window. I'm sure they could list animals until their window is full; what I'm not sure of is how much of the window they could fill without repeating.
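One rough way to measure that, assuming access to the model's parsed output and a tokenizer hook (both placeholders here):

    # Rough measurement sketch: what fraction of the window fills before the
    # first repetition? `items` (the model's parsed list) and `count_tokens`
    # (a tokenizer hook) are assumed helpers, not real APIs.
    def repeat_free_fill(items, count_tokens, window_size: int) -> float:
        seen, used = set(), 0
        for item in items:
            if item in seen:
                break  # first repetition ends the measurement
            seen.add(item)
            used += count_tokens(item)
        return used / window_size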