Comment by o11c 3 days ago


The problem with LLMs is not a data problem. LLMs are stupid even on data they just generated.

One recent catastrophic failure I found: Ask an LLM to generate 10 pieces of data. Then in a second input, ask it to select (say) only numbers 1, 3, and 5 from the list. The LLM will probably return results numbered 1, 3, and 5, but chances are at least one of them will actually copy the data from a different number.
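
Concretely, something like the sketch below reproduces it. Chat / chat.ask are just hypothetical stand-ins for whatever conversational API you're testing, and the parsing is deliberately naive:

  # Minimal repro sketch. Chat() / chat.ask() are hypothetical stand-ins for
  # whatever conversational API is being tested; the parsing is deliberately naive.
  def parse_numbered(text):
      # Turn "1. foo\n2. bar\n..." into {1: "foo", 2: "bar", ...}
      items = {}
      for line in text.splitlines():
          num, sep, body = line.strip().partition(". ")
          if sep and num.isdigit():
              items[int(num)] = body.strip()
      return items

  chat = Chat()
  first = chat.ask("Give me a numbered list of 10 short, distinct facts.")
  original = parse_numbered(first)

  second = chat.ask("Now return only items 1, 3 and 5, keeping their original numbers.")
  for n, body in parse_numbered(second).items():
      # The failure mode: the label says 1, 3 or 5, but the text was copied
      # from a different item in the first response.
      if body != original.get(n):
          print(f"item {n}: text actually belongs to a different number")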

wsve 3 days ago

I'm absolutely not bullish on LLMs, but I think this is kinda judging a fish on its ability to climb a tree.

LLMs are looking at typical constructions of text, not at an understanding of what the text means. If you ask one what color the sky is, it'll find what text usually follows a sentence like that and try to construct a response from it.

If you ask it the answer to a math question, the only way it could reliably figure it out is if it has an exact copy of that question somewhere in its training data. Asking it to choose things from a list is kinda like that, but one could imagine the designers supplementing that manually with a technique other than a pure LLM.

smcin 3 days ago

Any ideas why that misnumbering happens? It sounds like a very basic thing to get wrong. And as a fallback, it could be brute-force kludged with an extra pass that appends the output list to the prompt.
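
Roughly what I mean, reusing the hypothetical chat / first / original / parse_numbered pieces from the repro sketch above: the selection prompt carries the generated list verbatim instead of relying on conversation history, and the result is then checked (or just overwritten) deterministically:

  # Sketch of that kludge: feed the model's own output back in with the prompt,
  # then verify against the parsed originals. chat, first, original and
  # parse_numbered() are the hypothetical pieces from the sketch above.
  wanted = [1, 3, 5]
  prompt = (f"From the numbered list below, return only items {wanted}, "
            "unchanged and keeping their original numbers.\n\n" + first)
  selection = parse_numbered(chat.ask(prompt))

  mismatches = [n for n in wanted if selection.get(n) != original.get(n)]
  if mismatches:
      # Belt and braces: we already hold the real items, so just substitute them.
      selection.update({n: original[n] for n in mismatches})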

  • o11c 3 days ago

    It's an LLM; we cannot expect any real idea.

    Unless of course we rephrase it as "when I roll 2d6, why do I sometimes get snake eyes?"