Comment by sdesol
> On a similar note, has anyone found themselves absolutely not trusting non-code LLM output?
I'm working on an LLM chat app that is built around mistrust. The basic idea is that it's unlikely a supermajority of quality LLMs will all get the same thing wrong.
This isn't foolproof, but it does provide some level of confidence in the answer.
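To make the supermajority idea concrete, here's a rough sketch (not the app's actual code): poll several models with the same question and only accept an answer when most of them agree. The model names and the `ask_model` helper are placeholders for whatever provider SDKs you'd actually wire up.

```python
# Minimal sketch of a supermajority check: ask several models the same
# question, normalize their yes/no answers, and only trust the result
# when a clear majority agrees.
from collections import Counter

MODELS = ["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet", "llama-3-70b"]  # assumed names


def ask_model(model: str, question: str) -> str:
    """Hypothetical helper: send `question` to `model` and return its raw answer."""
    raise NotImplementedError("wire up the real provider SDK here")


def normalize(answer: str) -> str:
    """Crude yes/no extraction from a free-text answer."""
    text = answer.strip().lower()
    if text.startswith("yes"):
        return "yes"
    if text.startswith("no"):
        return "no"
    return "unclear"


def consensus(question: str, threshold: float = 0.75) -> str:
    """Return an answer only if a supermajority of models agree on it."""
    votes = Counter(normalize(ask_model(m, question)) for m in MODELS)
    answer, count = votes.most_common(1)[0]
    if answer != "unclear" and count / len(MODELS) >= threshold:
        return answer
    return "no consensus - verify elsewhere"


# consensus("Did Homer Simpson ever actually go to Mars?")
```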
Here is a quick example in which I analyze results from multiple LLMs that answered, "When did Homer Simpson go to Mars?"
https://beta.gitsense.com/?chat=4d28f283-24f4-4657-89e0-5abf...
If you look at the yes/no table, every model except GPT-4o and GPT-4o mini said no. After asking GPT-4o who was correct, it provided "evidence" from an episode, so I asked for more information about that episode. Based on what it said, it sounds like the mission to Mars was a hoax, and when I challenged GPT-4o on this, it agreed and said Homer never went to Mars, like the other models said.
I then asked Sonnet 3.5 about the episode and it said GPT-4o misinterpreted the plot.
https://beta.gitsense.com/?chat=4d28f283-24f4-4657-89e0-5abf...
At this point, I'm confident (but not 100% sure) that Homer never went to Mars, and if I really needed to know, I'd have to search the web.
It's the backwards reasoning that really frustrates me when using LLMs. You ask a question, it says sure, do these things; they don't work out, so you ask the LLM why not, and it replies that, yes, the thing it told you to do wouldn't work, for these clear reasons.
It would be nice to start at that end of the chain of reasoning instead of the other.
Another regular example is when it "invents" functions or classes that don't exist; when pressed about them, it replies that of course that won't work, that function doesn't exist.
Okay, great, so don't tell me it does with such certainty, is what I would say to a human who kept feeding me imagination as fact. But of course an LLM isn't reasoning in the same sense, so this reverse chain of thought is the outcome.
I'm finding LLMs far more useful for soft-skill topics than for engineering work, simply because of how often they lead me down a path that eventually dead-ends because of some small detail that was wrong at the very beginning.