Comment by accurrent 2 days ago

7 replies

I gave it a prompt and it straight up hallucinated. My prompt asked for an article about the advantages and disadvantages of Rust in the robotics ecosystem. It claimed that Google Cartographer was written in Rust. The annoying thing is that it was quite convincing. The citation it used turned out to be GeeksforGeeks blogspam that did not mention Cartographer anywhere, so I went and checked: it is a C++-only project. It's worrisome when you see people relying on LLMs for knowledge.

jeroenhd 2 days ago

People trusting LLMs to tell the truth is the advanced version of people taking the first link on Google as indubitable fact.

This whole trend is going to get much worse before it gets better.

  • tikkun 2 days ago

    I'm optimistic that hallucination rates will go down quite a bit again with the next gen of models (gpt5 / claude 4 / gemini 2 / llama 4).

    I've noticed that the hallucination rate of newer, more SOTA models is much lower.

    3.5 Sonnet hallucinates less than GPT-4, which hallucinates less than GPT-3.5, which hallucinates less than Llama 70B, which hallucinates less than GPT-3.

    • nytesky 2 days ago

      Eventually, won't most training data be AI-generated? Will we see feedback issues?

leettools 2 days ago

We are actually working on a tool that provides similar functions (although we focus more on the knowledgebase curation part). Here is an article we generated from the prompt "the advantages and disadvantages of rust in the robotics ecosystem" (https://svc.leettools.com/#/share/leettools/research?id=9886...): the basic flow is to query Google using the prompt, generate the article outline using the search result summaries, and then generate each section separately. Interested to see your opinions on the differences, thanks!
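For readers who want the flow above in concrete terms, here is a minimal sketch of the three steps (search, outline from the result summaries, then one generation call per section). It is not the leettools implementation: `generate_article`, `SearchResult`, the prompt prefixes, and the `fake_search` / `fake_llm` stand-ins are all hypothetical names chosen for illustration, and a real version would plug in an actual search API and model client.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchResult:
    url: str
    summary: str


def generate_article(
    prompt: str,
    search: Callable[[str], List[SearchResult]],  # hypothetical search backend
    llm: Callable[[str], str],                    # hypothetical text-generation call
) -> str:
    # Step 1: query the search engine with the raw prompt.
    results = search(prompt)
    sources = "\n".join(f"- {r.summary} ({r.url})" for r in results)

    # Step 2: build the outline from the search result summaries only.
    outline = llm(f"OUTLINE for: {prompt}\nSources:\n{sources}")
    titles = [line.strip() for line in outline.splitlines() if line.strip()]

    # Step 3: generate each section separately, one call per title.
    sections = []
    for title in titles:
        body = llm(f"SECTION '{title}' for: {prompt}\nSources:\n{sources}")
        sections.append(f"{title}\n{body}")
    return "\n\n".join(sections)


# Stand-ins so the sketch runs without network access (assumptions, not real APIs).
def fake_search(query: str) -> List[SearchResult]:
    return [SearchResult(url="https://example.com/a", summary=f"Notes on {query}")]


def fake_llm(prompt: str) -> str:
    if prompt.startswith("OUTLINE"):
        return "Advantages\nDisadvantages"
    return "(stub section text)"


if __name__ == "__main__":
    print(generate_article("rust in the robotics ecosystem", fake_search, fake_llm))
```

Because every section is written from the same search summaries, anything the search misses (or gets wrong) carries straight through to the article.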

  • accurrent 2 days ago

    I'm impressed, it's better than the article I found written by Storm. That being said, both tend to rely on what's available on the internet, so they lack things that are more subtle. It's impressive that your article picked up on Pixi. Of course, as a practicing roboticist my arguments would be different, but at this point I'm nitpicking.

    • leettools 2 days ago

      Thanks for the feedback! Yeah, by default this kind of survey article is generated from publicly available information found through search results, so the quality depends largely on Google's ranking and your search terms. Right now we can add expert-picked documents to the KB and generate the results from the curated KB instead of directly from the search. Better prompting (specific to the target field of study) and more iterations (run a quality check and rewrite accordingly) should also be very helpful.
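The "quality check and rewrite" iteration mentioned above could look roughly like the loop below. This is only a sketch under assumptions: `refine`, `check`, and `rewrite` are hypothetical names, and the toy checker simply flags the Cartographer claim from earlier in this thread, where a real one would compare the draft against the curated KB.

```python
from typing import Callable, List


def refine(
    draft: str,
    check: Callable[[str], List[str]],         # returns a list of issues, empty if clean
    rewrite: Callable[[str, List[str]], str],  # returns a revised draft
    max_rounds: int = 3,
) -> str:
    # Alternate between checking and rewriting until the draft is clean
    # or the round budget is used up.
    for _ in range(max_rounds):
        issues = check(draft)
        if not issues:
            break
        draft = rewrite(draft, issues)
    return draft


# Toy stand-ins (assumptions for illustration only).
def toy_check(draft: str) -> List[str]:
    if "Cartographer is written in Rust" in draft:
        return ["Cartographer is a C++ project"]
    return []


def toy_rewrite(draft: str, issues: List[str]) -> str:
    return draft.replace("written in Rust", "written in C++")


if __name__ == "__main__":
    print(refine("Cartographer is written in Rust.", toy_check, toy_rewrite))
```

The round limit matters: if the checker keeps finding issues the rewriter cannot fix, the loop still terminates and returns the last draft.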