themanmaran a day ago

This is a big problem when it comes to conversational agents. Sometimes users ask questions that are really prying, potentially misleading, or just annoying repeats (like asking for a cheaper price 50 times).

In these situations a real person would just ignore them. But most LLMs will cheerfully continue the conversation, and potentially make false promises or give away information they shouldn't.
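
A real guardrail here is a product decision, but as a minimal sketch (the threshold, the fuzzy match, and the canned reply are all illustrative assumptions, not any vendor's API), a chat loop could simply refuse to re-engage once the same request keeps recurring:

  import difflib

  MAX_REPEATS = 3    # illustrative: near-duplicates tolerated before disengaging
  SIMILARITY = 0.9   # illustrative: fuzzy-match cutoff for "same question again"

  def is_repeat(msg, history):
      # Count earlier user messages that are near-identical to this one.
      alike = sum(
          difflib.SequenceMatcher(None, msg.lower(), h.lower()).ratio() >= SIMILARITY
          for h in history
      )
      return alike >= MAX_REPEATS

  def respond(msg, history, model_reply):
      if is_repeat(msg, history):
          return "I've already answered that; the offer stands."  # or stay silent
      history.append(msg)
      return model_reply(msg)  # model_reply stands in for the actual LLM call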

  • notahacker a day ago

    Indeed, I suspect if anything the weighting is the opposite (being annoyingly persistent weights an LLM towards spitting out text that approximates what the annoyingly persistent person wants to get), whereas with humans it weights them towards being less helpful...

tempodox a day ago

+1. Actually, the infinitely many things that have never been posted would be such training data, but how do you count how much nothing you hoovered up while stealing data?

  • falcor84 a day ago

    Now that much of the input to AI systems comes from the search tool, maybe post-training should indeed treat the lack of a result as a signal, a bit like in TF-IDF, where a term being rarer in the corpus as a whole implies it's more distinctive and potentially more meaningful to the current document.
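
    As a minimal sketch of that TF-IDF intuition (the tokenization and smoothing here are assumptions for illustration, not anyone's production scoring), rarity across the corpus, including total absence, raises the score:

      import math
      from collections import Counter

      def tf_idf(term, doc_tokens, corpus):
          # Term frequency: how common the term is in this document.
          tf = Counter(doc_tokens)[term] / len(doc_tokens)
          # Smoothed inverse document frequency: the rarer the term is
          # across the corpus (even entirely absent), the higher it scores.
          docs_with_term = sum(term in set(d) for d in corpus)
          idf = math.log((1 + len(corpus)) / (1 + docs_with_term)) + 1
          return tf * idf

    By analogy, a retrieval call that returns nothing could be treated as a high-information event rather than discarded.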

  • danielbln a day ago

    Stealing implies the original is no longer there. I'm no fan of the large AI labs hoovering up the Internet, but let's keep our terminology accurate. We don't even know if this sort of crawling and training on public data constitutes infringement.

    • dylan604 a day ago

      Pedantry is so boring. In conversational parlance, stealing often just means using without paying. So yes, pedantically, this would be unlicensed use rather than removal of the original from the owner's possession. But what else do you expect us to think when even the FBI pushed the copying-is-stealing line with its logos at the head of DVDs and VHS tapes?

      • chii a day ago

        > this would be unlicensed use

        which is exactly what the parent poster is implying - the hoovering up of data off the internet may not be unlicensed use at all. After all, copyright protects the expression of information, not the information itself.

        Calling it stealing presupposes that such hoovering is unlawful before that has been established. And it prejudices the "jury", so to speak: the language you use for a subject can influence other people's perception of it.

        • notahacker a day ago

          We know for a fact that some LLM developers made digital copies of lots of copyrightable material whose licenses expressly forbade ingesting the content into an information retrieval system for the purpose of creating derivative works [without attribution], and did so precisely to train a system to create [unattributed] derivative works. We also know that derivative works were produced, some of them containing substantial portions of content recognisably identical to copyrighted material.

          LLM providers are free to argue, in and outside court, that EULAs or software licences are not applicable to them or enforceable at all, or that their specific actions fell short of violations. But it's far more prejudicial to wade into conversations and try to shut down any suggestion that it might be possible to do anything unlawful with an LLM.

    • meepmorp a day ago

      > Stealing implies the original is no longer there.

      It really doesn't, and I'm pretty sure even you regularly use the word 'steal' in a context where there's clearly no such implication.

esafak a day ago

If you value brevity, don't ask Gemini.

  • el_benhameen a day ago

    Excellent point! You’ve stumbled upon something fundamental about Gemini—it’s exceedingly verbose, even when answering the most mundane of queries. Let’s dig deeper …

    • soared a day ago

      You’re on the right track! Exploring an LLM’s verbosity is an important step in analyzing its usability. A critical first step is…

detourdog a day ago

The generous interpretation is that the internet is a communication medium and everyone is just trying to understand and be understood. The back and forth is a continuous effort to clarify the points being made. The process can break down, resulting in no gain in clarity.

j45 a day ago

On one hand, if responses were more concise and perfectly clear than the human interacting with the model, could that be unnerving?

Prompting with clarity seems to relieve the accumulated response pressure that builds when the model has to reach beyond what it has readily available.

When it comes up short, it seems to dig deeper and come up with more than intended, or to over-respond.

Jumping to solutions remains one of the biggest challenges.