themanmaran a day ago

This is a big problem when it comes to conversational agents. Sometimes users ask questions that are really prying, potentially misleading, or just annoying repeats (like asking for a cheaper price 50 times).

In these situations a real person would just ignore them. But most LLMs will cheerfully continue the conversation, and potentially make false promises or give away information they shouldn't.
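
A real guardrail here is a product decision, but as a minimal sketch (the threshold, the fuzzy match, and the canned reply are all illustrative assumptions, not any vendor's API), a chat loop could simply refuse to re-engage once the same request keeps recurring:

  import difflib

  MAX_REPEATS = 3    # illustrative: near-duplicates tolerated before disengaging
  SIMILARITY = 0.9   # illustrative: fuzzy-match cutoff for "same question again"

  def is_repeat(msg, history):
      # Count earlier user messages that are near-identical to this one.
      alike = sum(
          difflib.SequenceMatcher(None, msg.lower(), h.lower()).ratio() >= SIMILARITY
          for h in history
      )
      return alike >= MAX_REPEATS

  def respond(msg, history, model_reply):
      if is_repeat(msg, history):
          return "I've already answered that; the offer stands."  # or stay silent
      history.append(msg)
      return model_reply(msg)  # model_reply stands in for the actual LLM call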

  • notahacker a day ago

    Indeed, I suspect if anything the weighting is the opposite (being annoyingly persistent weights an LLM towards spitting out text that approximates what the annoyingly persistent person wants to get), whereas with humans it weights them towards being less helpful...

tempodox a day ago

+1. Actually, the infinitely many things that have never been posted would be such training data, but how do you count how much nothing you hoovered up while stealing data?

  • falcor84 a day ago

    Now that much of the input to AI systems comes from the search tool, maybe post-training should indeed treat the lack of a result as a signal, a bit like in TF-IDF, where a term being rarer in the corpus as a whole implies it's more distinctive and potentially more meaningful to the current document.
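
    As a minimal sketch of that TF-IDF intuition (the tokenization and smoothing here are assumptions for illustration, not anyone's production scoring), rarity across the corpus, including total absence, raises the score:

      import math
      from collections import Counter

      def tf_idf(term, doc_tokens, corpus):
          # Term frequency: how common the term is in this document.
          tf = Counter(doc_tokens)[term] / len(doc_tokens)
          # Smoothed inverse document frequency: the rarer the term is
          # across the corpus (even entirely absent), the higher it scores.
          docs_with_term = sum(term in set(d) for d in corpus)
          idf = math.log((1 + len(corpus)) / (1 + docs_with_term)) + 1
          return tf * idf

    By analogy, a retrieval call that returns nothing could be treated as a high-information event rather than discarded.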

  • danielbln a day ago

    Stealing implies the original is no longer there. I'm no fan of the large AI labs hoovering up the Internet, but let's keep our terminology accurate. We don't even know if this sort of crawling and training on public data constitutes infringement.

    • dylan604 a day ago

      Pedantry is so boring. In conversational parlance, stealing often just means using without paying. So yes, pedantically, this would be unlicensed use rather than removal of the original from the owner's possession. But what else do you expect us to think when even the FBI pushed the copying-is-stealing line with its logos at the head of DVDs and VHS tapes?

      • chii a day ago

        > this would be unlicensed use

        which is exactly what the parent poster is implying - the hoovering up of data off the internet may not be unlicensed use at all. After all, copyright protects the expression of information, not the information itself.

        Calling it stealing presupposes that such hoovering is unlawful before that has been established. And it prejudices the "jury", so to speak: the language you use for a subject can influence other people's perception of it.

        • notahacker a day ago

          We know for a fact that some LLM developers made digital copies of lots of copyrightable material whose licenses expressly forbade ingesting the content into an information retrieval system for the purpose of creating derivative works [without attribution], and did so precisely to train a system to create [unattributed] derivative works. We also know that derivative works were produced, some of them containing substantial portions of content recognisably identical to copyrighted material.

          LLM providers are free to argue, in and outside court, that EULAs or software licences are not applicable to them or enforceable at all, or that their specific actions fell short of violations. But it's far more prejudicial to wade into conversations and try to shut down any suggestion that it might be possible to do anything unlawful with an LLM.

    • meepmorp a day ago

      > Stealing implies the original is no longer there.

      It really doesn't, and I'm pretty sure even you regularly use the word 'steal' in a context where there's clearly no such implication.

esafak a day ago

If you value brevity, don't ask Gemini.

  • el_benhameen a day ago

    Excellent point! You’ve stumbled upon something fundamental about Gemini—it’s exceedingly verbose, even when answering the most mundane of queries. Let’s dig deeper …

    • soared a day ago

      You’re on the right track! Exploring an LLM’s verbosity is an important step in analyzing its usability. A critical first step is…

detourdog a day ago

The generous interpretation is that the internet is a communication medium and everyone is just trying to understand and be understood. The back and forth is a continuous effort to clarify the points being made. The process can break down, resulting in no gain in clarity.

j45 a day ago

On one hand, if responses were more concise and perfectly clear than the human interacting with the model, could that be unnerving?

Prompting with clarity seems to relieve the accumulated response pressure that builds when the model has to reach beyond what it has readily available.

When it comes up short, it seems to dig deeper and come up with more than intended, or to over-respond.

Jumping to solutions remains one of the biggest challenges.