Comment by heavyset_go 2 days ago

> Is it confirmed that site loads go into the training database?

Would you trust OpenAI if they told you it doesn't?

If you would, would you also trust Meta to tell you if its multibillion-dollar investment was trained on terabytes of pirated media the company downloaded over BitTorrent?

viraptor 2 days ago

We don't have to trust it or not. If there's such a claim, surely someone can at least point to a pcap file showing an unknown connection, or to some decompiled code. Otherwise it's just a conspiracy theory.

  • _flux 2 days ago

    Surely the data must go to OpenAI's servers; how else would they use LLMs on it? What we cannot see is whether that data ends up in the training data.

    Personally, I would just believe what they say for the time being; there would be backlash for doing otherwise, possibly a legal one.

    • viraptor 2 days ago

      I think the original claim was about something different. "Is it confirmed that site loads..." I read it as the author talking about general browsing, not just explicit questions asked with the context of the page.

  • heavyset_go 2 days ago

    Whatever is included in context is in OpenAI's control from that point forward, and you just have to trust them not to do anything with it.

    That isn't a conspiracy theory, it's fundamentally how interfacing with 3rd party hosted LLMs works.