Comment by strogonoff 12 hours ago
I keep waiting for the day when software stops being compared to a human person (a being with agency, free will, consciousness, and human rights of its own) for the purposes of justifying IP law circumvention.
Yes, there is no problem when a person reads some book and recalls pieces[0] of it in a suitable context. How that in any way addresses the case where certain people create and distribute commercial software, providing it that work as input so it can perform such recall on demand and at scale, laundering and/or devaluing copyright, is unclear.
Notably, the above is being done not just to a few high-profile authors, but to all of us no matter what we do (be it music, software, writing, visual art).
What’s even worse is that, conceivably, they train (or would train) the models not to output those works verbatim, specifically to thwart attempts to detect the presence of said works in the training dataset (which would naturally reveal the model and its output to be derivative works).
Perhaps one could find some way of justifying that (people have justified all sorts of things throughout history), but let it be something better than “the model is assumed to be a thinking human when it comes to IP abuse but an unthinking tool when it comes to using it for personal benefit”.
[0] Of course, if you found me a single person on this planet capable of recalling 42% of any Harry Potter book, I’d be very impressed, if I believed it at all.
> I keep waiting for the day when software stops being compared to a human person (a being with agency, free will, consciousness, and human rights of its own) for the purposes of justifying IP law circumvention.
I mean, "agency" is a goal of some AI; "free will" is incoherent*; the word "consciousness" has about 40 different definitions, some of which are so broad they include thermostats and others so narrow that it's provably impossible for anything (including humans) to have it; and "human rights" are a purely legal concept.
> What’s even worse is that, conceivably, they train (or would train) the models not to output those works verbatim, specifically to thwart attempts to detect the presence of said works in the training dataset (which would naturally reveal the model and its output to be derivative works).
Some of the makers certainly do as you say; but also, the more verbatim quotations a model can produce, the more computational effort that model must spend on memorization instead of on the far more useful general-purpose results.
* I'm not a fan of Aleister Crowley, but I think he was right to say that there's only one thing you can actually do that's truly your own will and not merely you allowing others to influence you: https://en.wikipedia.org/wiki/True_Will