Comment by yorwba
Music artists get in trouble for using more than a sample from other music artists without permission because their work is in direct competition with the work they're borrowing from.
A ZIP file of a book is also in direct competition with the book, because you could open the ZIP file and read it instead of the book.
A model that, given 50 tokens, assigns a greater than 50% probability to the next 50 tokens 42% of the time is not in direct competition with the book: starting from the beginning, you'll lose the plot fairly quickly unless you already have the full book. And unlike a music sample lifted from other music, the model output isn't good enough to read instead of the book.
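To put rough numbers on "lose the plot fairly quickly", here is a back-of-the-envelope sketch. It rests on assumptions that are mine, not from any measurement: each 50-token window is treated as an independent trial, a window counts as reproduced with probability 0.42 (generous, since a greater than 50% probability is not certainty), and the 100,000-token book length is a made-up example. The function names are hypothetical too.

```python
import math

WINDOW_TOKENS = 50   # window size from the comment above
P_WINDOW = 0.42      # assumed chance that one window is reproduced exactly


def expected_windows_before_failure(p_window: float) -> float:
    """Mean number of consecutive successful windows before the first miss.

    Assumes independent windows, so the run length is geometric.
    """
    return p_window / (1.0 - p_window)


def chain_success_log10(p_window: float, n_windows: int) -> float:
    """log10 of the chance that n_windows windows in a row all succeed.

    Logs are used because the raw probability underflows to 0.0 as a float.
    """
    return n_windows * math.log10(p_window)


if __name__ == "__main__":
    book_tokens = 100_000  # hypothetical book length, for illustration only
    n_windows = book_tokens // WINDOW_TOKENS

    run = expected_windows_before_failure(P_WINDOW)
    print(f"expected tokens before the first miss: {run * WINDOW_TOKENS:.0f}")
    print(f"log10 P(whole book reproduced): {chain_success_log10(P_WINDOW, n_windows):.0f}")
```

Under those assumptions the chain breaks after roughly 36 tokens on average, and the chance of getting through the whole book is far too small to write as an ordinary float. Real windows are certainly not independent (famous passages are much more likely to be memorized than obscure ones), so treat this as an illustration of the shape of the argument, not as a result.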
This is the first sensible argument in defense of AI models I've read in this debate. Thank you. This does make sense.
AI can reproduce individual sentences 42% of the time but it can't reproduce a summary.
The question, however, is: is that inherent in the design of AI tools, or is it a limitation of current models? What if future models get better at this and are able to produce summaries?