Comment by admaiora 3 hours ago
And it's a question of whether we accept breaking the law for the chance at the greatest technological advancement of the 21st century. In my opinion, the legal system has become a blocker for a lot of innovation, not only in AI but elsewhere as well.
This is a point that I don't see discussed enough. I think Anthropic decided to purchase books in bulk, tear them apart to scan them, and then destroy those copies. And that's the only source of copyrighted material I've ever heard of that is actually legal to use for training LLMs.
Most LLMs were trained on vast troves of pirated copyrighted material. Folks point this out, but they never talk about what the alternative was. The content industries, like music, movies, and books, have done nothing to research or make their works available for analysis and innovation, and have in fact fought tooth and nail against industries that seek to do so.
Further, they push the narrative that people who pirate works are stealing from the artists, when the vast majority of the money a customer pays for a piece of copyrighted content goes to the publishing industry. This is essentially the definition of rent-seeking.
Those industries essentially tried to stop innovation entirely, and they tried to use the law to do that (and still do). So other companies innovated over the copyright holders' objections, and now we have to sort it out in the courts.