Comment by barbazoo 18 hours ago
> Using unowned training data (e.g., celebrity faces, copyrighted art)
How would one ever know that the GenAI output is not influenced by or based on copyrighted content?
Getty and Adobe offer models that were trained only on images that they have the rights to. Those models might meet Netflix’s standards?
Doesn’t seem likely that Adobe has an owned collection of content big enough. Seems very likely that they just deemed the legal risk to be outweighed by the commercial opportunity. They kinda had to: a product that generates stuff that gets you sued isn’t worth paying whatever they charge for their subscription.
I kind of wonder if that even works.
If you take a model trained on Getty and ask it for Indiana Jones or Harry Potter, what does it give you? These things are popular enough that they're likely to be present in any large set of training data, either erroneously or because some specific works incorporated them in a way that was licensed or fair use for those particular works even if it isn't in general.
And then when it conjures something like that by description rather than by name, how are you any better off than with something trained on random social media? It's not like you get to make unlicensed AI Indiana Jones derivatives just because Getty has a photo of Harrison Ford.
I work in this space. In traditional diffusion-based regimes (paired image and text), one can absolutely check the text to remove all occurrences of Indiana Jones. Likewise, Adobe Stock has content moderation that ensures (up to the limits of human moderation) no dirty content. To the model, it is a world without Indiana Jones.
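For the curious, a minimal sketch of the kind of caption-level filtering I mean, assuming a dataset of (image_path, caption) pairs; the blocklist and function names are illustrative, not any particular vendor's pipeline:

    import re

    # Illustrative blocklist; a real one would be far larger and curated
    # per rights-holder.
    BLOCKED_TERMS = ["indiana jones", "harrison ford", "harry potter"]

    # Word-boundary patterns so e.g. "jones" alone doesn't trip the filter.
    BLOCKED_PATTERNS = [re.compile(rf"\b{re.escape(t)}\b") for t in BLOCKED_TERMS]

    def is_clean(caption: str) -> bool:
        """True if the caption mentions none of the blocked terms."""
        text = caption.lower()
        return not any(p.search(text) for p in BLOCKED_PATTERNS)

    def filter_dataset(pairs):
        """Keep only (image_path, caption) pairs whose caption is clean."""
        return [(img, cap) for img, cap in pairs if is_clean(cap)]

    sample = [
        ("a.jpg", "Indiana Jones running from a boulder"),  # dropped
        ("b.jpg", "Adventurer with a whip and brown hat"),  # kept
    ]
    print(filter_dataset(sample))

Note that the second pair survives: the caption never names the franchise even though the image might still depict the character, which is exactly the gap the replies below point at.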
If you ask the Adobe Stock image generator for "Adventurer with a whip and hat portrait view , Brown leather hat, jacket, close-up"
It gives you an image of Harrison Ford dressed like Indiana Jones.
https://stock.adobe.com/ca/images/adventurer-with-a-whip-and...
> one can absolutely check the text to remove all occurrences of Indiana Jones
How do you handle this kind of prompt:
“Generate an image of a daring, whip-wielding archaeologist and adventurer, wearing a fedora hat and leather jacket. Here's some back-story about him: With a sharp wit and a knack for languages, he travels the globe in search of ancient artifacts, often racing against rival treasure hunters and battling supernatural forces. His adventures are filled with narrow escapes, booby traps, and encounters with historical and mythical relics. He’s equally at home in a university lecture hall as he is in a jungle temple or a desert ruin, blending academic expertise with fearless action. His journey is as much about uncovering history’s secrets as it is about confronting his own fears and personal demons.”
Try copy-pasting it into any image generation model. The output looks an awful lot like Indiana Jones in all my attempts, yet I've not referenced Indiana Jones even once!
It comes down to who is liable for the edge cases, I suspect. Adobe will compensate the end user if they get sued for using a Firefly-generated image (probably up to some limit).
Getting sued occasionally is a cost of doing business in some industries. It’s about risk mitigation rather than risk elimination.
Feels like "paying extra for the extended warranty" vibes. What it covers isn't much (do you expect someone to come after you in small claims court and if they do, was that your main concern?) meanwhile the big claim you're actually worried about is what it doesn't cover.
And if you really wanted insurance then why not get it from an actual insurance company?
Because almost everything is risk mitigation or reduction, not elimination.
In particular, in the US, the legal apparatus has been gamified to the point that people are expected to sue whenever their expected value is positive, even if the case is insane on its merits, because it's much more likely that someone facing enough risk and cost will settle as the cheaper option.
And in that world, there is nothing that completely eliminates the risk of being sued in bad faith - but the more things you put in your mitigation basket, the narrower the error bars are on the risk even if the 99.999th percentile is still the same.
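To make the expected-value point concrete, here's a back-of-the-envelope sketch with purely made-up numbers (the structure is what matters, not the values):

    # All numbers are hypothetical, chosen only to show the asymmetry.
    p_win          = 0.05       # plaintiff's chance of winning at trial
    award          = 2_000_000  # damages if they do win
    plaintiff_cost = 50_000     # plaintiff's cost to litigate
    defense_cost   = 300_000    # defendant's cost to fight it out

    # Expected value of filing, even with a weak case:
    ev_filing = p_win * award - plaintiff_cost  # 100_000 - 50_000 = 50_000 > 0

    # Any settlement below the defense cost is the cheaper option for the
    # defendant, regardless of the merits:
    print(ev_filing, "-> rational to settle anywhere below", defense_cost)

So a suit that is 95% likely to lose can still be positive-EV to file, and each mitigation narrows the distribution of outcomes without moving that worst-case tail.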
All the indemnities I’ve read have clauses, though, saying that if you intentionally use the tool to make something copyrighted, they won’t protect you.
So if you put obviously copyrighted things in the prompt, you’ll still be on your own.
Lionsgate tried that and found that even their entire archive wasn't nearly enough to produce a useful model: https://www.thewrap.com/lionsgate-runway-ai-deal-ip-model-co... and https://futurism.com/artificial-intelligence/lionsgate-movie...
This amuses me.
Consumers have long wanted a single place to access all content. Netflix was probably the closest we ever got, and even then it had regional difficulties. As competitors rose, they stopped licensing their content to Netflix, and Netflix is now arguably just another face in the crowd.
Now they want to go and leverage AI to produce more content and bam, stung by the same bee. No one is going to license their content for training if the results of that training will be used in perpetuity. They will want a permanent cut. Which means they either need to support fair use or, more likely, they will all put up a big wall and suck eggs.
I think it would be very, very difficult, almost impossible, to create a dataset to train an image generator that doesn't contain any copyrighted material you don't have the rights to. For the obvious stuff like Mickey Mouse or Superman, you can just run some other tool over the data to filter it out, but there are so many ridiculous things that can be copyrighted (depictions of buildings, tattoos), plus things like crowd shots and pictures of cities with ads in the background, that I don't know how you could do it. I'm sure even Adobe's stock library has a lot of violations like that.