Comment by defrost
> I think you would be hard-pressed to prove that any model trained on internet scrapes definitively wasn't trained on any CSAM whatsoever.
I'd be hard-pressed to prove that you definitely hadn't ever killed anybody.
Legally, if it's asserted that these images are criminal because they are the product of a model trained on sources that contained CSAM, then the requirement would be to prove that assertion.
With text and speech you could prompt the model to reproduce a Sarah Silverman monologue exactly and assert that this proves her content was in the training set, etc.
Here the defense would ask the prosecution to demonstrate how to extract a copy of original CSAM.
But your point is well taken: it's likely that most image-generation models of this nature have been fed at least one image that was borderline jailbait, and likely at least one that was well below the line.
> Legally, if it's asserted that these images are criminal because they are the product of a model trained on sources that contained CSAM, then the requirement would be to prove that assertion.
Legally, possession of CSAM is against the law because possession is assumed to contribute to market demand, and demand incentivizes the production of new supply, meaning that as long as demand exists, children will be harmed again to produce more content to satisfy it. In other words, the intent is to stop future harm. This is why people have been prosecuted for things like suggestive cartoons that have no real-life events behind them. The material is not illegal on the grounds of past events; the actual abuse is illegal on its own standing.
The provenance of the imagery is irrelevant. What you would need to prove is that your desire for such imagery won't stimulate you or others to create new content involving real people. If you could somehow prove that LLM-generated content would satisfy all future demand, problem solved! That would be world-changing.