Comment by postalcoder a day ago

From the paper:

> Datasets. We construct a diverse and high-quality collection of video datasets to train STARFlow-V. Specifically, we leverage the high-quality subset of Panda (Chen et al., 2024b) mixed with an in-house stock video dataset, with a total number of 70M text-video pairs.

justinclift a day ago

> in-house stock video dataset

Wonder if "iCloud backups" would be counted as "stock video" there? ;)

  • anon7000 a day ago

    I have to delete as many videos as humanly possible before backing up to avoid blowing through my iCloud storage quota, so I guess I’m safe.

  • fragmede a day ago

    Turn on advanced data protection so they don't train on yours.

    • givinguflac a day ago

      That has nothing to do with it, and Apple wouldn’t train on user content; they’re not Google. If they ever did, it would be opt-in at best. There’s a reason they’re walking and observing, not running and trying to be at the forefront of cloud AI, like some others.

      • gaigalas 13 hours ago

        Why should I buy this "ethical Apple" argument?

        They shared Siri audio recordings with contractors in 2019. It became opt-in only after backlash, similar to other privacy controversies.

        This shows that they clearly prioritize not being sued or caught, which is slightly different from prioritizing user choices.