Comment by ainiriand
... Training their own models out of your code...
> If you're publishing your code anywhere, it's getting trained on
Citation needed. First they'd need to know my code exists, then spend time and traffic crawling it (it's sure as hell not going to be hosted on Azure), and probably get detected and banned along the way.
No citation needed. It should be assumed, and treated as a malicious cybersecurity threat.
> It should be assumed, and treated as a malicious cybersecurity threat.
If you believe in absolute cybersecurity for anything you keep online, boy, I've got news for you. Literally all you can do is make it tougher; it will never be uncrackable. How tough depends on how much you can invest and how much you're willing to suffer.
Same here. Codeberg makes it tougher, so it's a measure.
Most people don't care about the AI being trained on their FOSS repos. If they did, they would have mass migrated when Microsoft announced it. The timing suggests that the downtime and the performance issues are definitely the irritants here.
This is not to say that people shouldn't care about AI training. I was disappointed by the public response when they announced it. The GH ToS has conditions that allow them to use your code regardless of its license. Even worse, that still applies if somebody else mirrors your code there from another forge. And they don't stop there: I have noticed that they scrape code from package registries like crates.io in the name of security. I would be surprised if they didn't use that for training their AI too.
I personally expected the AI stuff to be a fad that would go away quickly, and so I didn't leave the second they did that (for the same reason that distro-hopping is unhealthy). It's more a symptom of the frog recognising that, okay, yeah, the temperature has definitely risen too high.
If you're publishing your code anywhere, it's getting trained on. MS does not restrict themselves to only training on GH-hosted code.