Comment by ainiriand
... Training their own models out of your code...
> If you're publishing your code anywhere, it's getting trained on
Citation needed. First they'd need to know my code exists, then spend time and traffic crawling it (it's sure as hell not going to be hosted on Azure), and probably get detected and banned along the way.
No citation needed. It should be assumed, and treated as a malicious cybersecurity threat.
> It should be assumed, and treated as a malicious cybersecurity threat.
If you believe in absolute cybersecurity for anything you keep online, boy, I've got news for you. Literally all you can do is make it tougher; it will never be uncrackable. How tough depends on how much you can invest and how much you're willing to suffer.
Same here. Codeberg makes it tougher, so it's a measure.
Most people don't care about the AI being trained on their FOSS repos. If they did, they would have mass migrated when Microsoft announced it. The timing suggests that the downtime and the performance issues are definitely the irritants here.
This is not to say that people shouldn't care about AI training. I was disappointed by the public response when they announced it. The GH ToS has conditions that allow them to use your code regardless of its license. Even worse, that still applies if somebody else mirrors your code there from another forge. And they don't stop there: I have noticed that they scrape code from package registries like crates.io in the name of security. I would be surprised if they didn't use that for training their AI too.
I personally expected the AI stuff to be a fad that would go away quickly, and so I didn't leave the second they did that (for the same reason that distro-hopping is unhealthy). It's more a symptom of the frog recognising that, okay, yeah, the temperature has definitely risen too high.
If you're publishing your code anywhere, it's getting trained on. MS does not restrict themselves to only training on GH-hosted code.