Comment by SilverElfin

ekianjo 2 days ago

It's just open weights, the source has no place in this expression

Yeah but you can distill

littlestymaar 2 days ago

You can distill closed weights models as well. (Just not logit-distillation)
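
To make the distinction concrete, here is a minimal sketch of the two loss types on a toy 3-token vocabulary. The function names and numbers are made up for illustration, and it assumes the closed model's API returns only the generated text (some APIs do expose partial log-probabilities, which this ignores).

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student). Needs the teacher's full output distribution,
    which you only have when you can run the weights yourself."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sampled_token_loss(teacher_token, student_logits):
    """Cross-entropy on the token the teacher actually emitted.
    Needs only the teacher's text output, so it works through an API."""
    q = softmax(student_logits)
    return -math.log(q[teacher_token])

# Hypothetical toy values for a single next-token prediction.
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [1.5, 1.2, 0.3]

kl = logit_distillation_loss(teacher_logits, student_logits)

# With a closed model we would only observe the emitted token (greedy here).
teacher_token = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
ce = sampled_token_loss(teacher_token, student_logits)
```

The second loss throws away everything the teacher "thought" about the tokens it did not emit, which is why output-only distillation is the weaker signal.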

- mips_avatar 2 days ago
  
  Though it violates their terms of service
  
amelius 2 days ago

Is that the equivalent of decompile?

- c0balt 2 days ago
  
  No, that is the equivalent of lossy compression.
  

falcor84 2 days ago

Isn't that a bit like saying that if I open source a tool, but not a full compendium of all the code that I had read, which led me to develop it, then it's not really open source?


KaiserPro 2 days ago

No, it's like releasing a binary. I can hook into it and its API and make it do other things. But I can't rebuild it from scratch.

- falcor84 2 days ago
  
  > rebuild it from scratch
  That's beyond the definition of Open Source. Doing a bit of license research now, only the GPL has such a requirement - GPLv3:
  > The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities.
  But all other Open Source compliant licenses I checked don't, and just refer to making whatever is in the repo available to others.
  
  
  KaiserPro 2 days ago
  
  If you distribute a binary to someone under GPLv2, you should also, if asked, provide the source code used to _build_ that binary. Other licenses will differ. MIT, for example, lets you do pretty much anything, so long as you keep the MIT license and attribution public.
  But when people talk about open source, they generally mean "oh, I can see the source code and build it myself," rather than freeware, which is "I can run the binary and not have to pay."
  
  
  PunchyHamster 2 days ago
  
  ok but just the model isn't even close to anything open, it's literally a compiled binary, without even the source data
  
exe34 2 days ago

"open source" as a verb is doing too much work here. are you proposing to release the human readable code or the object/machine code?
if it's the latter, it's not the source. it's free as in beer. not freedom.

- falcor84 2 days ago
  
  Yes, I 100% agree. Open Source is a lot more about not paying than about liberty.
  This is exactly the tradeoff that we had made in the industry a couple of decades ago. We could have pushed all-in on Stallman's vision and the FSF's definition of Free Software, but we (collectively) decided that it's more important to get the practical benefits of having all these repos up there on GitHub and us not suing each other over copyright infringement. It's absolutely legitimate to say that we made the wrong choice, and I might agree, but a choice was made, and Open Source != Free Software.
  https://www.gnu.org/philosophy/open-source-misses-the-point....
  
fragmede 2 days ago

No. In that case, you're providing two things, a binary version of your tool, and the tool's source. That tool's source is available to inspect and build their own copy. However, given just the weights, we don't have the source, and can't inspect what alignment went into it. In the case of DeepSeek, we know they had to purposefully cause their model to consider Tiananmen Square something it shouldn't discuss. But without the source used to create the model, we don't know what else is lurking around inside the model.

- NitpickLawyer 2 days ago
  
  > However, given just the weights, we don't have the source
  This is incorrect, given the definitions in the license.
  > (Apache 2.0) "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
  (emphasis mine)
  In LLMs, the weights are the preferred form of making modifications. Weights are not compiled from something else. You start with the weights (randomly initialised) and at every step of training you adjust the weights. That is not akin to compilation, for many reasons (both theoretical and practical).
  In general, licenses do not give you rights over the "know-how" or "processes" by which the licensed parts were created. What you get is the ability to inspect, modify, and redistribute the work as you see fit. And most importantly, you modify the work just like the creators modify the work (hence the preferred form). Just not with the same data (i.e. you can modify the source of Chrome all you want, just not with the "know-how and knowledge" of a Google engineer - the license cannot offer that).
  This is also covered in the EU AI Act, btw.
  > General-purpose AI models released under free and open-source licences should be considered to ensure high levels of transparency and openness if their parameters, including the weights, the information on the model architecture, and the information on model usage are made publicly available. The licence should be considered to be free and open-source also when it allows users to run, copy, distribute, study, change and improve software and data, including models under the condition that the original provider of the model is credited, the identical or comparable terms of distribution are respected.
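
As a rough illustration of the "start from random weights and adjust them at every step" point above, here is a toy sketch: a two-parameter linear model trained with plain SGD, then modified by someone who has only the released weights. Everything in it (the `train` helper, the data, the learning rate) is hypothetical and far simpler than an LLM; it shows the mechanics only and does not settle which form is "preferred".

```python
import random

def train(weights, data, steps=200, lr=0.05):
    """Plain SGD on y = w*x + b. Returns a new [w, b] list;
    the update rule is the same whether we start from random
    weights or from previously trained ones."""
    w, b = weights
    for _ in range(steps):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return [w, b]

random.seed(0)

# The original lab: random initialisation plus data that is never published.
init = [random.uniform(-1, 1), random.uniform(-1, 1)]
original_data = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
released = train(init, original_data)

# A downstream user: has only `released` and their own data.
new_data = [(x, 2 * x + 3) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
finetuned = train(released, new_data)
```

The downstream user can keep adjusting `released` with the same procedure the lab used, but cannot re-run the original training without `original_data`, which is the part the two sides of this thread weigh differently.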
  
  
  fragmede 2 days ago
  
  > In LLMs, the weights are the preferred form of making modifications.
  No they aren't. We happen to be able to do things to modify the weights, sure, but why would any lab ever train something from scratch if editing weights was preferred?
  
nextaccountic 2 days ago

No, it's like saying that if you release under the Apache license, it's not open source even though it's under an open source license.
For something to be open source it needs to have sources released. Sources are the things in the preferred format to be edited. So the code used for training is obviously source (people can edit the training code to change something about the released weights). Also the training data, under the same rationale: people can select which data is used for training to change the weights

- falcor84 2 days ago
  
  Well, this is just semantics. I can have a repo that includes a collection of json files that I had generated via a semi-manual build process that depends on everything from the state of my microbiome to my cat's scratching pattern during Mercury's last retrograde. If I attach an open source license to it, then that's the source - do with it what you will. Otherwise, I don't see how this discussion doesn't lead to "you must first invent the universe".
  
  
  nextaccountic 2 days ago
  
  Not just semantics: the concept of open source fundamentally depends on what the preferred form of modification is.
  https://opensource.org/ai/open-source-ai-definition
  
  
  typ 2 days ago
  
  The difference is that you can customize/debug it or not. You might say that a .EXE can be modified too. But I don't think that's the conventional definition of open source.
  I understand that these days, businesses and hobbyists just want to use free LLMs without paying subscriptions for economic motives, that is, either saving money or making money. They don't really care whether the source is truly available or not. They are just end users of a product, not open-source developers by any means.
  
nurettin 2 days ago

Is this a troll? They don't want to reproduce your open source code, they want to reproduce the weights.

- falcor84 2 days ago
  
  What does open sourcing have to do with "reproducing"? Last I checked, open sourcing is about allowing others to modify and to distribute the modified version, which you can do with these. Yes, having the full training data and tooling would make it significantly easier, and it is a requirement for GPL, but not for Open Source licenses in general. You may add this as another argument in favor of going back in time and doing more to support Richard Stallman's vision, but this is the world in which we live now.
  
  
  nurettin 2 days ago
  
  For obvious reasons, there is no world in which you can "build" this kind of so-called open source project without the data sets. Play around with words all you want.
  

amelius 2 days ago

True. But the headline says open weights.


jimmydoe 2 days ago

you are absolutely right. I'd rather use true closed models, not fake open source ones from China.
