Comment by diggan a day ago

I think you've kind of answered a different question. Yes, more LLMs could be created. But specifically Llama? Since it's an open source model, the assumption is that we could (given access to the same compute, of course) train one from scratch ourselves, just like we can build our own binaries of open source software.

But this obviously isn't true for Llama, hence the uncertainty if Llama even is open source in the first place. If we cannot create something ourselves (again, given access to compute), how could it possibly be considered open source by anyone?

caseyy a day ago

I understand I was supposed to say “no” and question the open-source label. We’ve heard many arguments that if something can’t be reproduced from scratch, it’s not true open-source.

To me, they sound a bit like “no true Scotsman”. Llama is open source, compared to commercial models with closed weights. Even if it could be more open source.

That’s why I looked at it in a broader sense — what could happen in an open-source world to improve or replace Llama. Much could happen, thanks to Llama’s open nature, actually.

  • diggan a day ago

    > Llama is open source, compared to commercial models with closed weights

    Yeah, just like a turd is a piece of gourmet food if there is no other good food around.

    Sorry, but that's a really bad argument, "open source" is not a relative metric you use to compare different things, it's a label that is applied to something depending on what license that thing has. No matter what licenses others use, the license you use is still the license you use.

    Especially when there are actually open source models out there, so it clearly isn't impossible. Maybe Meta feels like it's impossible because of X, Y and Z, but that doesn't make it true just because they don't feel like they could earn enough money on it, or whatever their reasoning is.

    • caseyy a day ago

      > Yeah, just like a turd is a piece of gourmet food if there is no other good food around.

      I didn't mean it's on a continuum, as you assumed. Apologies for phrasing it unclearly. I meant that the weights are public. They are open; there is no debate to be had about it. Generally and broadly, that is already considered open-source.

      And we all understand what "open-source" means in the context of Llama - it doesn't mean one of the idealized notions of open source, it means open weights.

      • diggan a day ago

        > Generally and broadly, that is already considered open-source.

        No, just because something is public doesn't mean it's open source, those are two very different things. If I upload code to my website without any license, that code is not suddenly open source just because it's public. Just like Llama isn't suddenly "open source" because Meta's marketing department says so; their own legal department still calls Llama proprietary, don't you wonder why that is?

        > And we all understand what "open-source" means in the context of Llama - it doesn't mean one of the idealized notions of open source, it means open weights.

        You, and some others (including Meta), are using a definition Meta came up with themselves, probably in order to try to skirt EU AI regulations, as the rules are different for "open source" models vs others. I'm not sure why you as an individual would fall for it though; unless I'm missing something, you have nothing to gain by spreading PR from Meta, do you?

        The existing definition of open source (before Meta's bastardization) is not an "idealized" definition; it's the one we built an enormous ecosystem on top of, one that taught a whole generation of programmers how to program and connected people together, without putting profits first.

ImprobableTruth a day ago

I think the fact that all (good) LLM datasets are full of licensed/pirated material means we'll never really see a decent open source model under the strict definition. Open weights + open source code is really the best we're going to get, so I'm fine with it co-opting the term open source even if it doesn't fully apply.

  • diggan a day ago

    > we'll never really see a decent open source model under the strict definition

    But there are already a bunch of models like that, where everything (architecture, training data, training scripts, etc.) is open, public and transparent. Since you weren't aware those existed before but now know they do, are you willing to change your perspective on it?

    > so I'm fine with it coopting the term open source even if it doesn't fully apply

    It really sucks that the community seems OK with this. I probably wouldn't have become a developer without FOSS, and I don't understand how it can seem OK to rob other people of the opportunity to learn from FOSS projects.

    • pabs3 a day ago

      Not all of the community is OK with this, lots of folks are strongly against OSI's bullshit OSAID for example. Really it should have been more like the Debian Deep Learning Team's Machine Learning Policy, just like last time when the OSI used the Debian Free Software Guidelines (DFSG) to create the Open Source Definition (OSD).

      https://salsa.debian.org/deeplearning-team/ml-policy