Comment by esafak 10 months ago

No, they are not. Model outputs can be discretized, but the model parameters (excluding hyperparameters) are typically continuous. That's why we can use gradient descent.
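A minimal sketch of that distinction, assuming a toy quadratic loss and a hand-picked learning rate (neither is from the thread): the parameters are real numbers nudged by small gradient steps, not discrete values to be searched combinatorially.

```python
import numpy as np

# Gradient descent on a toy quadratic loss ||w - target||^2.
rng = np.random.default_rng(0)
w = rng.normal(size=3)                  # continuous, real-valued parameters
target = np.array([1.0, -2.0, 0.5])     # minimizer of the toy loss

lr = 0.1
for _ in range(100):
    grad = 2.0 * (w - target)           # exact gradient of the loss
    w -= lr * grad                      # small, smooth update; no bit flipping

print(np.round(w, 3))                   # converges to ~[1.0, -2.0, 0.5]
```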

bob1029 10 months ago

Where are the model parameters stored and how are they represented?

  • esafak 10 months ago

    On disk or in memory as multidimensional arrays ("tensors" in ML speak).

    • bob1029 10 months ago

      Do we agree that these memories consist of a finite # of bits?

      • esafak 10 months ago

        Yes, of course.

        Consider a toy model with just 1000 double-precision (64-bit) parameters, i.e. 64,000 bits of parameter data. If you're going to randomly flip bits over this 2^64,000 search space while you evaluate a nontrivial fitness function, genetic-style, you'll be waiting a long time (a back-of-the-envelope sketch follows the thread).

        • bob1029 10 months ago

          I agree that if you approach it naively, you will accomplish nothing.

          With some optimization, you can evolve programs over search spaces of 10^10000 states (i.e., 10 unique instructions, 10,000 instructions long) and beyond; a toy sketch of such a mutation loop follows the thread.

          Visiting every possible combination is not the goal here.
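As a back-of-the-envelope check on the 1000-parameter example above, the sketch below counts the states and then runs a naive keep-only-improvements bit-flip search. The fitness function (push all parameters toward zero) is an assumption for illustration, not anything from the thread.

```python
import numpy as np

# 1000 float64 parameters = 64,000 bits, so blind search ranges over 2**64000 states.
n_params, bits_per_param = 1000, 64
total_bits = n_params * bits_per_param
print(f"search space: 2^{total_bits} (~10^{int(total_bits * 0.30103)}) states")

# Naive genetic-style bit flipping on a toy fitness function, keeping improvements only.
rng = np.random.default_rng(0)
params = rng.standard_normal(n_params)
raw = bytearray(params.tobytes())                   # the parameters as raw bytes

def fitness(p):
    val = -np.sum(p ** 2)                           # toy objective: drive parameters to zero
    return val if np.isfinite(val) else -np.inf     # bit flips can produce NaN/Inf

best = fitness(params)
for _ in range(10_000):
    i = int(rng.integers(len(raw)))
    b = int(rng.integers(8))
    raw[i] ^= 1 << b                                # flip one random bit
    f = fitness(np.frombuffer(bytes(raw), dtype=np.float64))
    if f > best:
        best = f                                    # keep the improvement
    else:
        raw[i] ^= 1 << b                            # otherwise undo the flip

print(best)   # improves far more slowly than gradient descent on the same objective
```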
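And a rough sketch of the counterpoint: a simple mutate-and-keep-improvements loop over a 10-opcode, 10,000-instruction program makes steady progress without visiting more than a vanishing fraction of the 10^10000 states. The hidden-target fitness function is a stand-in assumed for illustration; a real system would execute each candidate program and score its behavior.

```python
import numpy as np

# A (1+1)-style evolutionary loop over a "program" of 10,000 instructions drawn
# from 10 opcodes: a 10^10000-state space searched by mutation, not enumeration.
N_INSTR, N_OPS = 10_000, 10
rng = np.random.default_rng(0)

target = rng.integers(N_OPS, size=N_INSTR)     # hidden optimum (illustration only)
program = rng.integers(N_OPS, size=N_INSTR)    # random starting program

def fitness(prog):
    return int(np.sum(prog == target))         # number of correct instructions

best = fitness(program)
for _ in range(200_000):
    i = int(rng.integers(N_INSTR))
    old = program[i]
    program[i] = rng.integers(N_OPS)           # point mutation of one instruction
    f = fitness(program)
    if f >= best:
        best = f                               # accept neutral or better mutants
    else:
        program[i] = old                       # otherwise revert

print(best, "/", N_INSTR)                      # climbs steadily; the full space is never visited
```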