Comment by esafak 2 days ago

No, they are not. Model outputs can be discretized, but the model parameters (excluding hyperparameters) are typically continuous. That's why we can use gradient descent.
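
(A minimal sketch of that point, assuming NumPy: gradient descent only makes sense because the parameters are real-valued and can be nudged by arbitrarily small amounts; toy linear regression below.)

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))              # toy inputs
    true_w = np.array([1.5, -2.0, 0.5])
    y = X @ true_w                             # toy targets from a known linear map

    w = np.zeros(3)                            # continuous, real-valued parameters
    lr = 0.01
    for _ in range(2000):
        grad = 2 * X.T @ (X @ w - y) / len(X)  # gradient of mean squared error
        w -= lr * grad                         # small, continuous parameter update
    print(w)                                   # converges toward [1.5, -2.0, 0.5]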

bob1029 2 days ago

Where are the model parameters stored and how are they represented?

  • esafak 2 days ago

    On disk or in memory as multidimensional arrays ("tensors" in ML speak).
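
    (A short sketch, assuming PyTorch: each layer's parameters are dense floating-point tensors, and their on-disk or in-memory footprint is just element count times element size.)

        import torch

        layer = torch.nn.Linear(256, 128)   # weight: 128x256 floats, bias: 128 floats
        for name, p in layer.named_parameters():
            print(name, tuple(p.shape), p.dtype, p.numel() * p.element_size(), "bytes")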

    • bob1029 2 days ago

      Do we agree that these memories consist of a finite # of bits?

      • esafak 2 days ago

        Yes, of course.

        Consider a toy model with just 1000 double-precision (64-bit) parameters, i.e. 64 kilobits of parameter state. If you're going to randomly flip bits over that 2^64,000 search space, genetic-style, evaluating a nontrivial fitness function for each candidate, you'll be waiting a long time.
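
        (Rough arithmetic behind that figure, for illustration:)

            import math

            n_params = 1000
            bits = n_params * 64                # 64,000 bits of raw parameter state
            digits = int(bits * math.log10(2))  # decimal magnitude of 2^64000
            print(bits, "bits, about 10 **", digits, "distinct bit patterns")
            # -> 64000 bits, about 10 ** 19265 distinct bit patterns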

        • bob1029 2 days ago

          I agree that if you approach it naively you will accomplish nothing.

          With some optimization, you can evolve programs with search spaces of 10^10000 states (i.e., 10 unique instructions, 10000 instructions long) and beyond.

          Visiting every possible combination is not the goal here.
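
          (An illustrative sketch of that idea, not bob1029's actual setup: a (1+1)-style hill climber over a toy "program" of 10,000 slots with 10 possible instructions each. The fitness here is deliberately separable so the example runs in seconds; real genetic-programming fitness functions are far harder. The point is that the search never enumerates the 10^10000 possible programs, it just mutates one instruction at a time and keeps non-worsening changes.)

              import random

              random.seed(0)
              N_OPS, LENGTH = 10, 10_000
              TARGET = [random.randrange(N_OPS) for _ in range(LENGTH)]  # toy "ideal" program

              prog = [random.randrange(N_OPS) for _ in range(LENGTH)]    # random starting program
              best = sum(a == b for a, b in zip(prog, TARGET))           # one full fitness evaluation

              for _ in range(500_000):
                  i = random.randrange(LENGTH)
                  new = random.randrange(N_OPS)                          # point mutation at slot i
                  delta = (new == TARGET[i]) - (prog[i] == TARGET[i])    # incremental fitness change
                  if delta >= 0:                                         # keep neutral or better mutations
                      prog[i], best = new, best + delta

              print(best, "of", LENGTH, "slots correct after 500k point mutations")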