esafak 2 days ago

Where do the features come from, feature engineering? That's the method that failed the bitter lesson. Why would you use genetic programming when you can do gradient descent?

bob1029 2 days ago

> Where do the features come from, feature engineering? That's the method that failed the bitter lesson.

That would be the whole point of genetic programming. You don't have to do feature engineering at all.

Genetic programming is a more robust interpretation of the bitter lesson than transformer architectures and DNNs. There are fewer clever tricks you need to apply to get the job done. It is more about unmitigated raw compute than anything else out there.

In my experiments, there is absolutely no transformation, feature engineering, normalization, tokenization, etc. It is literally:

1. Copy input byte sequence to program data region

2. Execute program

3. Copy output byte sequence from program data region

Half of this problem is how you search for the programs. The other half is how you measure them. Beyond that, there isn't much to worry about other than how many CPUs you have on hand.
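
To make that concrete, here is a rough sketch of the loop, not my actual system; the toy instruction set, VM, and fitness function below are stand-ins for illustration:

  import random

  # Toy instruction set: (opcode, operand) pairs acting on a flat byte array.
  OPS = ("INC", "MOV", "CPY", "NOP")

  def run_vm(program, data):
      ptr = 0
      for op, arg in program:
          if op == "INC":                        # add operand to the byte under the pointer
              data[ptr] = (data[ptr] + arg) % 256
          elif op == "MOV":                      # move the data pointer
              ptr = (ptr + arg) % len(data)
          elif op == "CPY":                      # copy a byte from another cell
              data[ptr] = data[arg % len(data)]

  def evaluate(program, input_bytes, expected, data_size=256):
      data = bytearray(data_size)
      data[:len(input_bytes)] = input_bytes      # 1. copy input bytes into the data region
      run_vm(program, data)                      # 2. execute the program
      output = bytes(data[:len(expected)])       # 3. copy output bytes back out
      # Measurement: how many output bytes match the target (higher is fitter).
      return sum(a == b for a, b in zip(output, expected))

  def random_program(length=32):
      return [(random.choice(OPS), random.randrange(256)) for _ in range(length)]

The search half is whatever mutation/selection loop you like over programs like random_program(); the measurement half is evaluate().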

  • esafak 2 days ago

    Where does the genome, the genetic representation you are evolving, come from? The same raw features you use in neural networks? And then you optimize over that? If so, why not use gradient descent, which is faster? And this is still a step behind neural networks even apart from the optimization method, because neural networks use composition to learn features. How are you doing that?

    Do you have any real world examples of your method that are competitive with DL methods?

    • bob1029 2 days ago

      > Where does the genome, genetic representation, you are evolving come from

      The instruction set of the program that is being searched for.

      This is probably the best publicly available summary of the idea I am pursuing:

      https://github.com/kurtjd/brainfuck-evolved
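
      To sketch what "the genome is the instruction set" means (my own illustration, not code from that repo): a genome is just a string over the Brainfuck alphabet, and insert/delete mutations plus unequal crossover cut points mean program length is searched along with program content rather than fixed up front.

        import random

        INSTRUCTIONS = "+-<>[],."   # the Brainfuck instruction set is the gene alphabet

        def random_genome(length=64):
            return "".join(random.choice(INSTRUCTIONS) for _ in range(length))

        def mutate(genome, rate=0.02):
            out = []
            for c in genome:
                r = random.random()
                if r < rate:
                    out.append(random.choice(INSTRUCTIONS))   # point mutation
                elif r < 2 * rate:
                    continue                                  # deletion
                else:
                    out.append(c)
                if random.random() < rate:
                    out.append(random.choice(INSTRUCTIONS))   # insertion
            return "".join(out)

        def crossover(a, b):
            # Unequal cut points let offspring length drift, so program size
            # is discovered by the search rather than fixed in advance.
            i, j = random.randrange(len(a) + 1), random.randrange(len(b) + 1)
            return a[:i] + b[j:]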

      • esafak 2 days ago

        A program is composed of arbitrarily many instructions from your set. How are you accounting for this? By trying every possible program length? And you are considering the simpler case where the search space is discrete, unlike the continuous spaces in most machine learning problems.

        I think you need to think this through some more. You may see there is a reason nobody uses genetic algorithms for real-world tasks.

      • dartos 2 days ago

        You're talking specifically about using genetic programming to create new programs, as opposed to gradient descent in LLMs to minimize a loss function, right?

        How would you construct a genetic algorithm to produce natural language like LLMs do?

        Forgive me if I'm misunderstanding, but in programming we have "tokens", which are minimal meaningful bits of code.

        For natural languages it's harder. "Words" are not super meaningful on their own, I don't think (at least not as much as a token), so how would you break down natural language for a genetic algorithm?