Comment by kragen 3 days ago

This is a 521-page CC-licensed book on optimization which looks absolutely fantastic. It starts out with modern gradient-based algorithms rooted in automatic differentiation, including recent things like Adam, rather than the historically more important linear optimization algorithms like the simplex method (the 24-page chapter 12 covers linear optimization). There are a number of chapters on things I haven't even heard of, and, best of all, there are exercises.

I've been wanting something like this for a long time, and I regret not knowing about the first edition.

If you are wondering why this is a more interesting problem than, say, sorting a list, the answer is that optimization algorithms are attempts at the ideal of a fully general problem solver. Instead of writing a program to solve the problem, you write a program to recognize what a solution would look like, which is often much easier, for example with a labeled dataset. Then you apply the optimization algorithm to your program. And that is how current AI is being done, with automatic differentiation and variants of Adam, but there are many other optimization algorithms which may be better alternatives in some circumstances.
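The "write a scorer, not a solver" pattern can be sketched in a few lines. This is an illustrative toy, not from the book: the scorer is a mean-squared-error loss over a small labeled dataset, and the generic optimizer is plain gradient descent with a finite-difference gradient standing in for automatic differentiation.

```python
# Sketch of "recognize a solution, then optimize": we only define how
# to *score* a candidate (mean squared error on labeled points) and
# hand that scorer to a generic optimizer. All names and data are
# illustrative.

def loss(w, data):
    # Score a candidate line y = w[0]*x + w[1] against labeled points.
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in data) / len(data)

def gradient_descent(f, w, data, lr=0.05, steps=2000, h=1e-6):
    # Generic optimizer: a finite-difference gradient stands in for
    # automatic differentiation; Adam would add momentum/scaling terms.
    for _ in range(steps):
        g = []
        for i in range(len(w)):
            wp = list(w)
            wp[i] += h
            g.append((f(wp, data) - f(w, data)) / h)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # points on y = 2x + 1
w = gradient_descent(loss, [0.0, 0.0], data)  # w converges near [2, 1]
```

The point is that `gradient_descent` knows nothing about lines or datasets; swapping in a different `loss` solves a different problem with the same optimizer.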

energy123 3 days ago

> ideal of a fully general problem solver

In practice that's basically the mindset, but full generality isn't technically possible because of the no free lunch theorem.

  • cchianel 3 days ago

    That depends; do you want the optimal solution?

    If so, I agree it is impossible for a fully general problem solver to find the optimal solution to a problem in a reasonable amount of time (unless P = NP, which is unlikely).

    However, if a "good enough" solution that is only 1% worse than optimal works, then a fully general solver can do the job in a reasonable amount of time.

    One such example of a fully general solver is Timefold; you express your constraints using plain old Java objects, so you can in theory do whatever you want in your constraint functions (you can even make network calls, but that is extremely ill-advised since it will drastically slow down score calculation).

    Disclosure: I work for Timefold.

    • kragen 2 days ago

      No, it is impossible to guarantee that a solution to a general computational puzzle, found in a finite amount of time, is within some percentage of optimality. You must be talking about a restricted class of problems that enjoys some kind of tractability guarantee.

      • cchianel 2 days ago

        By "1% worse than optimal", I was giving an example percentage to clarify that there are solutions which are almost as good as optimal and which can be found in a reasonable amount of time.

        There cannot be a guarantee of finding a solution within a given percentage of optimal for a fully general problem, since you would need to know the optimum to give such a guarantee (and since the problem is fully general, you cannot use its structure to reduce it).

        Most constraint problems have many feasible solutions, and have a way to judge how much worse or better one solution is than another.

        There are good and bad ways to write constraints.

        One bad way to write constraints creates score traps, where a clearly better solution has the same score as a clearly worse solution.

        For example, for shift scheduling, a solution with only 1 overlapping shift with the same employee is better than a solution with 2 overlapping shifts with the same employee.

        A bad score function would penalize both solutions by 1, meaning the solver has no idea which of the two solutions is better.

        A good score function would penalize the schedule with 1 overlapping shift with the same employee by 1, and the schedule with 2 overlapping shifts with the same employee by 2.
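        A toy score function makes the trap concrete (this is an illustrative sketch in Python with made-up names, not Timefold's API): counting each overlapping pair separately preserves the gradient toward better schedules, while a flat "any overlap?" penalty erases it.

```python
from itertools import combinations

# Shifts as (employee, start_hour, end_hour) tuples. Illustrative data only.
def overlaps(a, b):
    # Two shifts conflict if the same employee works both and the
    # time intervals intersect.
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def good_score(shifts):
    # Penalize each overlapping pair by 1: two overlaps score worse
    # than one, so the solver can tell the solutions apart.
    return -sum(overlaps(a, b) for a, b in combinations(shifts, 2))

def bad_score(shifts):
    # Score trap: a flat -1 for "any overlap", however many there are.
    return -1 if any(overlaps(a, b) for a, b in combinations(shifts, 2)) else 0

one_overlap  = [("ann", 9, 17), ("ann", 16, 20), ("bob", 9, 17)]
two_overlaps = [("ann", 9, 17), ("ann", 16, 20), ("bob", 9, 17), ("bob", 16, 20)]
# good_score distinguishes the schedules (-1 vs -2); bad_score gives
# both the same -1, hiding which one is closer to feasible.
```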

        The class of problems I am talking about is the class of problems where you can assign a score to a possible solution, with limited score traps.

        Timefold has no guarantees about finding a solution in reasonable time (but unless you've done something terribly wrong or have a truly massive dataset, it finds a good solution really quickly 99.99% of the time). Instead, you set the termination condition of the solver: it could be time-based (say, 60 minutes), unimproved time spent (solve until no better solution has been found for 60 minutes), or the first feasible solution (other termination conditions can be set as well).
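        Those termination conditions apply to any "anytime" solver, not just Timefold. A minimal sketch in Python (assumed names, not Timefold's actual API): a local-search loop that keeps the best solution seen and stops on either a total time budget or a stretch of unimproved time.

```python
import random
import time

def solve(score, neighbor, start, time_limit=1.0, unimproved_limit=0.5):
    # Anytime local search: always keep the best solution found so far,
    # and terminate on either a total time budget or a stretch with no
    # improvement -- the two time-based conditions described above.
    best, best_score = start, score(start)
    t0 = last_improvement = time.monotonic()
    while True:
        now = time.monotonic()
        if now - t0 > time_limit or now - last_improvement > unimproved_limit:
            return best
        cand = neighbor(best)
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
            last_improvement = now

# Toy problem: maximize -(x - 3)^2 by random perturbation of x.
x = solve(lambda x: -(x - 3) ** 2,
          lambda x: x + random.uniform(-0.5, 0.5),
          start=0.0, time_limit=0.5, unimproved_limit=0.2)
```

Because the loop always holds a best-so-far solution, any termination condition still yields a usable answer; stopping earlier just means a possibly worse one.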

  • kragen 3 days ago

    I was thinking because of Gödel's incompleteness theorem, but maybe there are multiple kinds of full generality. Wolpert and Macready seem to have been thinking about problems that are too open-ended to even be able to write a program to recognize a good solution.