Comment by hansvm

Comment by hansvm 11 hours ago

3 replies

There's value in "being able to master that stuff," and there's value in "having mastered that stuff." The latter lets you trim a lot of possible designs from your search space nearly instantly, letting you focus on routes which are actually viable. The former is only of similar power when you know the design in advance or there aren't many possible solutions.

For a simple example, suppose you need to operate on `n` permutations of an enormous collection of data (far more than fits in RAM or disk), and you need those permutations to be re-usable.

One simple solution is to shuffle the indices `n` times and store the results in your cluster, but even the shuffle process is slow with normal techniques because of inter-machine random-access bandwidth issues. When using those shuffled indices for anything, you're again bandwidth-limited if the task doesn't require access of every index.

With just a tiny bit of a math background, you'll recognize that an O(1)-state shuffle is possible, where you can create some `Permutation` object with a `permute()` method, taking in an index and outputting the corresponding index in your hypothetical shuffle. That permutation will be CPU-bound rather than bandwidth-bound.

The problem with "being able to master stuff" is that your search process in the design space is slow. If I went and told you that an O(1)-state shuffle existed and would be good for the problem, sure, you'd be able to go code that up without issues. What's the chance that you'd even know to try though?

> wide range of esoteric topics ... prepares you for a lifetime of learning

That's part of it, but each of those esoteric topics also give new ideas something to latch onto. Our brains are associative, and being able to look at a new thing and tie it to a few esoteric concepts is a bit of a superpower, even if the association is weak. The difference between knowing nothing other than how to learn and knowing what's vaguely potentially possible or not is weeks or months of research. It's the difference between having to do the dumb, slow thing and being the person promoted for saving $1m/yr fixing whatever you wrote. You can get by for a long time, maybe your entire career, just making shit work, but if you're looking for more money or prestige then there are better routes.

gopher_space 7 hours ago

> What's the chance that you'd even know to try though?

> [...] esoteric topics also give new ideas something to latch onto. Our brains are associative, and being able to look at a new thing and tie it to a few esoteric concepts is a bit of a superpower, even if the association is weak. The difference between knowing nothing other than how to learn and knowing what's vaguely potentially possible or not is weeks or months of research.

The only point I'd add to your paragraph is that this applies to every domain when you're on the job, not just math. I live in constant terror of discovering that other disciplines solved my problem like a hundred years ago.

thethirdone 8 hours ago

What "O(1)-state shuffle" could you possibly be talking about? It takes `O(nlogn)` space to store a permutation of list of length n. Any smaller and some permutations will be unrepresentable. I am very aware of this because shuffling a deck of cards correctly on a computer requires at least 200 random bits.

If the requirements are softer than "n random permutations", there might be a lot of potential solutions. It is very easy to come up with "n permutations" if you have no requirements on the randomness of them. Pick the lowest `k` such that `n < k!`, permute the first k elements leaving the rest in place, and now you have n distinct permutations storeable in `O(log(n)` (still not O(1) but close).

I know this is not really your point, but misusing `O(1)` is a huge pet peeve of mine.

  • hansvm 8 hours ago

    It's O(1) if you don't need access to every permutation (common in various monte carlo applications). 64-128 bits of entropy is good enough for a lot of applications, and that's all you get from any stdlib prng, so that's what I was comparing it to.

    Those sorts of applications would tend to not work well with a solution leaving most elements in the same place or with the same relative ordering.