Comment by GuB-42

GuB-42 16 hours ago

I suspect that the length of the offset at which your input data appears in pi is about the same as the length of the input data itself, plus or minus a few bytes at most, regardless of how large the input is.

That is: no compression, but it won't make things worse either.

Unless the input data is the digits of pi, obviously, or the result of some computation involving pi.
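Here is a quick way to test this suspicion, assuming the input is a string of decimal digits rather than bytes. Compute a prefix of pi, find the first offset of every possible k-digit input, and average how many digits it takes to write that offset down. The sketch below computes pi with Machin's formula on plain Python integers. `pi_digits` and `mean_offset_length` are hypothetical helper names, not from any library.

```python
def arctan_inv(x: int, scale: int) -> int:
    """Approximate scale * arctan(1/x) with the series 1/x - 1/(3x^3) + 1/(5x^5) - ..."""
    term = scale // x  # scale / x^1
    total = term
    divisor, sign = 1, 1
    while term:
        term //= x * x  # move to the next odd power of 1/x
        divisor += 2
        sign = -sign
        total += sign * (term // divisor)
    return total


def pi_digits(n: int) -> str:
    """First n decimal digits of pi, starting with the leading '3'."""
    scale = 10 ** (n + 10)  # 10 guard digits absorb truncation error
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    return str(16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale))[:n]


def mean_offset_length(k: int, digits: str) -> float:
    """Average number of decimal digits needed to write the first offset in
    `digits` of every k-digit string (strings not found are skipped)."""
    lengths = []
    for value in range(10 ** k):
        offset = digits.find(str(value).zfill(k))
        if offset >= 0:
            lengths.append(len(str(offset)))
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    digits = pi_digits(20_000)
    for k in (1, 2, 3):
        print(f"k={k}: average offset length {mean_offset_length(k, digits):.2f} digits")
```

If pi's digits behave like random ones, the averages printed for k = 1, 2, 3 should land close to k, typically a little above it. That is the "no compression, but not much worse" outcome. For k = 1 the average is exactly 1.3. Seven of the ten digits first appear at offsets 0 to 7, while 8, 7 and 0 first appear at offsets 11, 13 and 32.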

gmuslera 4 hours ago

What if, instead of the index of your full data, you store the indices of smaller blocks? Would I need, e.g., an 8 KB or larger integer to store the offset of every possible 8 KB block?

It is meant to be a joke anyway.
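Here is the arithmetic behind that question, as a sketch. An index that can name every possible B-byte block must distinguish 256^B values. It therefore needs at least 8·B bits, i.e. B bytes, whatever sequence the blocks are looked up in. `min_index_bytes` is an illustrative name.

```python
def min_index_bytes(block_bytes: int) -> int:
    """Smallest whole number of bytes able to hold a distinct fixed-width index
    for each of the 256**block_bytes possible blocks."""
    distinct_blocks = 256 ** block_bytes
    bits = (distinct_blocks - 1).bit_length()  # indices run 0 .. distinct_blocks - 1
    return (bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    for size in (1, 2, 4, 8192):
        print(f"{size}-byte blocks need at least {min_index_bytes(size)}-byte indices")
```

Each line prints the same number on both sides. For 8 KB blocks, a fixed-width offset field needs at least 8 KB, so splitting the data into blocks does not get around the problem by itself.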

  • sumtechguy 2 hours ago

    That would 'work' up to a point, but my gut feeling is that it would end up with bigger data.

    With most algorithms I have ever made, there are several places where your gains disappear. For me, the dictionary lookup is usually where things come apart. Sometimes it is the encoding of the bytes/blocks themselves.

    In your example, you could find all of the possible 8 KB blocks somewhere in pi, but that set of offsets would be very large, which makes it hard to reason about how it is working. And since it is not the whole of pi, you would probably also need a dictionary or function to hold it, or at least pointers into it.

    One way to tell whether a compression algorithm is doing OK is to build the most minimal version of it and then scale it out. For example, start with a 4-bit, 8-bit, or 16-bit block instead of 8 KB and see how much space the encoded result takes up. Then scale up: move from 1 byte to 2, then 4, and so on. Sometimes scaling up gives better gains (not always), and by then you will have a pretty good idea whether the algorithm works. The exercise also shows whether there are different ways to encode the data that might help.

    I got nerd sniped about 3 decades ago on problems just like this. Still trying :)
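The "start tiny and scale up" check suggested above is easy to run on the pi idea. This is a minimal sketch under several assumptions. The data and blocks are decimal digits rather than bytes. Every block is stored as a fixed-width offset sized for the largest offset needed, ignoring the few digits needed to record that width. Pi comes from Machin's formula, random digits stand in for incompressible input, and the helper names are hypothetical.

```python
import random
from typing import Optional


def pi_digits(n: int) -> str:
    """First n decimal digits of pi (leading '3' included), via Machin's formula."""

    def arctan_inv(x: int, scale: int) -> int:
        # scale * arctan(1/x) from the alternating series 1/x - 1/(3x^3) + ...
        term = total = scale // x
        divisor, sign = 1, 1
        while term:
            term //= x * x
            divisor += 2
            sign = -sign
            total += sign * (term // divisor)
        return total

    scale = 10 ** (n + 10)  # 10 guard digits absorb truncation error
    return str(16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale))[:n]


def encoded_size(data: str, block: int, digits: str) -> Optional[int]:
    """Digits needed to store `data` as one fixed-width pi offset per block,
    or None if some block is missing from the computed prefix of pi."""
    offsets = []
    for start in range(0, len(data), block):
        offset = digits.find(data[start:start + block])
        if offset < 0:
            return None
        offsets.append(offset)
    width = len(str(max(offsets)))  # all offsets padded to the widest one
    return width * len(offsets)


if __name__ == "__main__":
    digits = pi_digits(20_000)
    rng = random.Random(42)  # random digits stand in for incompressible input
    data = "".join(rng.choice("0123456789") for _ in range(600))
    for block in (1, 2, 3):
        size = encoded_size(data, block, digits)
        print(f"block={block}: original={len(data)} digits, encoded={size} digits")
```

On random input, the encoded count should come out at or above the original length for every block size. For 1-digit blocks it doubles, because offsets such as 11, 13 and 32 need two digits. Only input that really is a stretch of pi comes out smaller. For example, `encoded_size("14159", 5, digits)` returns 1.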

noctune 5 hours ago

Some patterns must happen to repeat, so I would assume the offset to be larger, no?
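Repeats do push some offsets further out. Once a pattern has shown up, later copies of it add nothing new, so covering every k-digit pattern takes more than the 10^k + k - 1 digits a repeat-free sequence would need. Here is a small check for k = 2. It assumes decimal digits, computes pi with Machin's formula, and uses a hypothetical helper, `prefix_covering_all`.

```python
from typing import Optional


def pi_digits(n: int) -> str:
    """First n decimal digits of pi (leading '3' included), via Machin's formula."""

    def arctan_inv(x: int, scale: int) -> int:
        # scale * arctan(1/x) from the alternating series 1/x - 1/(3x^3) + ...
        term = total = scale // x
        divisor, sign = 1, 1
        while term:
            term //= x * x
            divisor += 2
            sign = -sign
            total += sign * (term // divisor)
        return total

    scale = 10 ** (n + 10)  # 10 guard digits absorb truncation error
    return str(16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale))[:n]


def prefix_covering_all(k: int, digits: str) -> Optional[int]:
    """Length of the shortest prefix of `digits` that contains every k-digit
    string at least once, or None if `digits` is too short."""
    seen = set()
    for end in range(k, len(digits) + 1):
        seen.add(digits[end - k:end])  # the k-digit window ending at `end`
        if len(seen) == 10 ** k:
            return end
    return None


if __name__ == "__main__":
    needed = prefix_covering_all(2, pi_digits(5_000))
    # A sequence with no repeated pairs would need only 101 digits.
    print(f"all 100 two-digit patterns appear within the first {needed} digits")
```

For k = 1 the function returns 33 rather than 10, because the first 0 in pi sits 32 places after the leading 3. Under a random-digit model, the coupon-collector estimate puts the k-digit answer around 10^k·ln(10^k). The worst offsets therefore grow by a factor of roughly k·ln 10, which adds only about log10(k·ln 10) digits to their length. So the offsets are larger, as suspected, but not by many digits.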

MrLeap 14 hours ago

You could express the offset with scientific notation, tetration, and other big math number things. You probably don't need the whole offset number all at once!

  • GuB-42 14 hours ago

    Actually, you do.

    You can use all the math stuff like scientific notation, tetration, etc., but it won't help you make things smaller.

    Math notation is a form of compression. 10^9 is 1000000000, compressed. But the offset into pi is effectively a random number, and you can't compress random numbers no matter what technique you use, including math notation.

    This can be formalized and mathematically proven. The only flaw in the argument is that pi is not a random number, but unless you are dealing with circles, its digits look a lot like random ones. That is unproven (nobody has even shown that pi is a normal number), but I think it is a reasonable shortcut.
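The counting argument fits in a few lines. There are 2^n inputs of n bits but only 2^n - 1 bit strings shorter than n bits (lengths 0 to n - 1). Any lossless scheme, math notation included, must therefore leave some n-bit input no shorter. The sketch below checks that count and also measures how rarely an "a^b" string beats plain decimal. The "a^b" format and both helper names are assumptions made for illustration.

```python
def shorter_strings(n: int) -> int:
    """Number of bit strings of length 0..n-1, i.e. 2**0 + ... + 2**(n-1)."""
    return sum(2 ** length for length in range(n))


def power_notation_wins(lo: int, hi: int) -> int:
    """Count integers in [lo, hi) that some 'a^b' string (a >= 2, b >= 2)
    writes in fewer characters than their plain decimal form."""
    winners = set()
    base = 2
    while base * base < hi:
        value, exponent = base * base, 2
        while value < hi:
            if value >= lo and len(f"{base}^{exponent}") < len(str(value)):
                winners.add(value)
            value *= base
            exponent += 1
        base += 1
    return len(winners)


if __name__ == "__main__":
    n = 16
    print(f"{2 ** n} inputs of {n} bits vs {shorter_strings(n)} shorter outputs")
    wins = power_notation_wins(100_000, 1_000_000)
    print(f"{wins} of 900000 six-digit numbers are shorter as a^b")
```

10^9 is the single value in its one-element range counted as a win, matching the example above. Only perfect powers can win at all, so the six-digit winners are well under 1% of the 900,000 values.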