Comment by recursive 4 days ago

38 replies

I don't understand why so many people seem so fascinated by constructions like the Library of Babel. Yes, it contains the answers to all your questions, but there are some significant drawbacks.

* It has more wrong information than right information, with no way to tell the difference.

* If you had an oracle that could tell you how to get to the book you need, the navigation instructions to get to the book will be at least as long as the book, on average.
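
The second bullet is essentially a counting (pigeonhole) argument. Here is a minimal sketch of it, assuming books and directions are both plain bit strings; the function names and the toy book length are illustrative and not from the comment:

```python
def count_strings_shorter_than(n_bits: int) -> int:
    """Number of distinct bit strings of length 0 .. n_bits - 1."""
    return sum(2**k for k in range(n_bits))  # equals 2**n_bits - 1


def count_books(n_bits: int) -> int:
    """Number of distinct books when every book is exactly n_bits long."""
    return 2**n_bits


def max_fraction_saving(n_bits: int, saved_bits: int) -> float:
    """Upper bound on the share of books whose directions can be at least
    `saved_bits` shorter than the book itself."""
    return count_strings_shorter_than(n_bits - saved_bits + 1) / count_books(n_bits)


n = 16  # toy book length in bits (an assumption for illustration)
print(count_books(n), count_strings_shorter_than(n))
print(max_fraction_saving(n, 8))  # share of books that could save 8 or more bits
```

There are fewer short directions than books, so directions that uniquely identify every book cannot all be shorter than the books, and only a small fraction of books can save more than a few bits.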

bonoboTP 4 days ago

The Library of Babel made me aware that choosing/finding is not super distinct from making/creating. Or discovery from invention. In math, there is a distinction between "there exists" and "we can construct", but "we can construct" is similar to "we can find".

  • matheusmoreira 4 days ago

    I don't think they're equivalent. I think invention and creation aren't actually real. There is no "making" or "creating" when it comes to intellectual work.

    All computer files are sequences of bits. All sequences of bits are integers. All integers already exist in the infinite set of natural numbers. I can even calculate how big those numbers are given their bit count.

      digits(bits)   = ceil(bits * log10(2))
    
      digits(32)     = 10
      digits(64)     = 20
      digits(128)    = 39
      digits(256)    = 78
      digits(512)    = 155
      digits(1024)   = 309
    
      digits(20 KiB) = 49,321
      digits(2 GiB)  = 5,171,655,946
    
    We are merely discovering numbers through convoluted mental and technological processes. All our mental exertions result in the discovery of a number. This comment is a number.
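
The digit counts in the comment above can be reproduced with a few lines of Python. This is a sketch: `digits` follows the formula as written, while `exact_digits` and the byte-to-integer round trip are added here for illustration and are not from the comment.

```python
import math


def digits(bits: int) -> int:
    """Decimal digits needed for the largest value that fits in `bits` bits."""
    return math.ceil(bits * math.log10(2))


def exact_digits(bits: int) -> int:
    """Same quantity computed exactly, by printing 2**bits - 1."""
    return len(str(2**bits - 1))


for bits in (32, 64, 128, 256, 512, 1024):
    print(bits, digits(bits), exact_digits(bits))

print(digits(20 * 1024 * 8))   # 20 KiB expressed in bits
print(digits(2 * 2**30 * 8))   # 2 GiB expressed in bits

# "This comment is a number": any byte string maps to one integer.
text = "This comment is a number."
raw = text.encode("utf-8")
as_int = int.from_bytes(raw, byteorder="big")
# The byte length is kept so that leading zero bytes would survive the trip back.
back = as_int.to_bytes(len(raw), byteorder="big").decode("utf-8")
print(as_int)
print(back == text)
```
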
    • bonoboTP 4 days ago

      Yes, I mean exactly this type of insight. Basically taking a digital photo with a camera technically also just picks out the "address" of your current environment within the space of all images. Any 4K 2-hour-length feature film in a digital format is also just an address in the space of all possible videos. The director, the actors, the whole crew did all that work in order to select that point from the space of possibilities, they didn't "create" anything. That movie already existed.

      Of course this is silly, but interesting nonetheless. And we routinely speak about such high-dimensional spaces in research and engineering. Or we can imagine optimization as traversing a pre-existing search space. It may be structured as a graph or perhaps a Euclidean space. And in that space we can imagine a loss surface, that sits there in peace all along, with its global minimum somewhere. And instead of "constructing" a solution, we are simply hiking in this space and trying to spot that valley. But this is a bit fictional. We never physically "instantiate" this surface. It's an imagined abstraction. In reality we just have a vector and some rules as to how we change that vector. But we can imagine those changes to be movements in an imagined space.
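
The "vector plus update rule" picture above can be made concrete with plain gradient descent. This is only a sketch: the bowl-shaped loss, the starting point, and the learning rate are assumptions chosen for illustration, not anything from the comment.

```python
def loss(x: float, y: float) -> float:
    """A toy bowl-shaped loss with its minimum at (3, -2)."""
    return (x - 3.0) ** 2 + (y + 2.0) ** 2


def gradient(x: float, y: float) -> tuple[float, float]:
    """Analytic gradient of `loss`."""
    return 2.0 * (x - 3.0), 2.0 * (y + 2.0)


def descend(start: tuple[float, float], learning_rate: float = 0.1, steps: int = 200):
    """Repeatedly nudge the vector downhill; the surface is never built."""
    x, y = start
    for _ in range(steps):
        gx, gy = gradient(x, y)
        x, y = x - learning_rate * gx, y - learning_rate * gy
    return x, y


x, y = descend((0.0, 0.0))
print(x, y, loss(x, y))
```

The program only ever holds one point and a rule for moving it. The "valley" exists in the same sense the library's books do: it is implied by the definition, not stored anywhere.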

      It's like the idea that the sculptor doesn't create the sculpture, the sculpture was there all along, he just had to remove the superfluous matter to reveal what was already there (i.e. the atoms belonging to the final sculpture).

      The most interesting thing is kind of on the border, between these absurdly large spaces and the more manageable ones that are feasible to enumerate.

      Another similarly mind-blowing thing was when I forgot the password to a file that I had encrypted. It's fascinating that the bit pattern on the disk is functionally random now, and cracking it would take longer than the age of the universe. But if only I knew the password, it would take just a second. There is a definite sequence of keystrokes I can execute to bring the universe into a state where the content will appear on my screen. It's so close, yet it's so, so far if you don't remember the password. Just a little difference in your brain state and it flips from trivial to hopeless.
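
The "longer than the age of the universe" estimate can be sanity-checked with rough arithmetic. This sketch assumes a 128-bit key, an attacker making 10^12 guesses per second, and a universe about 13.8 billion years old. All three figures are assumptions for illustration, not from the comment, and a weak password would shrink the real search space drastically.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9  # approximate, assumed


def years_to_exhaust(key_bits: int, guesses_per_second: float) -> float:
    """Years needed to try every key of the given size at a fixed rate."""
    return 2**key_bits / guesses_per_second / SECONDS_PER_YEAR


years = years_to_exhaust(key_bits=128, guesses_per_second=1e12)
print(f"{years:.2e} years, {years / AGE_OF_UNIVERSE_YEARS:.2e} universe ages")
```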

      PS, if you like thinking about such things, I recommend Meta-Math by Gregory Chaitin, it's very fun (providing an address VS constructing the thing is basically the gist of algorithmic information theory).

      • matheusmoreira 4 days ago

        Yeah I agree with you.

        > It's like the idea that the sculptor doesn't create the sculpture, the sculpture was there all along, he just had to remove the superfluous matter to reveal what was already there (i.e. the atoms belonging to the final sculpture).

        I understand this argument but I have far more trouble applying this logic to real things. I'm not sure the same logic applies once the information is instantiated in the real world as a physical object. I haven't thought very deeply about it. I think the true sculpture exists only in the ideal world and the real world object is merely an approximation of it.

        > Of course this is silly

        It's an existential issue for me. At some point it became a political issue. I became a copyright abolitionist because of this insight. Copyright is logically reducible to monopolistic ownership of numbers. The sheer absurdity of it led me to reject the very idea of intellectual property as delusional nonsense.

    • synctext 4 days ago

      How do you find a nice SHA-1 hash? How do you do keyword search in this list? Search and discovery of quality are unsolved scientific challenges. Fascinating stuff.

      At our university lab we've been working on this for 25 years. Building a search engine is the easy part. Keeping a federated server with a billion users running is unsolved. Creating a fully -serverless- decentralised search engine is possible, but you also need a self-funding economy. Seems we're one of the few labs worldwide to still make actual operational prototypes of this stuff. More shameless self-promotion:

      "SwarmSearch: Decentralized Search Engine with Self-Funding Economy" [0]

      Really handy to have a search engine to search this webpage with 45,671,926,166,590,716,193,865,151,022,383,844,364,247,891,968 pages and the rest of the web (no spyware, no tracking).

      [0] https://arxiv.org/abs/2505.07452

      • lurk2 4 days ago

        If you’re interested in mass market adoption rather than just proving the theory, you will need to change the name. “LimeWire” is fun. “SwarmSearch” sounds like a biblical plague.

    • ghc 4 days ago

      I admit thinking this way is tempting, but in your model the number represents some kind of language, whether human-readable or machine-readable. If we accept the number is a non-lossy encoding of some language, we reach an equivalency stating there is no creating, just discovering language "through convoluted mental and technological processes". But can we really equate language and knowledge? I believe Gödel proved that we cannot, in the sense that there is no "perfect" way to encode knowledge in a system of consistent axioms. Ergo, no matter how eloquently you describe your invention of "the wheel", it is by its nature incomplete and imperfect. Some part of the knowledge will always be tacit.

    • jimbo808 4 days ago

      This conflates mathematical existence with actual instantiation. A 2 GB integer might be definable, but until someone encodes a particular arrangement of bits and gives it context, it doesn’t exist in any practical sense. We don’t treat all future novels as "already written" just because their ASCII codes can be mapped to integers.

      • matheusmoreira 4 days ago

        I said all novels already exist. That's different from claiming all novels have already been written.

        The claim is that humans are not "creators" but generators, very much in the random number generator sense. We are interesting number generators.

        • jacquesm 7 hours ago

          Sorry, but this is just complete numerological nonsense. All novels do not already exist. My proof is that if they already exist you will show me a novel that will come out a year from now today. The act of creation, of ordering words and other symbols in such a way that they convey a particular meaning is a non-trivial exercise to the point that we have created laws and reward structures for the people doing such organization. If we follow your reductive reasoning all media already exist. But they do not. The underlying principle here is the one of ordering, to take a chaotic or boring concept (say, an array of random or blank bits) and to impose order on it so that they take on meaning when used in combination with a suitable interpreter.

          This kind of imposing of order is an act of lowering the entropy of the sample in a very specific way: parties that know the 'key' to the sample will be able to experience the sample in a way that parties without the key would not; to them the sample is still boring or random. Your reduction of the act of creation to picking a particular number belies the fact that absolutely nobody who creates something is picking that number: the number is a carrier, it is not the ideas embedded in it. You could translate that novel (or textbook, or sound or video or any other medium) into other media, descriptive, literal or you could even completely transform it. And there would still be a relationship to the original creation, hence the concept of a 'derived work', which for your numbers example would utterly fail: you could not take that number outside of knowing its meaning and come up with any of these derivations without having the key to decode it.

          This kind of reductive reasoning is not helpful, it merely attempts to flatten a whole pile of some of the most accomplished and positive contributions by humanity to the generation of interesting numbers. And it is so much more than that.

          Besides all this, any kind of attempt to digitize an actual work of art, rather than just a simple text, is going to be a lossy process. You are never going to be able to replicate the original to the point that you have created something that is equal. You may be able to get close but it won't be the same thing. More so for sculptures than for two-dimensional art, less so for audio, for instance, where the replication gear is getting really good. But generation loss is a thing, and if you re-create and re-digitize, then after a surprisingly low number of such generations you will end up with noise.

          Authors, sculptors, painters, even programmers and other creative people are so much more than interesting number generators, even if their works can be encoded or approximated numerically. That's flipping the encoding analogy on its head, the map really isn't the territory.

      • iberator 3 days ago

        BTW, a compressed (at the ALU level) 2 GB int is plausible. LOL, sounds like a funny idea for a virtual CPU.

    • jama211 4 days ago

      I would say that that’s a valid _model_ we can use to describe creation, much like how maths is a model we use to describe the universe. However, whether maths IS the universe or creation IS discovery is more of a philosophical question, possibly an unanswerable one, that people will have many varying opinions on.

      And that’s without me asking you to define “real”, which would be another rabbit hole.

cryzinger 4 days ago

To your first bullet, I believe this is one of the central points of the original Borges story :)

  • cantor_S_drug 4 days ago

    I think the Library of Babel by Borges is a static manifestation of Turing-complete behaviour, via the fact that some L-systems are Turing complete. Or, put another way: where in the Library of Babel does the real Hamlet reside? If we consider finding and replacing names with other names, is it still Hamlet? And if we bring the full force of edit operations and do these in a reversible manner, then where does the actual Hamlet reside? An equivalence class of Hamlet?

Chinjut 4 days ago

Everyone is aware of this. Sites like this aren't created to be useful. They are created to be an amusement, a joke.

AnthonyMouse 4 days ago

> If you had an oracle that could tell you how to get to the book you need, the navigation instructions to get to the book will be at least as long as the book, on average.

This isn't quite true. Natural language text compresses extremely well and you would only need length equivalent to the compressed form, not the original form. And if you wanted to go further, you could use a mapping where extremely short strings map to known popular books and only unknown works have longer encodings.
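
The compression point is easy to check with Python's standard `zlib` module. The sketch below compares a short, repetitive English passage against random bytes of the same length; the sample text and the use of random bytes as a stand-in for a typical library book are assumptions made for illustration.

```python
import os
import zlib

# Repetitive English prose (opening of "A Tale of Two Cities", public domain).
english = (
    "It was the best of times, it was the worst of times, "
    "it was the age of wisdom, it was the age of foolishness, "
    "it was the epoch of belief, it was the epoch of incredulity, "
    "it was the season of Light, it was the season of Darkness, "
    "it was the spring of hope, it was the winter of despair."
).encode("ascii")

# Random bytes of the same length stand in for a typical "gibberish" book.
gibberish = os.urandom(len(english))

packed_english = zlib.compress(english, 9)
packed_gibberish = zlib.compress(gibberish, 9)

print(len(english), len(packed_english), len(packed_gibberish))

# Decompression recovers the book exactly, so the compressed bytes
# can serve as the directions to it.
restored = zlib.decompress(packed_english)
print(restored == english)
```

The English passage shrinks while the random one does not, which matches both sides of the exchange: readable books can have short directions, but nearly every book in the full library looks like the random case.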

  • recursive 4 days ago

    I suppose this would work if the library was arranged such that comprehensible books were closer to the "origin". The workings of the "real" library of babel are supposed to be more inscrutable though.

    But if I built one, it would totally work that way.

Llamamoe 4 days ago

I wonder if there is some way to create a latent-space Library of Babel in which you only find incoherent gibberish with extremely long keys, with the shortest ones pointing specifically to the most common/likely strings of text, in manageable computational complexity.

  • recursive 4 days ago

    Reproducing the text of a book in the library is a synonym for identifying the book. So this is really called "text compression", which is a well-studied field.

  • samsartor 4 days ago

    In a library of all possible strings, this is just text compression (as the other comment observes). But in a finite library it gets even simpler, in a cool way! We can treat each text as a unique symbol and use an entropy encoding (e.g. Huffman) to assign a length-optimized key to each based on likelihood (e.g. from an LLM). Building the library is something like O(n log n), which isn't terrible. But adding new texts would change the IDs for existing texts (which is annoying). There might be a good way to reserve space for future entries probabilistically? Out of my depth at this point!
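
The Huffman idea above can be sketched in a few lines. The titles and likelihoods below are made up for illustration (a real system might take them from a language model, as the comment suggests), and `huffman_codes` is a hypothetical helper, not an existing library function.

```python
import heapq
from itertools import count


def huffman_codes(weights: dict[str, float]) -> dict[str, str]:
    """Return a prefix-free bit string per key; likelier keys get shorter codes.

    Merging dicts keeps the sketch short but is not O(n log n); a tree-based
    version would be needed to reach that bound.
    """
    if len(weights) == 1:
        return {next(iter(weights)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(w, next(tiebreak), {name: ""}) for name, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {name: "0" + code for name, code in left.items()}
        merged.update({name: "1" + code for name, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Assumed likelihoods for a tiny finite library.
library = {"Hamlet": 0.5, "Don Quixote": 0.25, "Moby-Dick": 0.125, "obscure pamphlet": 0.125}
codes = huffman_codes(library)
print(codes)

# Adding a text can reshuffle the keys of existing texts.
bigger = dict(library, **{"new novel": 0.0625})
new_codes = huffman_codes(bigger)
print(new_codes)
```

In this toy library the most likely text gets a one-bit key, and adding a fifth text changes the keys of some existing ones, which is the annoyance the comment mentions.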

  • lxgr 4 days ago

    That's arguably just a regular library :)

a_shovel 4 days ago

Another way of looking at it is that the library of Babel would be less useful than an equivalent quantity of blank paper. For example, you could use it to print books in English instead of gibberish. Multiple copies of those books, even.

kristianp 3 days ago

> If you had an oracle that could tell you how to get to the book you need, the navigation instructions to get to the book will be at least as long as the book, on average.

Only if the oracle has all books that could possibly exist. If you're trying to find a book that already exists, that set is infinitely smaller.

  • recursive 3 days ago

    The oracle doesn't have the books. The library does. And it has all of them. Directions to each book depend only on the layout and contents of the library.