Comment by diyer22 4 days ago

Yes, it's absolutely possible—just like how diffusion LLMs work, we can do the same with DDN LLMs.

I made an initial attempt to combine [DDN with GPT](https://github.com/Discrete-Distribution-Networks/Discrete-D...), aiming to remove tokenizers and let LLMs directly model binary strings. In each forward pass, the model adaptively adjusts the byte length of generated content based on generation difficulty (naturally supporting speculative sampling).
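For intuition, here is a minimal toy sketch in Python of that adaptive-length idea (illustration only, not the actual GPT+DDN code; `propose_candidates` is a made-up stand-in for the model's forward pass): each pass proposes a small discrete set of candidate byte continuations, and more bytes are committed when one candidate clearly dominates.

    import random

    def propose_candidates(prefix: bytes, k: int = 4) -> list[tuple[bytes, float]]:
        # Hypothetical stand-in for a real forward pass: return K candidate byte
        # continuations with normalized scores (a real model conditions on `prefix`).
        cands = [(bytes(random.randrange(256) for _ in range(random.randint(1, 8))),
                  random.random()) for _ in range(k)]
        total = sum(score for _, score in cands)
        return [(b, score / total) for b, score in cands]

    def generate(prefix: bytes, max_len: int = 64) -> bytes:
        out = bytearray(prefix)
        while len(out) < max_len:
            candidates = propose_candidates(bytes(out))
            best_bytes, best_p = max(candidates, key=lambda c: c[1])
            # Commit the whole candidate when one clearly wins ("easy" content),
            # only a single byte when the distribution is flat ("hard" content).
            out += best_bytes if best_p > 0.5 else best_bytes[:1]
        return bytes(out[:max_len])

    print(generate(b"Hello, "))

The variable-size commit is also, roughly, where the speculative-sampling connection comes from: multi-byte proposals can be accepted or truncated in a single verification pass.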

vintermann 4 days ago

This is what I find most impressive: it's a natural hierarchical method which seems so general, yet is actually quite competitive. I feel like the machine learning community has been looking for that for a long time. Non-generative uses (like hierarchical embeddings, maybe? Making Dewey-decimal-like embeddings for anything!) are even more exciting.

  • diyer22 4 days ago

    Exactly! The paragraph on Efficient Data Compression Capability in the original paper also highlights:

    > To our knowledge, Taiji-DDN is the first generative model capable of directly transforming data into a semantically meaningful binary string which represents a leaf node on a balanced binary tree.

    This property excites me just as much.
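
    A toy way to see why that leaf-node binary string behaves like a hierarchical, Dewey-decimal-style embedding (the codes below are invented for illustration; real codes come from the model): samples that share a longer code prefix sit closer together in the balanced binary tree, so prefix length works as a coarse-to-fine similarity measure.

        # Invented leaf codes for illustration only.
        codes = {
            "tabby cat photo":   "0010110",
            "siamese cat photo": "0010111",
            "golden retriever":  "0011010",
            "sports car":        "1101001",
        }

        def shared_prefix_len(a: str, b: str) -> int:
            n = 0
            for x, y in zip(a, b):
                if x != y:
                    break
                n += 1
            return n

        query = codes["tabby cat photo"]
        for name, code in codes.items():
            # Longer shared prefix -> closer in the tree -> coarser category match.
            print(name, shared_prefix_len(query, code))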

cubefox 4 days ago

This sounds a bit like H-Net [1] or Byte Latent Transformer [2].

1: https://arxiv.org/abs/2507.07955

2: https://arxiv.org/abs/2412.09871

  • diyer22 3 days ago

    It does seem that way — we’re both trying to overcome the limitations imposed by LLM tokenization to achieve a truly end-to-end model.

    And their work is far more polished; I’ve only put together a quick GPT+DDN proof-of-concept.

    Thank you for sharing.

  • lukan 3 days ago

    I vouched for this comment. Your account seems to be shadowbanned, but your recent comments look fine to me, so you may want to email dang to have that status revoked.