Comment by gg82 3 days ago

4 replies

I wonder if embeddings could be created from open source and library code and then used to convert back the code with all the correct variable and function names.

Everdred2dx 3 days ago

It's not AI, but Ghidra has a cool feature called BSim which does something similar. Each function gets a "feature vector" which, now that I think about it, has some clear parallels to embeddings.

  • MomsAVoxell 2 days ago

    BSim is a hash machine, right? (BSim uses feature vectors, and locality-sensitive hashing.)

    Embeddings could be derived from the reconstituted hash.
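For anyone unfamiliar with locality-sensitive hashing, here is a minimal sketch of one common variant (random-hyperplane signatures). This is not how BSim is implemented. The function names, the 4-dimensional vectors, and the 16-bit signature size are all assumptions made up for illustration.

```python
import random

def make_hyperplanes(dim, bits, seed=0):
    """Generate `bits` random hyperplanes (normal vectors) of dimension `dim`."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    sig = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        sig = (sig << 1) | (1 if dot >= 0 else 0)
    return sig

def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")

# Hypothetical feature vectors for three functions (values are invented).
planes = make_hyperplanes(dim=4, bits=16)
f_original = [3.0, 1.0, 0.0, 2.0]
f_scaled = [6.0, 2.0, 0.0, 4.0]       # same direction, different magnitude
f_opposite = [-3.0, -1.0, 0.0, -2.0]  # exactly opposite direction

sig_original = lsh_signature(f_original, planes)
sig_scaled = lsh_signature(f_scaled, planes)
sig_opposite = lsh_signature(f_opposite, planes)
```

Vectors pointing in the same direction get identical signatures, and vectors pointing in similar directions tend to differ in only a few bits, so candidates can be bucketed by signature before doing any exact comparison.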

  • mixel 3 days ago

    Wow, that is cool. I bet that with that feature and a huge database of known "feature vectors" from open-source libraries, you could focus on the actual business logic of the binary instead of trying to reverse external library functions.
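A rough sketch of what that lookup could look like, assuming each function is already reduced to a numeric feature vector. The database entries, the library and function names, and the 0.9 threshold are hypothetical, and cosine similarity is just one plausible way to compare vectors, not a claim about what Ghidra does internally.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical database: known library function name -> feature vector.
KNOWN_FUNCTIONS = {
    "libfoo:decompress": [5.0, 1.0, 0.0, 3.0],
    "libfoo:checksum": [0.0, 4.0, 2.0, 0.0],
    "libbar:hash_update": [1.0, 0.0, 6.0, 2.0],
}

def best_match(vector, known, threshold=0.9):
    """Return the most similar known name, or None if nothing meets the threshold."""
    best_name, best_score = None, threshold
    for name, reference in known.items():
        score = cosine(vector, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# An unnamed function from a stripped binary that closely resembles a known one.
library_guess = best_match([5.0, 1.1, 0.0, 2.9], KNOWN_FUNCTIONS)
# A function that resembles nothing in the database, i.e. likely business logic.
unknown_guess = best_match([1.0, 1.0, 1.0, 1.0], KNOWN_FUNCTIONS)
```

Functions that match get a library name and can be skipped. Anything that returns None is what is left to reverse by hand.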

nekitamo 3 days ago

I've been wondering the same thing. However, you would need a very large database of embeddings for this to be useful, right?

OTOH, I can see this being disproportionately helpful when reverse engineering Rust and Go binaries, which usually include many open-source dependencies.