Comment by gg82
I wonder if embeddings could be created from open-source and library code and then used to convert the code back with all the correct variable and function names.
BSim is a hash machine, right? (BSim uses feature vectors, and locality-sensitive hashing.)
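For anyone unfamiliar with locality-sensitive hashing, here is a minimal sketch of one common scheme (random hyperplanes, SimHash-style). This is not necessarily what BSim does internally; the function names and the 4-dimensional toy vectors are made up for illustration.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    # One random Gaussian hyperplane per output bit (seeded for repeatability).
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vec, hyperplanes):
    # Each bit records which side of a hyperplane the vector falls on.
    bits = []
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)

def hamming(a, b):
    # Number of differing bits; small distance suggests a small angle.
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(dim=4, n_bits=16)
sig_a = lsh_signature([4.0, 0.0, 1.0, 2.0], planes)
sig_b = lsh_signature([8.0, 0.0, 2.0, 4.0], planes)  # same direction, scaled
```

Vectors pointing in similar directions tend to share most signature bits, so the signature can serve as a bucket key for cheap candidate lookup before any exact comparison.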
Embeddings could be derived from a reconstituted hash.
I've been wondering the same thing. However, you would need a very large database of embeddings for this to be useful, right?
OTOH, I can see this being disproportionately helpful for reverse engineering Rust and Go binaries, which usually include many open-source dependencies.
It's not AI, but Ghidra has a cool feature called BSim which does something similar. Each function gets a "feature vector", which, now that I think about it, has some clear parallels to embeddings.
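To make the parallel concrete, here is a rough sketch of recovering a function name by comparing an unknown function's vector against a small database of known ones using cosine similarity. The vectors, names, and threshold are all invented for the example; this is not BSim's API or its actual feature format.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(query, known, threshold=0.9):
    # Return (name, score) of the closest known vector, or (None, score)
    # if nothing is similar enough to trust the name.
    best_name, best_score = None, 0.0
    for name, vec in known.items():
        score = cosine_similarity(query, vec)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score

# Hypothetical database: function name -> toy feature vector.
KNOWN = {
    "memcpy_like": [4.0, 0.0, 1.0, 2.0],
    "strlen_like": [0.0, 3.0, 1.0, 0.0],
    "crc32_like": [1.0, 1.0, 5.0, 3.0],
}

unknown = [4.0, 0.0, 1.0, 1.0]  # vector for a stripped function
name, score = best_match(unknown, KNOWN)
```

A linear scan like this only works for tiny databases; at the scale of "all open-source libraries" you would need some kind of index, which is where hashing or approximate nearest-neighbor search comes in.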