Comment by brcmthrowaway 13 hours ago
Can someone tell me the mechanism by which the prompts are even recovered?
Cosma Shalizi says that this isn't possible. Are they in the training set? I doubt it.
http://bactra.org/notebooks/nn-attention-and-transformers.ht...
There's a detailed description of how they were recovered here: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5...
Plus these transcripts showing the chats: https://gist.github.com/Richard-Weiss/efe157692991535403bd7e...