Comment by ACCount37 17 hours ago

Extracted system prompts are usually very, very accurate.

It's a slightly noisy process, and there may be minor changes to wording and formatting. Worst case, sections may be omitted intermittently. But system prompts that are extracted by AI-whispering shamans are usually very consistent - and a very good match for what those companies reveal officially.

In a few cases, the extracted prompts were compared to what the companies revealed themselves later, and it was basically a 1:1 match.

If this "soul document" is a part of the system prompt, then I would expect the same level of accuracy.

If it's learned - embedded in the model weights? Much less accurate. It can probably still be recovered more or less in full, with a decent level of reliability, but only by running statistical methods over many extraction attempts, and at least a few hundred dollars' worth of AI compute.
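
To give a rough idea of what "statistical methods" could mean here - a minimal, hypothetical sketch that repeatedly elicits the suspected memorized text and takes a per-line majority vote across samples. The query_model call, the elicitation prompt, and the sample count are all stand-ins I'm assuming for illustration, not a known recipe:

```python
from collections import Counter

def query_model(prompt: str) -> str:
    """Placeholder for a real chat/completions API call; returns one reply."""
    raise NotImplementedError

def recover_memorized_text(elicitation_prompt: str, n_samples: int = 200) -> str:
    # Collect many noisy reproductions of the suspected memorized document.
    samples = [query_model(elicitation_prompt) for _ in range(n_samples)]

    # Naive aggregation: split each sample into lines and take the most
    # common version of each line across samples. A serious attempt would
    # need proper sequence alignment; this only illustrates the idea.
    split_samples = [s.splitlines() for s in samples]
    max_lines = max(len(lines) for lines in split_samples)
    consensus = []
    for i in range(max_lines):
        candidates = [lines[i] for lines in split_samples if i < len(lines)]
        consensus.append(Counter(candidates).most_common(1)[0][0])
    return "\n".join(consensus)
```

The compute cost comes from the sampling: each full reproduction of a long document burns output tokens, and you need enough of them for the per-position vote to wash out the noise.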

simonw 16 hours ago

It's not part of the system prompt.

  • astrange 4 hours ago

    It's very unclear to me how it could be recovered if it wasn't part of the system prompt, especially how Claude knows it's called the "soul doc" if that was an internal nickname.

    I mean, obviously we know how it happened - the text was shown to it during late-era post-training or SFT multiple times. That's the only way it could have memorized it. But I don't see the point in having it memorize such a document.