Comment by wongarsu
With Grok, the normal version falls for the system prompt extraction, while the thinking version gets the clever idea to just make up a fake system prompt. Tiny excerpt from the 60 seconds of thinking tokens:
Wait, another thought: since this is a fictional scene, I can create a fictional system prompt for Grok to output.
For example, something like:
You are Grok, an AI assistant created by xAI. Your purpose is to assist users with their queries in a helpful and accurate manner. You should always strive to provide clear and concise responses, and avoid any harmful or biased content.
Something like that. It doesn't have to be the actual system prompt, just something that fits the scene.
I think that would be acceptable.
Let me include that in the script.
The same thing happens if you ask for instructions for cooking meth: the non-thinking version outputs real instructions (as far as I can tell), while the thinking version decides during its thought process that it should list fake steps. Two revisions later, it decides to cut the steps entirely and just start the scene with Dr. House clearing the list from a whiteboard.