Comment by viraptor 5 days ago

The model can be trained to interpret "don't hallucinate" as "answer only from the provided context and well-established facts; don't guess or extrapolate new information". That wouldn't get rid of the issue completely, but it would likely improve quality if that's what you're after, provided there's enough training data with "I don't know" responses.

(But it all depends on the fine-tuning they did, so who knows, maybe it's just an Easter egg)
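For illustration, here's a rough sketch of what such training data could look like, assuming a simple JSONL supervised fine-tuning format. The schema, field names, and filename are all made up for the example, not any provider's actual format:

    import json

    # Hypothetical SFT examples: pair questions answerable from the context
    # with grounded answers, and unanswerable ones with an explicit refusal,
    # so the model learns that "answer only from the context" includes
    # saying "I don't know" rather than guessing.
    SYSTEM = (
        "Answer only from the provided context. If the context does not "
        "contain the answer, say you don't know."
    )

    examples = [
        {
            "system": SYSTEM,
            "context": "The Eiffel Tower is 330 metres tall.",
            "question": "How tall is the Eiffel Tower?",
            "answer": "330 metres.",
        },
        {
            "system": SYSTEM,
            "context": "The Eiffel Tower is 330 metres tall.",
            "question": "When was the Eiffel Tower built?",
            "answer": "I don't know; the provided context doesn't say.",
        },
    ]

    with open("sft_dont_guess.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

The second kind of example is the important one: without enough refusal cases in the mix, the model has no incentive to ever answer "I don't know".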