Comment by miven
AFAIK the problem of retrieving documents that merely look like the query is more commonly avoided by using a bi-encoder explicitly trained for retrieval. Those models are generally trained to align the embeddings of queries with those of relevant documents, with each input type getting a dedicated marker token, something like [QUERY] and [DOC], to make the distinction clear. The strong suit of HyDE seems to be settings where the documents and queries you're working with are too niche to be properly understood by a generic retrieval model and you don't have enough concrete retrieval data to fine-tune a specialized model.
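To make the difference concrete, here's a rough sketch of what gets embedded in each setup. Everything in it is an assumption for illustration: `toy_embed` is a bag-of-words stand-in for a real encoder, `fake_generate` replaces the LLM call HyDE would make, and the [QUERY] / [DOC] markers are just the placeholder names from above, not any particular model's tokens.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in encoder (assumption): L2-normalized bag of lowercase words."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def rank(query_vec: Vector, docs: List[str],
         embed: Callable[[str], Vector]) -> List[str]:
    """Sort documents by similarity to the given vector, best first."""
    return sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)


def marked_inputs(query: str, docs: List[str]) -> List[str]:
    """Inputs for a retrieval-trained bi-encoder: each side tagged with its role.

    The markers only help if the model was trained with them; the toy
    encoder here would treat them as ordinary words.
    """
    return [f"[QUERY] {query}"] + [f"[DOC] {d}" for d in docs]


def naive_search(query: str, docs: List[str],
                 embed: Callable[[str], Vector]) -> List[str]:
    """Embed the raw query with a generic encoder and rank by similarity."""
    return rank(embed(query), docs, embed)


def hyde_search(query: str, docs: List[str],
                embed: Callable[[str], Vector],
                generate: Callable[[str], str]) -> List[str]:
    """HyDE-style search: embed a generated answer instead of the query."""
    hypothetical_doc = generate(query)
    return rank(embed(hypothetical_doc), docs, embed)


def fake_generate(query: str) -> str:
    """Hypothetical stand-in for the LLM that drafts an answer to the query."""
    return "aspirin inhibits cyclooxygenase enzymes which lowers prostaglandin levels"


# Made-up example data: the first document looks like the query,
# the second one actually answers it.
DOCS = [
    "how does aspirin work in the body",
    "aspirin inhibits cyclooxygenase enzymes and reduces prostaglandin synthesis",
]
QUERY = "how does aspirin work"

print(naive_search(QUERY, DOCS, toy_embed)[0])
print(hyde_search(QUERY, DOCS, toy_embed, fake_generate)[0])
```

With this toy data the naive search puts the query-lookalike first, while the HyDE-style search puts the answering document first, because the hypothetical answer shares vocabulary with it. A retrieval-trained bi-encoder would instead be fed the output of `marked_inputs` and rely on its training to bridge the query/document gap.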