Comment by Sesse__
The less verbose way of doing semijoins is by an IN subquery.
The less verbose way of doing semijoins is by an IN subquery.
What do you mean? Here are your real alternatives for doing a semijoin (assuming ANSI SQL, no vendor extensions):
SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2 WHERE t2.x = t1.x );
SELECT * FROM t1 WHERE x IN ( SELECT x FROM t2 );
SELECT * FROM t1 JOIN ( SELECT DISTINCT x FROM t2 ) s1 USING (x);
Now tell me which one of these is the less verbose semijoin?You could argue that you could fake a semijoin using
SELECT DISTINCT * FROM t1 JOIN t2 USING (x);
or SELECT * FROM t1 JOIN t2 USING (x) GROUP BY t1.*;
but it doesn't give the same result if t1 has duplicate rows, or if there is more than one t2 matching t1. (You can try to fudge it by replacing * with something else, in which case the problem just moves around, since “duplicate rows” will mean something else.)No, sorry, you’re certainly correct, I just meant that any subqueries are generally crazy verbose. And then you usually want additional Where clauses or even Joins in there, and it starts to stop looking like a Where clause, so I’m often happy when I can push that logic into From.
It's great being able to use an any join (and the counterpart anti join) in Clickhouse to deal with these operations.
>subquery
>less verbose
Well…
In any case, it depends. OP nicely guarded himself by writing “overusing”, so at that point his pro-tip is just a tautology and we are in agreement: not every use of DISTINCT is an immediate smell.