wvbdmp 9 hours ago

>subquery

>less verbose

Well…

In any case, it depends. OP nicely guarded himself by writing “overusing”, so at that point his pro-tip is just a tautology and we are in agreement: not every use of DISTINCT is an immediate smell.

  • Sesse__ 9 hours ago

    What do you mean? Here are your real alternatives for doing a semijoin (assuming ANSI SQL, no vendor extensions):

      SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2 WHERE t2.x = t1.x );
      SELECT * FROM t1 WHERE x IN ( SELECT x FROM t2 );
      SELECT * FROM t1 JOIN ( SELECT DISTINCT x FROM t2 ) s1 USING (x);
    
    Now tell me which one of these is the less verbose semijoin?

    You could argue that you could fake a semijoin using

      SELECT DISTINCT * FROM t1 JOIN t2 USING (x);
    
    or

      SELECT * FROM t1 JOIN t2 USING (x) GROUP BY t1.*;
    
    but it doesn't give the same result if t1 has duplicate rows, or if there is more than one t2 matching t1. (You can try to fudge it by replacing * with something else, in which case the problem just moves around, since “duplicate rows” will mean something else.)
    • wvbdmp 8 hours ago

      No, sorry, you’re certainly correct, I just meant that any subqueries are generally crazy verbose. And then you usually want additional Where clauses or even Joins in there, and it starts to stop looking like a Where clause, so I’m often happy when I can push that logic into From.

      • Sesse__ 8 hours ago

        Yes, I would certainly prefer if you could write

        SELECT * FROM t1 SEMIJOIN t2 USING (x);

        although it creates some extra problems for the join optimizer.

        • Little_Kitty 5 hours ago

          It's great being able to use an any join (and the counterpart anti join) in Clickhouse to deal with these operations.