Comment by wvbdmp

Comment by wvbdmp 10 hours ago

6 replies

Eh, sometimes you need a quick fix and it’s just extremely concise and readable. I’ll take an INNER JOIN over EXISTS (nice but insanely verbose) or CROSS APPLY (nice but slow) almost every time. Obviously you have to know what you’re dealing with, and I’m mostly talking about reporting, not perf critical application code.

Distinct is also easily explained to users, who are probably familiar with Excel’s “remove duplicate rows”.

It can also be great for exploring unfamiliar databases. I ask applicants to find stuff in a database they would never see by scrolling, and you’d be surprised how many don’t find it.

Sesse__ 10 hours ago

The less verbose way of doing semijoins is by an IN subquery.

  • wvbdmp 9 hours ago

    >subquery

    >less verbose

    Well…

    In any case, it depends. OP nicely guarded himself by writing “overusing”, so at that point his pro-tip is just a tautology and we are in agreement: not every use of DISTINCT is an immediate smell.

    • Sesse__ 9 hours ago

      What do you mean? Here are your real alternatives for doing a semijoin (assuming ANSI SQL, no vendor extensions):

        SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2 WHERE t2.x = t1.x );
        SELECT * FROM t1 WHERE x IN ( SELECT x FROM t2 );
        SELECT * FROM t1 JOIN ( SELECT DISTINCT x FROM t2 ) s1 USING (x);
      
      Now tell me which one of these is the less verbose semijoin?

      You could argue that you could fake a semijoin using

        SELECT DISTINCT * FROM t1 JOIN t2 USING (x);
      
      or

        SELECT * FROM t1 JOIN t2 USING (x) GROUP BY t1.*;
      
      but it doesn't give the same result if t1 has duplicate rows, or if there is more than one t2 matching t1. (You can try to fudge it by replacing * with something else, in which case the problem just moves around, since “duplicate rows” will mean something else.)
      • wvbdmp 8 hours ago

        No, sorry, you’re certainly correct, I just meant that any subqueries are generally crazy verbose. And then you usually want additional Where clauses or even Joins in there, and it starts to stop looking like a Where clause, so I’m often happy when I can push that logic into From.