Comment by Sesse__

Comment by Sesse__ 11 hours ago

3 replies

What do you mean? Here are your real alternatives for doing a semijoin (assuming ANSI SQL, no vendor extensions):

  SELECT * FROM t1 WHERE EXISTS ( SELECT * FROM t2 WHERE t2.x = t1.x );
  SELECT * FROM t1 WHERE x IN ( SELECT x FROM t2 );
  SELECT * FROM t1 JOIN ( SELECT DISTINCT x FROM t2 ) s1 USING (x);
Now tell me which one of these is the less verbose semijoin?

You could argue that you could fake a semijoin using

  SELECT DISTINCT * FROM t1 JOIN t2 USING (x);
or

  SELECT * FROM t1 JOIN t2 USING (x) GROUP BY t1.*;
but it doesn't give the same result if t1 has duplicate rows, or if there is more than one t2 matching t1. (You can try to fudge it by replacing * with something else, in which case the problem just moves around, since “duplicate rows” will mean something else.)
wvbdmp 10 hours ago

No, sorry, you’re certainly correct, I just meant that any subqueries are generally crazy verbose. And then you usually want additional Where clauses or even Joins in there, and it starts to stop looking like a Where clause, so I’m often happy when I can push that logic into From.

  • Sesse__ 10 hours ago

    Yes, I would certainly prefer if you could write

    SELECT * FROM t1 SEMIJOIN t2 USING (x);

    although it creates some extra problems for the join optimizer.

    • Little_Kitty 8 hours ago

      It's great being able to use an any join (and the counterpart anti join) in Clickhouse to deal with these operations.