Comment by sigwinch28

Comment by sigwinch28 10 hours ago

5 replies

Or it’s simply an indicator of a schema that has not been excessively normalised (why create an addresses_cities table just to ensure no duplicate cities are ever written to the addresses table?)

valiant55 8 hours ago

It depends when you see it, but I agree that DISTINCT shouldn't be used in production. If I'm writing a one off query and DISTINCT gets me over the finish line sparing me a few minutes then that's fine.

sgarland 3 hours ago

Because a city/region/state can be uniquely identified with a postal code (hell, in Ireland, the entire address is encapsulated in the postal code), but the reverse is not true.

At scale, repeated low-cardinality columns matter a great deal.

  • virissimo an hour ago

    There are ZIP codes that overlap a city and also an unincorporated area. Furthermore, there are zip codes that overlap different states. A data model that renders these unrepresentable may come back to bite you.

  • pbnjay an hour ago

    FYI this is not true in the US. Zip codes identify postal routes not locations

echelon 5 hours ago

DISTINCT, as well as the other aggregation functions, are fantastic for offline analytics queries. I find a lot of use for them in reporting, non-production code.