Comment by csande17

layer8 10 hours ago

By that logic, you could say ‘“UTF-8” aka “UTF-32”’, since they are encoding the same value space. But that’s just wrong.

deathanatos 6 hours ago

The type is the same: if you look at a type as an infinite set of values, they are the same infinite set. Yes, their in-memory representations differ, but every value in one exists in the other, and no others, so conversion between them is infallible.

So in your last example, UTF-8 & UTF-32 are the same type, containing the same infinite set of values, and, of course, one can convert between them infallibly.

But you can't encode arbitrary Go strings in WTF-8 (some are not representable), and you can't encode arbitrary Python strings in UTF-8 or WTF-8 (n.b. upthread is wrong about Python strings being equivalent to Unicode scalar values/well-formed UTF-*), so attempts to do so might error. (E.g., `.encode('utf-8')` on a Python `str` can raise `UnicodeEncodeError`, for instance when the string contains a lone surrogate.)
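
To make the Python half of that concrete, here is a minimal sketch, assuming CPython 3. The helper names `roundtrips_utf8_utf32` and `can_encode_utf8` are made up for illustration and are not from any library. It shows a well-formed string passing between UTF-8 and UTF-32 without loss, and a `str` holding a lone surrogate that UTF-8 refuses to encode.

```python
def roundtrips_utf8_utf32(text: str) -> bool:
    """Return True if text survives str -> UTF-8 -> str -> UTF-32 -> str unchanged."""
    utf8_bytes = text.encode("utf-8")
    # Re-encode the decoded value as UTF-32; decode() strips the BOM that encode() adds.
    utf32_bytes = utf8_bytes.decode("utf-8").encode("utf-32")
    return utf32_bytes.decode("utf-32") == text


def can_encode_utf8(text: str) -> bool:
    """Return True if text is representable in UTF-8 (hypothetical helper)."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# Only Unicode scalar values: representable in both UTF-8 and UTF-32.
well_formed = "h\u00e9llo, \u4e16\u754c \U0001F600"

# A lone surrogate: a valid Python str, but not a Unicode scalar value.
lone_surrogate = "\ud800"

print(roundtrips_utf8_utf32(well_formed))
print(can_encode_utf8(well_formed))
print(can_encode_utf8(lone_surrogate))
```

The first two checks succeed and the last one fails, which is the sense in which Python's `str` is a strictly larger set of values than well-formed UTF-8.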
