Comment by layer8 8 hours ago

The use of 8-bit extensions of ASCII (like the ISO 8859-x family) was ubiquitous for a few decades, and arguably still is to some extent on Windows (the standard Windows code pages). If ASCII had been 8-bit from the start, but with the most common characters all within the first 128 integers, which would have been the likely design anyway, then UTF-8 would still have worked out pretty well.
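
A minimal sketch of why that compatibility works out (Python here, just as an illustration, not anything from the standard itself): code points below 128 encode in UTF-8 as a single byte equal to their ASCII value, and everything above spills into multi-byte sequences.

    # Code points below 128 are single bytes identical to ASCII;
    # anything above 127 needs a multi-byte sequence.
    for ch in ["A", "é", "愛"]:
        b = ch.encode("utf-8")
        print(f"U+{ord(ch):04X} {ch!r} -> {len(b)} byte(s): {b.hex(' ')}")
    # U+0041 'A' -> 1 byte(s): 41
    # U+00E9 'é' -> 2 byte(s): c3 a9
    # U+611B '愛' -> 3 byte(s): e6 84 9b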

The accident of history is not so much that ASCII happens to be 7 bits as that the relevant phase of computer development happened to occur primarily in an English-speaking country, and that English text happens to be well representable with 7-bit units.

necovek 2 hours ago

Most languages are well representable with 128 characters (7 bits) if you do not need to include the English letters among them (e.g. by replacing those 52 characters and some of the control/punctuation/symbol slots).

This is demonstrated by the success of all the ISO-8859-*, Windows and IBM CP-* encodings, and all the *SCII (ISCII, YUSCII...) extensions: each fits one or more languages into the upper 128 code points.
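
A quick illustration of that reuse of the upper half (a Python sketch using codecs from its standard library): the same byte above 127 decodes to a different character depending on which code page you pick.

    # The same high byte means a different character in each code page.
    b = bytes([0xE9])
    for codec in ["iso-8859-1", "iso-8859-5", "iso-8859-7", "cp1252"]:
        print(codec, "->", b.decode(codec))
    # iso-8859-1 -> é    iso-8859-5 -> щ    iso-8859-7 -> ι    cp1252 -> é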

Among the large languages, it's mostly the CJK scripts that fail to fit within 128 characters as a whole (though there are smaller languages that don't fit either).