Comment by account42

Comment by account42 2 days ago

2 replies

It conceptually uses arrays of code points, which need up to 24 bits. Optimizing the storage to use smaller integers when possible is an implementation detail.

jibal 2 days ago

Python3 is specified to use arrays of 8, 16, or 32 bit units, depending on the largest code point in the string. As a result, all code points in all strings are O(1) indexable. The claim that "Python 3 internally uses UTF-32" is simply false.

zahlman a day ago

> code points, which need up to 24 bits

They need at most 21 bits. The bits may only be available in multiples of 8, but the implementation also doesn't byte-pack them into 24-bit units, so that's moot.