Comment by bruce511

Comment by bruce511 9 hours ago

While the backward compatibility of utf-8 is nice, and makes adoption much easier, the backward compatibility does not come at any cost to the elegance of the encoding.

In other words, yes it's backward compatible, but utf-8 is also compact and elegant even without that.

nextaccountic 9 hours ago

UTF-8 also enables this mindblowing design for small string optimization: if the string is 24 bytes or fewer it is stored inline, otherwise it is stored on the heap (with a pointer, a length, and a capacity, which also add up to 24 bytes on a 64-bit target)

https://github.com/ParkMyCar/compact_str

How cool is that

(Discussed here https://news.ycombinator.com/item?id=41339224)
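
For anyone wondering where the 24 comes from, here is a rough sketch, assuming a 64-bit target where a pointer and a `usize` are 8 bytes each. `heap_header_size` is a made-up helper for illustration, not something from compact_str, and the exact layout of `String` is an implementation detail rather than a language guarantee.

```rust
use std::mem::size_of;

/// Bytes needed for a heap string header: pointer + length + capacity.
/// (Hypothetical helper, for illustration only.)
fn heap_header_size() -> usize {
    size_of::<*const u8>() + 2 * size_of::<usize>()
}
```

On a 64-bit target this comes out to 24, which in practice matches `size_of::<String>()`. The inline representation reuses those same 24 bytes for the string data itself.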

adgjlsfhk1 8 hours ago

How is that UTF8 specific?

- ubitaco 6 hours ago
  
  It's slightly buried in the README on GitHub:
  > how can we store a 24 byte long string, inline? Don't we also need to store the length somewhere?
  > To do this, we utilize the fact that the last byte of our string could only ever have a value in the range [0, 192). We know this because all strings in Rust are valid UTF-8, and the only valid byte pattern for the last byte of a UTF-8 character (and thus the possible last byte of a string) is 0b0XXXXXXX aka [0, 128) or 0b10XXXXXX aka [128, 192)
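
The trick quoted from the README can be sketched roughly like this. To be clear, this is not compact_str's actual code: `INLINE_CAP`, `LEN_TAG`, `pack_inline`, `inline_len`, and `unpack_inline` are all hypothetical names, and the real crate handles more cases (such as marking heap-allocated strings). The sketch only shows the core idea: since the last byte of valid UTF-8 is always below 192, values from 192 upward in the final slot can carry the length of a shorter string.

```rust
/// Assumed inline buffer size (24 bytes, as on a 64-bit target).
const INLINE_CAP: usize = 24;
/// Last-byte values below 192 may be real UTF-8 text; 192 and up are free for metadata.
const LEN_TAG: u8 = 192;

/// Copy `s` into a fixed 24-byte buffer, or return None if it does not fit.
fn pack_inline(s: &str) -> Option<[u8; INLINE_CAP]> {
    let bytes = s.as_bytes();
    if bytes.len() > INLINE_CAP {
        return None;
    }
    let mut buf = [0u8; INLINE_CAP];
    buf[..bytes.len()].copy_from_slice(bytes);
    if bytes.len() < INLINE_CAP {
        // A shorter string leaves the last byte unused, so the length goes there.
        buf[INLINE_CAP - 1] = LEN_TAG + bytes.len() as u8;
    }
    // A full 24-byte string keeps its own last byte, which is always < 192.
    Some(buf)
}

/// Recover the string length by looking only at the last byte.
fn inline_len(buf: &[u8; INLINE_CAP]) -> usize {
    let last = buf[INLINE_CAP - 1];
    if last >= LEN_TAG {
        (last - LEN_TAG) as usize
    } else {
        INLINE_CAP
    }
}

/// View the packed bytes as a string slice again.
fn unpack_inline(buf: &[u8; INLINE_CAP]) -> &str {
    std::str::from_utf8(&buf[..inline_len(buf)]).expect("buffer was packed from a valid &str")
}
```

The length tags here only need the values 192 through 215 (lengths 0 through 23), so plenty of last-byte values remain available for other markers. An encoding whose final byte could take any value would need a separate length field and could not use all 24 bytes for text.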