Comment by xyzzyz a day ago

147 replies

Go was designed by some old-school folks that maybe stuck a bit too hard to their principles, losing sight of the practical conveniences.

I'd say that it's entirely the other way around: they stuck to the practical convenience of solving the problem that they had in front of them, quickly, instead of analyzing the problem from the first principles, and solving the problem correctly (or using a solution that was Not Invented Here).

Go's filesystem API is the perfect example. You need to open files? Great, we'll create

  func Open(name string) (*File, error)
function, you can open files now, done. What if the file name is not valid UTF-8, though? Who cares, hasn't happened to me in the first 5 years I used Go.

jerf a day ago

While the general question about string encoding is fine, unfortunately in a general-purpose and cross-platform language, a file interface that enforces Unicode correctness is actively broken, in that there are files out in the world it will be unable to interact with. If your language is enforcing that, and it doesn't have a fallback to a bag of bytes, it is broken, you just haven't encountered it. Go is correct on this specific API. I'm not celebrating that fact here, nor do I expect the Go designers are either, but it's still correct.

  • klodolph a day ago

    This is one of those things that kind of bugs me about, say, OsStr / OsString in Rust. In theory, it’s a very nice, principled approach to strings (must be UTF-8) and filenames (arbitrary bytes, almost, on Linux & Mac). In practice, the ergonomics around OsStr are horrible. They are missing most of the API that normal strings have… it seems like manipulating them is an afterthought, and it was assumed that people would treat them as opaque (which is wrong).

    Go’s more chaotic approach to allow strings to have non-Unicode contents is IMO more ergonomic. You validate that strings are UTF-8 at the place where you care that they are UTF-8. (So I’m agreeing.)

    • duckerude a day ago

      The big problem isn't invalid UTF-8 but invalid UTF-16 (on Windows et al). AIUI Go had nasty bugs around this (https://github.com/golang/go/issues/59971) until it recently adopted WTF-8, an encoding that was actually invented for Rust's OsStr.

      WTF-8 has some inconvenient properties. Concatenating two strings requires special handling. Rust's opaque types can patch over this but I bet Go's WTF-8 handling exposes some unintuitive behavior.

      There is a desire to add a normal string API to OsStr but the details aren't settled. For example: should it be possible to split an OsStr on an OsStr needle? This can be implemented but it'd require switching to OMG-WTF-8 (https://rust-lang.github.io/rfcs/2295-os-str-pattern.html), an encoding with even more special cases. (I've thrown my own hat into this ring with OsStr::slice_encoded_bytes().)

      The current state is pretty sad yeah. If you're OK with losing portability you can use the OsStrExt extension traits.

      • klodolph a day ago

        Yeah, I avoided talking about Windows, which isn’t UTF-16 but “int16 strings” in the same way Unix filenames are int8 strings.

        IMO the differences with Windows are such that I’m much more unhappy with WTF-8. There’s a lot that sucks about C++ but at least I can do something like

          #if _WIN32
          using pathchar = wchar_t;
          constexpr pathchar sep = L'\\';
          #else
          using pathchar = char;
          constexpr pathchar sep = '/';
          #endif
          using pathstring = std::basic_string<pathchar>;
        
        Mind you this sucks for a lot of reasons, one big reason being that you’re directly exposed to the differences between path representations on different operating systems. Despite all the ways that this (above) sucks, I still generally prefer it over the approaches of Go or Rust.
    • Kinrany a day ago

      > You validate that strings are UTF-8 at the place where you care that they are UTF-8.

      The problem with this, as with any lack of static typing, is that you now have to validate at _every_ place that cares, or to carefully track whether a value has already been validated, instead of validating once and letting the compiler check that it happened.
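
      (The closest Go equivalent is a wrapper type that can only be built through a validating constructor; a sketch of the "validate once, carry it in the type" idea, with made-up names and assuming "fmt" and "unicode/utf8" are imported:)

        // ValidName holds a string that has already been checked to be valid UTF-8.
        type ValidName struct{ s string }

        func NewValidName(s string) (ValidName, error) {
            if !utf8.ValidString(s) {
                return ValidName{}, fmt.Errorf("not valid UTF-8: %q", s)
            }
            return ValidName{s: s}, nil
        }

        func (n ValidName) String() string { return n.s }

        // Downstream code that accepts a ValidName doesn't need to re-validate,
        // though unlike a true static guarantee the zero value ValidName{} can
        // still sneak through.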

      • klodolph a day ago

        In practice, the validation generally happens when you convert to JSON or use an HTML template or something like that, so it’s not so many places.

        Validation is nice but Rust’s principled approach leaves me high and dry sometimes. Maybe Rust will finish figuring out the OsString interface and at that point we can say Rust has “won” the conversation, but it’s not there yet, and it’s been years.

    • pas 21 hours ago

      It's completely in line with Rust's approach: concentrate on the hard stuff that lifts all boats, like the type system and language features; keep the standard library very small; and maybe import/adopt very successful packages (like once_cell). But since removing things from std is considered a forever no-no, it seems path handling has to be solved by crates, e.g. https://github.com/chipsenkbeil/typed-path

  • [removed] a day ago
    [deleted]
stouset a day ago

[flagged]

  • blibble a day ago

    > Golang makes it easy to do the dumb, wrong, incorrect thing that looks like it works 99.7% of the time. How can that be wrong? It works in almost all cases!

    my favorite example of this was the go authors refusing to add monotonic time into the standard library because they confidently misunderstood its necessity

    (presumably because clocks at google don't ever step)

    then after some huge outages (due to leap seconds) they finally added it

    now the libraries are a complete mess because the original clock/time abstractions weren't built with the concept of multiple clocks

    and every go program written is littered with terrible bugs due to use of the wrong clock

    https://github.com/golang/go/issues/12914 (https://github.com/golang/go/issues/12914#issuecomment-15075... might qualify for the worst comment ever)
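
    (A minimal sketch of the distinction, relying only on standard library behavior since Go 1.9 added monotonic readings; the sleeps just stand in for real work:)

      package main

      import (
          "fmt"
          "time"
      )

      func main() {
          start := time.Now() // carries a hidden monotonic reading since Go 1.9
          time.Sleep(10 * time.Millisecond)
          fmt.Println(time.Since(start)) // monotonic difference: immune to wall-clock steps

          wall := time.Now().Round(0) // Round(0) strips the monotonic reading
          time.Sleep(10 * time.Millisecond)
          fmt.Println(time.Now().Sub(wall)) // wall-clock only: can jump if the clock steps
      }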

    • 0cf8612b2e1e a day ago

      This issue is probably my favorite Goism. Real issue identified and the feedback is, “You shouldn’t run hardware that way. Run servers like Google does, without time jumping.” Similar with the original stance on code versioning. Just run a monorepo!

  • 0x696C6961 a day ago

    [flagged]

    • jack_h a day ago

      It’s not about making zero mistakes, it’s about learning from previous languages which made mistakes and not repeating them. I decided against using go pretty early on because I recognized just how many mistakes they were repeating that would end up haunting maintainers.

  • jen20 a day ago

    I can count on fewer hands the number of times I've been bitten by such things in over 10 years of professional Go vs bitten just in the last three weeks by half-assed Java.

    • gf000 a day ago

      There is a lot to say about Java, but the libraries (both standard lib and popular third-party ones) are goddamn battle-hardened, so I have a hard time believing your claim.

      • p2detar a day ago

        They might very well be, because time-handling in Java almost always sucked. In the beginning there was java.util.Date and it was very poorly designed. Sun tried to fix that with java.util.Calendar. That worked for a while but it was still cumbersome, Calendar.getInstance() anyone? After that someone sat down and wrote Joda-Time, which was really really cool and IMO the basis of JSR-310 and the new java.time API. So you're kind of right, but it only took them 15 years to make it right.

      • jen20 a day ago

        You can believe what you like, of course, but "battle tested" does not mean "isn't easy to abuse".

    • stouset a day ago

      Is golang better than Java? Sure, fine, maybe. I'm not a Java expert so I don't have a dog in the race.

      Should and could golang have been so much better than it is? Would golang have been better if Pike and co. had considered use-cases outside of Google, or looked outward for inspiration even just a little? Unambiguously yes, and none of the changes would have needed it to sacrifice its priorities of language simplicity, compilation speed, etc.

      It is absolutely okay to feel that go is a better language than some of its predecessors while at the same time being utterly frustrated at the very low-hanging, comparatively obvious, missed opportunities for it to have been drastically better.

    • [removed] a day ago
      [deleted]
herbstein a day ago

Much more egregious is the fact that the API allows returning both an error and a valid file handle. That may be documented to not happen. But look at the Read method instead. It will return both an error and a length you need to handle at the same time.

  • nasretdinov a day ago

    The Read() method is certainly an exception rather than the rule. The common convention is to return a nil value upon encountering an error unless there's real value in returning both, e.g. for a partial read that failed in the end but produced some non-empty result nevertheless. It's a rare occasion, yes, but if you absolutely have to handle this case you can. Otherwise you typically ignore the result if err != nil. It's a mess, true, but the real world is also quite messy unfortunately, and Go acknowledges that.
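
    (A minimal sketch of that convention; the helper and the process callback are made up, but the shape follows the documented io.Reader contract of consuming the n > 0 bytes before acting on err:)

      // copyChunks drains r, covering the case where a single Read call
      // returns both n > 0 and a non-nil error.
      func copyChunks(r io.Reader, process func([]byte)) error {
          buf := make([]byte, 4096)
          for {
              n, err := r.Read(buf)
              if n > 0 {
                  process(buf[:n]) // use the partial result first
              }
              if err == io.EOF {
                  return nil // normal end of stream
              }
              if err != nil {
                  return err // a real failure, possibly after a partial read
              }
          }
      }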

    • stouset a day ago

      Go doesn't acknowledge that. It punts.

      Most of the time if there's a result, there's no error. If there's an error, there's no result. But don't forget to check every time! And make sure you don't make a mistake when you're checking and accidentally use the value anyway, because even though it's technically meaningless it's still nominally a meaningful value since zero values are supposed to be meaningful.

      Oh and make sure to double-check the docs, because the language can't let you know about the cases where both returns are meaningful.

      The real world is messy. And golang doesn't give you advance warning on where the messes are, makes no effort to prevent you from stumbling into them, and stands next to you constantly criticizing you while you clean them up by yourself. "You aren't using that variable any more, clean that up too." "There's no new variables now, so use `err =` instead of `err :=`."

koakuma-chan a day ago

> What if the file name is not valid UTF-8

Nothing? Neither Go nor the OS require file names to be UTF-8, I believe

  • zimpenfish a day ago

    > Nothing?

    It breaks. Which is weird because you can create a string which isn't valid UTF-8 (eg "\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98") and print it out with no trouble; you just can't pass it to e.g. `os.Create` or `os.Open`.

    (Bash and a variety of other utils will also complain about it not being valid UTF-8; neovim won't save a file under that name; etc.)

    • yencabulator a day ago

      That sounds like your kernel refusing to create that file, nothing to do with Go.

        $ cat main.go
        package main
      
        import (
         "log"
         "os"
        )
      
        func main() {
         f, err := os.Create("\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98")
         if err != nil {
          log.Fatalf("create: %v", err)
         }
         _ = f
        }
        $ go run .
        $ ls -1
        ''$'\275\262''='$'\274'' ⌘'
        go.mod
        main.go
      • commandersaki a day ago

        I'm confused: is Go restricted to UTF-8-only filenames? It can read/write arbitrary byte sequences (which is what a string can hold), which should be sufficient for dealing with other encodings.

      • zimpenfish a day ago

        > That sounds like your kernel refusing to create that file

        Yes, that was my assumption when bash et al also had problems with it.

    • kragen a day ago

      It sounds like you found a bug in your filesystem, not in Golang's API, because you totally can pass that string to those functions and open the file successfully.

  • johncolanduoni a day ago

    Well, Windows is an odd beast when 8-bit file names are used. If done naively, you can’t express all valid filenames even with broken UTF-8, and non-valid-Unicode filenames cannot be encoded to UTF-8 without loss or some weird convention.

    You can do something like WTF-8 (not a misspelling, alas) to make it bidirectional. Rust does this under the hood but doesn’t expose the internal representation.

    • jstimpfle a day ago

      What do you mean by "when 8-bit filenames are used"? Do you mean the -A APIs, like CreateFileA()? Those do not take UTF-8, mind you -- unless you are using a relatively recent version of Windows that allows you to run your process with a UTF-8 codepage.

      In general, Windows filenames are Unicode and you can always express those filenames by using the -W APIs (like CreateFileW()).

      • johncolanduoni 21 hours ago

        Windows filenames in the W APIs are 16-bit (which the A APIs essentially wrap with conversions to the active old-school codepage), and are normally well formed UTF-16. But they aren’t required to be - NTFS itself only cares about 0x0000 and 0x005C (backslash) I believe, and all layers of the stack accept invalid UTF-16 surrogates. Don’t get me started on the normal Win32 path processing (Unicode normalization, “COM” is still a special file, etc.), some of which can be bypassed with the “\\?\” prefix when in NTFS.

        The upshot is that since the values aren’t always UTF-16, there’s no canonical way to convert them to single byte strings such that valid UTF-16 gets turned into valid UTF-8 but the rest can still be roundtripped. That’s what bastardized encodings like WTF-8 solve. The Rust Path API is the best take on this I’ve seen that doesn’t choke on bad Unicode.

      • af78 a day ago

        I think it depends on the underlying filesystem. Unicode (UTF-16) is first-class on NTFS. But Windows still supports FAT, I guess, where multiple 8-bit encodings are possible: the so-called "OEM" code pages (437, 850 etc.) or "ANSI" code pages (1250, 1251 etc.). I haven't checked how recent Windows versions cope with FAT file names that cannot be represented as Unicode.

    • andyferris a day ago

      I believe the same is true on linux, which only cares about 0x2f bytes (i.e. /)

      • johncolanduoni 21 hours ago

        Windows paths are not necessarily well-formed UTF-16 (UCS-2 by some people’s definition) down to the filesystem level. If they were always well formed, you could convert to a single-byte representation by straightforward Unicode re-encoding. But since they aren’t, there are choices that need to be made about what to do with malformed UTF-16 if you want to round-trip them to single-byte strings such that they match UTF-8 encoding when they are well formed.

        In Linux, they’re 8-bit almost-arbitrary strings like you noted, and usually UTF-8. So they always have a convenient 8-bit encoding (I.e. leave them alone). If you hated yourself and wanted to convert them to UTF-16, however, you’d have the same problem Windows does but in reverse.

  • [removed] a day ago
    [deleted]
nasretdinov a day ago

Note that Go strings can be invalid UTF-8; they dropped panicking on encountering an invalid UTF-8 string before 1.0, I think.

  • xyzzyz a day ago

    This also epitomizes the issue. What's the point of having a `string` type at all, if it doesn't allow you to make any extra assumptions about the contents beyond `[]byte`? The answer is that they planned to make conversion to `string` error out when it's invalid UTF-8, and then assume that `string`s are valid UTF-8, but then it caused problems elsewhere, so they dropped it for immediate practical convenience.

    • tialaramex a day ago

      Rust apparently got relatively close to not having &str as a primitive type and instead only providing a library alias to &[u8] when Rust 1.0 shipped.

      Score another for Rust's Safety Culture. It would be convenient to just have &str as an alias for &[u8], but if that mistake had been allowed, all the safety checking that Rust now does centrally would have to be owned by every single user forever. Instead of a few dozen checks overseen by experts there'd be myriad sprinkled across every project and always ready to bite you.

    • 0x000xca0xfe a day ago

      Why not use utf8.ValidString in the places it is needed? Why burden one of the most basic data types with highly specific format checks?

      It's far better to get some � when working with messy data instead of applications refusing to work and erroring out left and right.
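
      (With the standard library that's a few lines; a hedged sketch, function name made up:)

        import (
            "strings"
            "unicode/utf8"
        )

        // scrub returns s unchanged when it is already valid UTF-8, otherwise it
        // replaces each invalid byte sequence with U+FFFD (�) instead of erroring.
        func scrub(s string) string {
            if utf8.ValidString(s) {
                return s
            }
            return strings.ToValidUTF8(s, "\uFFFD")
        }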

      • const_cast a day ago

        IMO utf8 isn't a highly specific format, it's universal for text. Every ascii string you'd write in C or C++ or whatever is already utf8.

        So that means that for 99% of scenarios, the difference between char[] and a proper utf8 string is none. They have the same data representation and memory layout.

        The problem comes in when people start using string like they use string in PHP. They just use it to store random bytes or other binary data.

        This makes no sense with the string type. String is text, but now we don't have text. That's a problem.

        We should use byte[] or something for this instead of string. That's an abuse of string. I don't think allowing strings to not be text is too constraining - that's what a string is!

    • roncesvalles a day ago

      I've always thought the point of the string type was for indexing. One index of a string is always one character, but characters are sometimes composed of multiple bytes.

      • crazygringo a day ago

        Yup. But to be clear, in Unicode a string will index code points, not characters. E.g. a single emoji can be made of multiple code points, as well as certain characters in certain languages. The Unicode name for a character like this is a "grapheme", and grapheme splitting is so complicated it generally belongs in a dedicated Unicode library, not a general-purpose string object.
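
        (A quick Go illustration, assuming "fmt" and "unicode/utf8" are imported — one visible "character" that is two code points and eight bytes:)

          s := "👍🏽" // thumbs-up emoji + skin-tone modifier: a single grapheme
          fmt.Println(len(s))                    // 8 (bytes)
          fmt.Println(utf8.RuneCountInString(s)) // 2 (code points)
          // Splitting on grapheme boundaries needs a dedicated segmentation
          // library; the standard library only goes down to runes (code points).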

      • birn559 a day ago

        You can't do that in a performant way, and going that route can lead to problems, because characters (= graphemes in the language of Unicode) don't always behave as developers assume.

    • assbuttbuttass a day ago

      string is just an immutable []byte. It's actually one of my favorite things about Go that strings can contain invalid utf-8, so you don't end up with the Rust mess of String vs OSString vs PathBuf vs Vec<u8>. It's all just string

      • zozbot234 a day ago

        Rust &str and String are specifically intended for UTF-8 valid text. If you're working with arbitrary byte sequences, that's what &[u8] and Vec<u8> are for in Rust. It's not a "mess", it's just different from what Golang does.

    • naikrovek a day ago

      I think maybe you've forgotten about the rune type. rune does make assumptions.

      []rune is for sequences of UTF characters. rune is an alias for int32. string, I think, is an alias for []byte.

      • TheDong a day ago

        `string` is not an alias for []byte.

        Consider:

            for i, chr := range string([]byte{226, 150, 136, 226, 150, 136}) {
              fmt.Printf("%d = %v\n", i, chr)
              // note, s[i] != chr
            }
        
        How many times does that loop over 6 bytes iterate? The answer is it iterates twice, with i=0 and i=3.

        There's also quite a few standard APIs that behave weirdly if a string is not valid utf-8, which wouldn't be the case if it was just a bag of bytes.

silverwind a day ago

> What if the file name is not valid UTF-8, though

They could support passing filename as `string | []byte`. But wait, go does not even have union types.

  • lblume a day ago

    But []byte, or a wrapper like Path, is enough, if strings are easily convertible into it. Rust does it that way via the AsRef<T> trait.

kragen a day ago

If the filename is not valid UTF-8, Golang can still open the file without a problem, as long as your filesystem doesn't attempt to be clever. Linux ext4fs and Go both consider filenames to be binary strings except that they cannot contain NULs.

This is one of the minor errors in the post.

ants_everywhere a day ago

> they stuck to the practical convenience of solving the problem that they had in front of them, quickly, instead of analyzing the problem from the first principles, and solving the problem correctly (or using a solution that was Not Invented Here).

I've said this before, but much of Go's design looks like it's imitating the C++ style at Google. When I see comments where people say they like something about Go, it's often an idiom that showed up first in the C++ macros or tooling.

I used to check this before I left Google, and I'm sure it's becoming less true over time. But to me it looks like the idea of Go was basically "what if we created a Python-like compiled language that was easier to onboard than C++ but which still had our C++ ergonomics?"

  • shrubble a day ago

    Didn’t Go come out of a language that was written for Plan9, thus pre-dating Rob Pike’s work at Google?

    • pjmlp 11 hours ago

      Kind of: Limbo, written for Inferno, taking into consideration what made Alef's design for Plan 9 a failure, like the lack of garbage collection.

    • kragen a day ago

      Yes, Golang is superficially almost identical to Pike's Newsqueak.

    • ants_everywhere a day ago

      not that I recall but I may not be recalling correctly.

      But certainly, anyone will bring their previous experience to the project, so there must be some Plan9 influence in there somewhere

      • kragen a day ago

        They were literally using the Plan9 C compiler and linker.

[removed] a day ago
[deleted]
perryizgr8 17 hours ago

> What if the file name is not valid UTF-8, though?

Then make it valid UTF-8. If you try to solve the long tail of issues in a commonly used function of the library, it's going to cause a lot of pain. This approach is better. If someone has a weird problem like file names with invalid characters, they can solve it themselves, even publish a package. Why complicate 100% of uses for solving 0.01% of issues?

  • nomel 17 hours ago

    > Then make it valid UTF-8.

    I think you misunderstand. How do you do that for a file that exists on disk that's trying to be read? Rename it for them? They may not like that.