Comment by xyzzyz a day ago

147 replies

Go was designed by some old-school folks that maybe stuck a bit too hard to their principles, losing sight of the practical conveniences.

I'd say that it's entirely the other way around: they stuck to the practical convenience of solving the problem that they had in front of them, quickly, instead of analyzing the problem from the first principles, and solving the problem correctly (or using a solution that was Not Invented Here).

Go's filesystem API is the perfect example. You need to open files? Great, we'll create

  func Open(name string) (*File, error)
function, you can open files now, done. What if the file name is not valid UTF-8, though? Who cares, hasn't happened to me in the first 5 years I used Go.

jerf a day ago

While the general question about string encoding is fine, unfortunately in a general-purpose and cross-platform language, a file interface that enforces Unicode correctness is actively broken, in that there are files out in the world it will be unable to interact with. If your language is enforcing that, and it doesn't have a fallback to a bag of bytes, it is broken, you just haven't encountered it. Go is correct on this specific API. I'm not celebrating that fact here, nor do I expect the Go designers are either, but it's still correct.

  • klodolph a day ago

    This is one of those things that kind of bugs me about, say, OsStr / OsString in Rust. In theory, it’s a very nice, principled approach to strings (must be UTF-8) and filenames (arbitrary bytes, almost, on Linux & Mac). In practice, the ergonomics around OsStr are horrible. They are missing most of the API that normal strings have… it seems like manipulating them is an afterthought, and it was assumed that people would treat them as opaque (which is wrong).

    Go’s more chaotic approach to allow strings to have non-Unicode contents is IMO more ergonomic. You validate that strings are UTF-8 at the place where you care that they are UTF-8. (So I’m agreeing.)

    • duckerude a day ago

      The big problem isn't invalid UTF-8 but invalid UTF-16 (on Windows et al). AIUI Go had nasty bugs around this (https://github.com/golang/go/issues/59971) until it recently adopted WTF-8, an encoding that was actually invented for Rust's OsStr.

      WTF-8 has some inconvenient properties. Concatenating two strings requires special handling. Rust's opaque types can patch over this but I bet Go's WTF-8 handling exposes some unintuitive behavior.

      There is a desire to add a normal string API to OsStr but the details aren't settled. For example: should it be possible to split an OsStr on an OsStr needle? This can be implemented but it'd require switching to OMG-WTF-8 (https://rust-lang.github.io/rfcs/2295-os-str-pattern.html), an encoding with even more special cases. (I've thrown my own hat into this ring with OsStr::slice_encoded_bytes().)

      The current state is pretty sad yeah. If you're OK with losing portability you can use the OsStrExt extension traits.

      • klodolph a day ago

        Yeah, I avoided talking about Windows, which isn’t UTF-16 but “int16 strings” in the same way Unix filenames are int8 strings.

        IMO the differences with Windows are such that I’m much more unhappy with WTF-8. There’s a lot that sucks about C++ but at least I can do something like

          #if _WIN32
          using pathchar = wchar_t;
          constexpr pathchar sep = L'\\';
          #else
          using pathchar = char;
          constexpr pathchar sep = '/';
          #endif
          using pathstring = std::basic_string<pathchar>;
        
        Mind you this sucks for a lot of reasons, one big reason being that you’re directly exposed to the differences between path representations on different operating systems. Despite all the ways that this (above) sucks, I still generally prefer it over the approaches of Go or Rust.
    • Kinrany a day ago

      > You validate that strings are UTF-8 at the place where you care that they are UTF-8.

      The problem with this, as with any lack of static typing, is that you now have to validate at _every_ place that cares, or to carefully track whether a value has already been validated, instead of validating once and letting the compiler check that it happened.
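
      (The closest Go equivalent is a wrapper type that can only be built through a validating constructor; a sketch of the "validate once, carry it in the type" idea, with made-up names and assuming "fmt" and "unicode/utf8" are imported:)

        // ValidName holds a string that has already been checked to be valid UTF-8.
        type ValidName struct{ s string }

        func NewValidName(s string) (ValidName, error) {
            if !utf8.ValidString(s) {
                return ValidName{}, fmt.Errorf("not valid UTF-8: %q", s)
            }
            return ValidName{s: s}, nil
        }

        func (n ValidName) String() string { return n.s }

        // Downstream code that accepts a ValidName doesn't need to re-validate,
        // though unlike a true static guarantee the zero value ValidName{} can
        // still sneak through.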

      • klodolph a day ago

        In practice, the validation generally happens when you convert to JSON or use an HTML template or something like that, so it’s not so many places.

        Validation is nice but Rust’s principled approach leaves me high and dry sometimes. Maybe Rust will finish figuring out the OsString interface and at that point we can say Rust has “won” the conversation, but it’s not there yet, and it’s been years.

    • pas 21 hours ago

      It's completely in line with Rust's approach: concentrate on the hard stuff that lifts all boats, like the type system and language features; keep the standard library very small; and maybe import/adopt very successful packages (like once_cell). But since removing things from std is considered a forever no-no, it seems path handling has to be solved by crates, e.g. https://github.com/chipsenkbeil/typed-path

  • [removed] a day ago
    [deleted]
stouset a day ago

[flagged]

  • blibble a day ago

    > Golang makes it easy to do the dumb, wrong, incorrect thing that looks like it works 99.7% of the time. How can that be wrong? It works in almost all cases!

    my favorite example of this was the go authors refusing to add monotonic time into the standard library because they confidently misunderstood its necessity

    (presumably because clocks at google don't ever step)

    then after some huge outages (due to leap seconds) they finally added it

    now the libraries are a complete mess because the original clock/time abstractions weren't built with the concept of multiple clocks

    and every go program written is littered with terrible bugs due to use of the wrong clock

    https://github.com/golang/go/issues/12914 (https://github.com/golang/go/issues/12914#issuecomment-15075... might qualify for the worst comment ever)
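
    (A minimal sketch of the distinction, relying only on standard library behavior since Go 1.9 added monotonic readings; the sleeps just stand in for real work:)

      package main

      import (
          "fmt"
          "time"
      )

      func main() {
          start := time.Now() // carries a hidden monotonic reading since Go 1.9
          time.Sleep(10 * time.Millisecond)
          fmt.Println(time.Since(start)) // monotonic difference: immune to wall-clock steps

          wall := time.Now().Round(0) // Round(0) strips the monotonic reading
          time.Sleep(10 * time.Millisecond)
          fmt.Println(time.Now().Sub(wall)) // wall-clock only: can jump if the clock steps
      }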

    • 0cf8612b2e1e a day ago

      This issue is probably my favorite Goism. Real issue identified and the feedback is, “You shouldn’t run hardware that way. Run servers like Google does, without time jumping.” Similar with the original stance on code versioning. Just run a monorepo!

  • 0x696C6961 a day ago

    [flagged]

    • jack_h a day ago

      It’s not about making zero mistakes, it’s about learning from previous languages which made mistakes and not repeating them. I decided against using go pretty early on because I recognized just how many mistakes they were repeating that would end up haunting maintainers.

  • jen20 a day ago

    I can count on fewer hands the number of times I've been bitten by such things in over 10 years of professional Go vs bitten just in the last three weeks by half-assed Java.

    • gf000 a day ago

      There is a lot to say about Java, but the libraries (both standard lib and popular third-party ones) are goddamn battle-hardened, so I have a hard time believing your claim.

      • p2detar a day ago

        They might very well be, because time-handling in Java almost always sucked. In the beginning there was java.util.Date and it was very poorly designed. Sun tried to fix that with java.util.Calendar. That worked for a while but it was still cumbersome, Calendar.getInstance() anyone? After that someone sat down and wrote Joda-Time, which was really really cool and IMO the basis of JSR-310 and the new java.time API. So you're kind of right, but it only took them 15 years to make it right.

      • jen20 a day ago

        You can believe what you like, of course, but "battle tested" does not mean "isn't easy to abuse".

    • stouset a day ago

      Is golang better than Java? Sure, fine, maybe. I'm not a Java expert so I don't have a dog in the race.

      Should and could golang have been so much better than it is? Would golang have been better if Pike and co. had considered use-cases outside of Google, or looked outward for inspiration even just a little? Unambiguously yes, and none of the changes would have needed it to sacrifice its priorities of language simplicity, compilation speed, etc.

      It is absolutely okay to feel that go is a better language than some of its predecessors while at the same time being utterly frustrated at the very low-hanging, comparatively obvious, missed opportunities for it to have been drastically better.

    • [removed] a day ago
      [deleted]
herbstein a day ago

Much more egregious is the fact that the API allows returning both an error and a valid file handle. That may be documented to not happen. But look at the Read method instead. It will return both an error and a length you need to handle at the same time.

  • nasretdinov a day ago

    The Read() method is certainly an exception rather than the rule. The common convention is to return a nil value upon encountering an error unless there's real value in returning both, e.g. for a partial read that failed in the end but produced some non-empty result nevertheless. It's a rare occasion, yes, but if you absolutely have to handle this case you can. Otherwise you typically ignore the result if err != nil. It's a mess, true, but the real world is also quite messy unfortunately, and Go acknowledges that.
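
    (A minimal sketch of that convention; the helper and the process callback are made up, but the shape follows the documented io.Reader contract of consuming the n > 0 bytes before acting on err:)

      // copyChunks drains r, covering the case where a single Read call
      // returns both n > 0 and a non-nil error.
      func copyChunks(r io.Reader, process func([]byte)) error {
          buf := make([]byte, 4096)
          for {
              n, err := r.Read(buf)
              if n > 0 {
                  process(buf[:n]) // use the partial result first
              }
              if err == io.EOF {
                  return nil // normal end of stream
              }
              if err != nil {
                  return err // a real failure, possibly after a partial read
              }
          }
      }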

    • stouset a day ago

      Go doesn't acknowledge that. It punts.

      Most of the time if there's a result, there's no error. If there's an error, there's no result. But don't forget to check every time! And make sure you don't make a mistake when you're checking and accidentally use the value anyway, because even though it's technically meaningless it's still nominally a meaningful value since zero values are supposed to be meaningful.

      Oh and make sure to double-check the docs, because the language can't let you know about the cases where both returns are meaningful.

      The real world is messy. And golang doesn't give you advance warning on where the messes are, makes no effort to prevent you from stumbling into them, and stands next to you constantly criticizing you while you clean them up by yourself. "You aren't using that variable any more, clean that up too." "There's no new variables now, so use `err =` instead of `err :=`."

koakuma-chan a day ago

> What if the file name is not valid UTF-8

Nothing? Neither Go nor the OS require file names to be UTF-8, I believe

  • zimpenfish a day ago

    > Nothing?

    It breaks. Which is weird because you can create a string which isn't valid UTF-8 (eg "\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98") and print it out with no trouble; you just can't pass it to e.g. `os.Create` or `os.Open`.

    (Bash and a variety of other utils will also complain about it not being valid UTF-8; neovim won't save a file under that name; etc.)

    • yencabulator a day ago

      That sounds like your kernel refusing to create that file, nothing to do with Go.

        $ cat main.go
        package main
      
        import (
         "log"
         "os"
        )
      
        func main() {
         f, err := os.Create("\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98")
         if err != nil {
          log.Fatalf("create: %v", err)
         }
         _ = f
        }
        $ go run .
        $ ls -1
        ''$'\275\262''='$'\274'' ⌘'
        go.mod
        main.go
      • commandersaki a day ago

        I'm confused: is Go restricted to UTF-8-only filenames? It can read/write arbitrary byte sequences (which is what a string can hold), which should be sufficient for dealing with other encodings.

      • zimpenfish a day ago

        > That sounds like your kernel refusing to create that file

        Yes, that was my assumption when bash et al also had problems with it.

    • kragen a day ago

      It sounds like you found a bug in your filesystem, not in Golang's API, because you totally can pass that string to those functions and open the file successfully.

  • johncolanduoni a day ago

    Well, Windows is an odd beast when 8-bit file names are used. If done naively, you can’t express all valid filenames even with broken UTF-8, and non-valid-Unicode filenames cannot be encoded to UTF-8 without loss or some weird convention.

    You can do something like WTF-8 (not a misspelling, alas) to make it bidirectional. Rust does this under the hood but doesn’t expose the internal representation.

    • jstimpfle a day ago

      What do you mean by "when 8-bit filenames are used"? Do you mean the -A APIs, like CreateFileA()? Those do not take UTF-8, mind you -- unless you are using a relatively recent version of Windows that allows you to run your process with a UTF-8 codepage.

      In general, Windows filenames are Unicode and you can always express those filenames by using the -W APIs (like CreateFileW()).

      • johncolanduoni 21 hours ago

        Windows filenames in the W APIs are 16-bit (which the A APIs essentially wrap with conversions to the active old-school codepage), and are normally well formed UTF-16. But they aren’t required to be - NTFS itself only cares about 0x0000 and 0x005C (backslash) I believe, and all layers of the stack accept invalid UTF-16 surrogates. Don’t get me started on the normal Win32 path processing (Unicode normalization, “COM” is still a special file, etc.), some of which can be bypassed with the “\\?\” prefix when in NTFS.

        The upshot is that since the values aren’t always UTF-16, there’s no canonical way to convert them to single byte strings such that valid UTF-16 gets turned into valid UTF-8 but the rest can still be roundtripped. That’s what bastardized encodings like WTF-8 solve. The Rust Path API is the best take on this I’ve seen that doesn’t choke on bad Unicode.

      • af78 a day ago

        I think it depends on the underlying filesystem. Unicode (UTF-16) is first-class on NTFS. But Windows still supports FAT, I guess, where multiple 8-bit encodings are possible: the so-called "OEM" code pages (437, 850 etc.) or "ANSI" code pages (1250, 1251 etc.). I haven't checked how recent Windows versions cope with FAT file names that cannot be represented as Unicode.

    • andyferris a day ago

      I believe the same is true on linux, which only cares about 0x2f bytes (i.e. /)

      • johncolanduoni 21 hours ago

        Windows paths are not necessarily well-formed UTF-16 (UCS-2 by some people’s definition) down to the filesystem level. If they were always well formed, you could convert to a single-byte representation by straightforward Unicode re-encoding. But since they aren’t, there are choices that need to be made about what to do with malformed UTF-16 if you want to round-trip them to single-byte strings such that they match UTF-8 encoding when they are well formed.

        In Linux, they’re 8-bit almost-arbitrary strings like you noted, and usually UTF-8. So they always have a convenient 8-bit encoding (I.e. leave them alone). If you hated yourself and wanted to convert them to UTF-16, however, you’d have the same problem Windows does but in reverse.

  • [removed] a day ago
    [deleted]
nasretdinov a day ago

Note that Go strings can be invalid UTF-8; they dropped panicking on encountering an invalid UTF-8 string before 1.0, I think.

  • xyzzyz a day ago

    This also epitomizes the issue. What's the point of having a `string` type at all, if it doesn't allow you to make any extra assumptions about the contents beyond `[]byte`? The answer is that they planned to make conversion to `string` error out when it's invalid UTF-8, and then assume that `string`s are valid UTF-8, but then it caused problems elsewhere, so they dropped it for immediate practical convenience.

    • tialaramex a day ago

      Rust apparently got relatively close to not having &str as a primitive type and instead only providing a library alias to &[u8] when Rust 1.0 shipped.

      Score another for Rust's Safety Culture. It would be convenient to just have &str as an alias for &[u8], but if that mistake had been allowed, all the safety checking that Rust now does centrally would have to be owned by every single user forever. Instead of a few dozen checks overseen by experts there'd be myriad sprinkled across every project and always ready to bite you.

    • 0x000xca0xfe a day ago

      Why not use utf8.ValidString in the places it is needed? Why burden one of the most basic data types with highly specific format checks?

      It's far better to get some � when working with messy data instead of applications refusing to work and erroring out left and right.
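
      (With the standard library that's a few lines; a hedged sketch, function name made up:)

        import (
            "strings"
            "unicode/utf8"
        )

        // scrub returns s unchanged when it is already valid UTF-8, otherwise it
        // replaces each invalid byte sequence with U+FFFD (�) instead of erroring.
        func scrub(s string) string {
            if utf8.ValidString(s) {
                return s
            }
            return strings.ToValidUTF8(s, "\uFFFD")
        }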

      • const_cast a day ago

        IMO utf8 isn't a highly specific format, it's universal for text. Every ascii string you'd write in C or C++ or whatever is already utf8.

        So that means that for 99% of scenarios, the difference between char[] and a proper utf8 string is none. They have the same data representation and memory layout.

        The problem comes in when people start using string like they use string in PHP. They just use it to store random bytes or other binary data.

        This makes no sense with the string type. String is text, but now we don't have text. That's a problem.

        We should use byte[] or something for this instead of string. That's an abuse of string. I don't think allowing strings to not be text is too constraining - that's what a string is!

    • roncesvalles a day ago

      I've always thought the point of the string type was for indexing. One index of a string is always one character, but characters are sometimes composed of multiple bytes.

      • crazygringo a day ago

        Yup. But to be clear, in Unicode a string will index code points, not characters. E.g. a single emoji can be made of multiple code points, as well as certain characters in certain languages. The Unicode name for a character like this is a "grapheme", and grapheme splitting is so complicated it generally belongs in a dedicated Unicode library, not a general-purpose string object.
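
        (A quick Go illustration, assuming "fmt" and "unicode/utf8" are imported — one visible "character" that is two code points and eight bytes:)

          s := "👍🏽" // thumbs-up emoji + skin-tone modifier: a single grapheme
          fmt.Println(len(s))                    // 8 (bytes)
          fmt.Println(utf8.RuneCountInString(s)) // 2 (code points)
          // Splitting on grapheme boundaries needs a dedicated segmentation
          // library; the standard library only goes down to runes (code points).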

      • birn559 a day ago

        You can't do that in a performant way, and going that route can lead to problems, because characters (= graphemes in the language of Unicode) don't always behave as developers assume.

    • assbuttbuttass a day ago

      string is just an immutable []byte. It's actually one of my favorite things about Go that strings can contain invalid utf-8, so you don't end up with the Rust mess of String vs OSString vs PathBuf vs Vec<u8>. It's all just string

      • zozbot234 a day ago

        Rust &str and String are specifically intended for UTF-8 valid text. If you're working with arbitrary byte sequences, that's what &[u8] and Vec<u8> are for in Rust. It's not a "mess", it's just different from what Golang does.

    • naikrovek a day ago

      I think maybe you've forgotten about the rune type. rune does make assumptions.

      []rune is for sequences of UTF characters. rune is an alias for int32. string, I think, is an alias for []byte.

      • TheDong a day ago

        `string` is not an alias for []byte.

        Consider:

            for i, chr := range string([]byte{226, 150, 136, 226, 150, 136}) {
              fmt.Printf("%d = %v\n", i, chr)
              // note, s[i] != chr
            }
        
        How many times does that loop over 6 bytes iterate? The answer is it iterates twice, with i=0 and i=3.

        There's also quite a few standard APIs that behave weirdly if a string is not valid utf-8, which wouldn't be the case if it was just a bag of bytes.

silverwind a day ago

> What if the file name is not valid UTF-8, though

They could support passing filename as `string | []byte`. But wait, go does not even have union types.

  • lblume a day ago

    But []byte, or a wrapper like Path, is enough, if strings are easily convertible into it. Rust does it that way via the AsRef<T> trait.

kragen a day ago

If the filename is not valid UTF-8, Golang can still open the file without a problem, as long as your filesystem doesn't attempt to be clever. Linux ext4fs and Go both consider filenames to be binary strings except that they cannot contain NULs.

This is one of the minor errors in the post.

ants_everywhere a day ago

> they stuck to the practical convenience of solving the problem that they had in front of them, quickly, instead of analyzing the problem from the first principles, and solving the problem correctly (or using a solution that was Not Invented Here).

I've said this before, but much of Go's design looks like it's imitating the C++ style at Google. When I see comments where people say they like something about Go, it's often an idiom that showed up first in the C++ macros or tooling.

I used to check this before I left Google, and I'm sure it's becoming less true over time. But to me it looks like the idea of Go was basically "what if we created a Python-like compiled language that was easier to onboard than C++ but which still had our C++ ergonomics?"

  • shrubble a day ago

    Didn’t Go come out of a language that was written for Plan9, thus pre-dating Rob Pike’s work at Google?

    • pjmlp 11 hours ago

      Kind of: Limbo, written for Inferno, taking into consideration what made Alef's design for Plan 9 a failure, like the lack of garbage collection.

    • kragen a day ago

      Yes, Golang is superficially almost identical to Pike's Newsqueak.

    • ants_everywhere a day ago

      not that I recall but I may not be recalling correctly.

      But certainly, anyone will bring their previous experience to the project, so there must be some Plan9 influence in there somewhere

      • kragen a day ago

        They were literally using the Plan9 C compiler and linker.

[removed] a day ago
[deleted]
perryizgr8 17 hours ago

> What if the file name is not valid UTF-8, though?

Then make it valid UTF-8. If you try to solve the long tail of issues in a commonly used function of the library, it's going to cause a lot of pain. This approach is better. If someone has a weird problem like file names with invalid characters, they can solve it themselves, even publish a package. Why complicate 100% of uses for solving 0.01% of issues?

  • nomel 17 hours ago

    > Then make it valid UTF-8.

    I think you misunderstand. How do you do that for a file that exists on disk that's trying to be read? Rename it for them? They may not like that.