Comment by Joker_vD 2 days ago

16 replies

> A word is a maximal string of characters delimited by spaces, tabs or newlines.

And then the actual code explicitly filters out and ignores every character larger than 0x7F. Just why.

jolmg 2 days ago

Probably because they're not characters. They're just bytes undefined by ASCII.

Tor3 2 days ago

ASCII is 7 bits (the eighth bit would be parity), so that makes perfect sense, in an ASCII world.

  • Joker_vD 2 days ago

    So the character e.g. "B" would have this parity bit set and therefore should be filtered out and not count as a letter, in the ASCII world?

    • aap_ 2 days ago

      There are only 7 bits in ASCII. An 8th can be used for parity when transmitting data but a regular program will never see it. Anything above 0x7F is simply not a character.

    • Tor3 a day ago

      Parity bits are not part of the character. They are for detecting transmission errors. You filter off the parity bit before looking at the byte.

      • Joker_vD a day ago

        But this is not what the code is doing, is it? It's not doing (ch & 0x7F), it's doing ch <= 0x7F. And the parity checking/filtering is done in the tape drive/serial port driver anyhow, so it would never reach wc in the first place.
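The difference between masking and filtering can be sketched in a few lines. This is only an illustration of the two expressions quoted above, not the actual wc source; the function names `strip_high_bit` and `drop_non_ascii` and the sample bytes are made up for the example.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Mask (ch & 0x7F): clear bit 7 of every byte, keeping every byte."""
    return bytes(b & 0x7F for b in data)


def drop_non_ascii(data: bytes) -> bytes:
    """Filter (ch <= 0x7F): discard every byte whose value is above 0x7F."""
    return bytes(b for b in data if b <= 0x7F)


# Hypothetical input: 'A', a byte with the high bit set (0xC6), a space, 'B'.
sample = bytes([0x41, 0xC6, 0x20, 0x42])

masked = strip_high_bit(sample)    # 0xC6 becomes 0x46, so the byte survives
filtered = drop_non_ascii(sample)  # 0xC6 is dropped entirely

print(masked, filtered)
```

With masking, the high-bit byte turns into a 7-bit value and still counts as a character; with filtering, it vanishes from the input.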

        • Tor3 4 hours ago

          Yes, that's true for that code. But that wasn't really my point. The point of my earlier post was that ASCII is 7 bits, 0..127, and, depending on where the characters came from, only values below 128 are valid ASCII. Because a parity bit was common, ASCII was limited to 7 bits to make room for it. When other transports are involved, e.g. reading from a file, there aren't any parity bits (well, that's not entirely true: a minicomputer I worked with back in the day used parity bits on characters in text files, but that's not the case for the platform where this particular old 'wc' was used), so the code simply focuses on valid ASCII, which is below 128.

    • epcoa 2 days ago

      What in the hell are you going on about? B is 0x46 which is < 0x7F.

      • Joker_vD 2 days ago

        I am going on about the parity bit. 0x46 has an odd number of bits set (three, to be precise), so for the parity to check out (that is, the number of bits set has to be even), a parity bit needs to be set and the resulting encoding has to be 0xC6, with four bits set.
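The parity arithmetic above can be checked with a small sketch. It assumes the parity bit sits in bit 7 (0x80); the helper name `with_parity` is hypothetical. Note that in standard ASCII 0x46 is "F" and "B" is 0x42, so the example runs both values.

```python
def with_parity(code: int, even: bool = True) -> int:
    """Return a 7-bit code with a parity bit added in bit 7 (0x80).

    even=True makes the total number of set bits even,
    even=False makes it odd.
    """
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7-bit value")
    ones = bin(code).count("1")
    # Set the parity bit only when the 7 data bits alone have the wrong parity.
    needs_bit = (ones % 2 == 1) if even else (ones % 2 == 0)
    return code | 0x80 if needs_bit else code


print(hex(with_parity(0x46)))              # 0x46 has three bits set
print(hex(with_parity(ord("B"))))          # "B" is 0x42, two bits set
print(hex(with_parity(ord("B"), False)))   # same character under odd parity
```

Under even parity 0x46 becomes 0xC6, as described, while 0x42 already has an even bit count and stays 0x42; under odd parity 0x42 would become 0xC2.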

ivan_gammel 2 days ago

Because they thought that a word is something said in a human language that they can understand.

  • Joker_vD 2 days ago

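    Mi ne pensas ke lingvoj kiuj usas ekskluzive la basan latinan alfabeton estas komprepeneblaj per si mem. (Esperanto: "I don't think that languages which use exclusively the basic Latin alphabet are comprehensible by themselves.")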

    • luismedel 2 days ago

      Cool how my native language is Spanish and I can almost-understand 80% of Esperanto.

    • actionfromafar 2 days ago

      Ze riform iz komplit.

      • Joker_vD 2 days ago

        The [z] and [ð] are phonemically different in English, just as [i] and [i:] are, so it'd actually be "Ðe riform is komplijt". American rhoticity prevents us from spelling it "rifoom" as would be proper, unfortunately.