shric 2 days ago

A fun read on word count optimization can be found in Abrash's Black Book:

https://www.jagregory.com/abrash-black-book/#lessons-learned...

You can gloss over the asm if you wish; the tricks explained around it are worth it imho.

  • Joker_vD 2 days ago

    I wonder if large lookup tables/table-driven state machines are still as good as they used to be. After all, even with all the on-chip caches, the additional memory accesses today seem to be slower than doing some multi-instruction SIMD voodoo.
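For readers unfamiliar with the term, here is a minimal sketch of what a table-driven word counter can look like. It is not the code from the Black Book or from any `wc` implementation; `count_words_table` and its table names are hypothetical, and it assumes the quoted definition of a word (a maximal run of bytes that are not space, tab, or newline). Whether this beats a SIMD approach on current hardware is exactly the open question above; the sketch makes no performance claim.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts words with a 2-state, table-driven machine.
 * State 0 = between words, state 1 = inside a word.
 * Class 0 = word byte, class 1 = delimiter (space, tab, newline). */
size_t count_words_table(const unsigned char *buf, size_t len)
{
    /* Byte -> class lookup table (256 entries, one memory access per byte). */
    unsigned char is_delim[256] = {0};
    is_delim[' '] = 1;
    is_delim['\t'] = 1;
    is_delim['\n'] = 1;

    /* Transition table and "a new word starts here" table, indexed [state][class]. */
    static const unsigned char next_state[2][2]  = { {1, 0}, {1, 0} };
    static const unsigned char starts_word[2][2] = { {1, 0}, {0, 0} };

    assert(buf != NULL || len == 0);

    size_t words = 0;
    unsigned state = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned cls = is_delim[buf[i]];
        words += starts_word[state][cls];  /* branch-free: add 0 or 1 */
        state = next_state[state][cls];
    }
    return words;
}
```

The loop body contains no data-dependent branches, only table loads, which is the trade-off the comment is asking about: the tables must stay in cache for this to pay off.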

tripdout 2 days ago

Those `goto`s between two different for loops are crazy.

  • actionfromafar 2 days ago

    Assembly / machine code thinking.

    • amszmidt 2 days ago

      More like a relic of (actual) "spaghetti code"; it was relatively common in really old Lisp code.

  • lifthrasiir 2 days ago

    Not that crazy given that it closely mirrors its state machine structure.
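To illustrate the state-machine reading, here is a rough sketch of a word counter in which each `for` loop is one state and each `goto` is a transition. This is not the code under discussion; `count_words_goto` and `is_word_delim` are hypothetical names, and the delimiter set (space, tab, newline) is assumed from the definition quoted elsewhere in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: delimiters are space, tab and newline. */
static int is_word_delim(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Each loop is a state; each goto is a state transition. */
size_t count_words_goto(const unsigned char *p, size_t len)
{
    assert(p != NULL || len == 0);

    size_t i = 0;
    size_t words = 0;

between_words:                      /* state: skipping delimiters */
    for (; i < len; i++) {
        if (!is_word_delim(p[i])) {
            words++;                /* first byte of a new word */
            i++;
            goto in_word;
        }
    }
    return words;

in_word:                            /* state: consuming a word */
    for (; i < len; i++) {
        if (is_word_delim(p[i])) {
            i++;
            goto between_words;
        }
    }
    return words;
}
```

Compared with a single loop and a state variable, this form keeps the current state in the program counter, which is why it reads like hand-written assembly.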

Joker_vD 2 days ago

> A word is a maximal string of characters delimited by spaces, tabs or newlines.

And then the actual code explicitly filters out and ignores every character larger than 0x7F. Just why.

  • jolmg 2 days ago

    Probably because they're not characters. They're just bytes undefined by ASCII.

  • Tor3 2 days ago

    ASCII is 7 bits (the eighth bit would be parity), so that makes perfect sense in an ASCII world.

    • Joker_vD 2 days ago

      So the character e.g. "B" would have this parity bit set and therefore should be filtered out and not count as a letter, in the ASCII world?

      • aap_ 2 days ago

        There are only 7 bits in ASCII. An 8th can be used for parity when transmitting data but a regular program will never see it. Anything above 0x7F is simply not a character.

      • Tor3 a day ago

        Parity bits are not part of the character. They are for detecting transmission errors. You filter off the parity bit before looking at the byte.

      • epcoa 2 days ago

        What in the hell are you going on about? B is 0x42 which is < 0x7F.
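The parity point in this sub-thread can be sketched in a few lines. This is only an illustration of even parity on a 7-bit code, assuming the parity bit travels in bit 7; `add_even_parity` and `strip_parity` are hypothetical names, not functions from any real library or from the program being discussed.

```c
#include <assert.h>

/* Sender side: set bit 7 so that the total number of 1 bits is even. */
unsigned char add_even_parity(unsigned char ascii)
{
    assert(ascii <= 0x7F);          /* input must be a 7-bit ASCII code */

    unsigned ones = 0;
    for (unsigned v = ascii; v != 0; v >>= 1)
        ones += v & 1u;

    return (unsigned char)(ascii | ((ones & 1u) << 7));
}

/* Receiver side: drop the parity bit before looking at the character. */
unsigned char strip_parity(unsigned char byte)
{
    return (unsigned char)(byte & 0x7F);
}
```

With even parity, 'B' (0x42, two 1 bits) is sent unchanged as 0x42, while 'C' (0x43, three 1 bits) is sent as 0xC3; after `strip_parity` both are back below 0x80, so a program that only sees stripped bytes never encounters the eighth bit.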

  • ivan_gammel 2 days ago

    Because they thought that a word is something said in a human language that they can understand.

    • Joker_vD 2 days ago

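      Mi ne pensas ke lingvoj kiuj usas ekskluzive la basan latinan alfabeton estas komprepeneblaj per si mem. [Esperanto: "I don't think that languages which use exclusively the basic Latin alphabet are understandable in themselves."]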

      • luismedel 2 days ago

        Cool how my native language is Spanish and I can almost-understand 80% of Esperanto.

      • actionfromafar 2 days ago

        Ze riform iz komplit.

        • Joker_vD 2 days ago

          The [z] and [ð] are phonemically different in English, just as [i] and [i:] are, so it'd actually be "Ðe riform is komplijt". American rhoticity prevents us from spelling it "rifoom" as would be proper, unfortunately.

dexen 2 days ago

The brevity carried over to Plan 9. Re-posting my older comment (https://news.ycombinator.com/item?id=4023385):

http://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs follows the Unix philosophy. A lot of legacy has been shed. I can count 13 options to ls, 11 options to sed and just 5 to sed.

The standard Plan 9 shell, Rc, is described in a mere ~500 lines of man page, while Bash takes a whopping ~5400 lines.

Oh, and there is no `dll hell' in P9 :-)