Exploring pre-1990 versions of wc(1) (2023)
(sigwait.org) 58 points by henry_flower 10 months ago
At least the GNU version of wc [0] uses AVX2 for line counting, if available, though it falls back to a simple character-by-character loop if you ask for a character count [not to be confused with a byte count!] or a word count.
[0] https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/wc_...
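GNU's AVX2 path is hand-written x86 code. As a portable illustration of the same idea (skip ahead to each newline instead of inspecting every byte in a loop body), here is a minimal C sketch that hands the scanning to memchr. The function name count_lines is hypothetical, and how much this gains depends on the C library: glibc usually ships vectorized memchr implementations on x86-64, but that is not guaranteed everywhere.

```c
#include <stddef.h>
#include <string.h>

/* Count '\n' bytes in buf[0..len) by letting memchr jump to each newline.
   Hypothetical helper, not GNU wc's code: any speedup comes from the C
   library's memchr, which is often SIMD-optimized. */
static size_t count_lines(const char *buf, size_t len)
{
    size_t lines = 0;
    const char *p = buf;
    const char *end = buf + len;

    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        if (nl == NULL)
            break;          /* no more newlines in this buffer */
        lines++;
        p = nl + 1;         /* resume just past the newline found */
    }
    return lines;
}
```

To count a whole file, a caller can read it in fixed-size chunks with fread and sum the per-chunk results; because a newline is a single byte, it can never straddle a chunk boundary.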
Not that crazy, given that it closely mirrors its state machine structure.
> A word is a maximal string of characters delimited by spaces, tabs or newlines.
And then the actual code explicitly filters out and ignores every character larger than 0x7F. Just why.
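To make the man page rule and the 0x7F filter concrete, here is a hypothetical C sketch of a two-state word counter in that spirit. It is a reconstruction for illustration, not the historical wc source. Assumptions: a word is a maximal run of bytes other than space, tab or newline, and bytes above 0x7F are skipped entirely, so they neither start nor end a word.

```c
#include <stddef.h>

/* Hypothetical 7-bit word/line counter; not the original wc code. */
struct counts {
    size_t lines, words;
};

static struct counts count_ascii_words(const unsigned char *buf, size_t len)
{
    struct counts c = {0, 0};
    int in_word = 0;                 /* state: 1 while inside a word */

    for (size_t i = 0; i < len; i++) {
        unsigned char ch = buf[i];
        if (ch > 0x7F)
            continue;                /* non-ASCII byte: state unchanged */
        if (ch == '\n')
            c.lines++;
        if (ch == ' ' || ch == '\t' || ch == '\n') {
            in_word = 0;             /* a delimiter ends any current word */
        } else if (!in_word) {
            in_word = 1;             /* first byte of a new word */
            c.words++;
        }
    }
    return c;
}
```

Under this rule a UTF-8 word like "naïve" still counts as one word, because the two bytes of "ï" are skipped without breaking the run, but a word made entirely of non-ASCII letters counts as zero.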
ASCII is 7 bits (the eighth bit would be parity), so that makes perfect sense in an ASCII world.
Because they thought that a word is something said in a human language that they can understand.
The brevity carried over to Plan 9. Re-posting my older comment (https://news.ycombinator.com/item?id=4023385):
Plan 9 (http://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs) follows the Unix philosophy. A lot of legacy has been shed. I can count 13 options to ls, 11 options to grep and just 5 to sed.
The standard Plan 9 shell, Rc, is described in a mere ~500 lines of man page, while Bash takes a whopping ~5400 lines.
Oh, and there is no `dll hell' in P9 :-)
A fun read on word count optimization can be found in Abrash's Black Book:
https://www.jagregory.com/abrash-black-book/#lessons-learned...
You can gloss over the asm if you wish, the tricks that are explained around it are worth it imho.
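As a taste of the kind of trick that comes up in word-count optimization, here is a hypothetical C sketch of one generic technique: classify each byte with a 256-entry lookup table instead of a chain of comparisons, and count a word at every separator-to-word transition without a branch. This illustrates the general idea only; it is not code from the Black Book. The separator set (space, tab, newline) is an assumption taken from the man page definition quoted earlier in the thread.

```c
#include <stddef.h>

/* 1 marks a separator byte; every other byte counts as part of a word.
   Assumed separator set: space, tab, newline. */
static const unsigned char is_sep[256] = {
    [' '] = 1, ['\t'] = 1, ['\n'] = 1,
};

static size_t count_words_table(const unsigned char *buf, size_t len)
{
    size_t words = 0;
    unsigned prev_in_word = 0;   /* 1 if the previous byte was a word byte */

    for (size_t i = 0; i < len; i++) {
        unsigned in_word = is_sep[buf[i]] ^ 1u;   /* table lookup, no compares */
        words += in_word & (prev_in_word ^ 1u);   /* +1 only on a 0 -> 1 edge */
        prev_in_word = in_word;
    }
    return words;
}
```

If input is processed in chunks, prev_in_word has to be carried across chunk boundaries; otherwise a word split between two reads would be counted twice.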