Exploring pre-1990 versions of wc(1) (2023)
(sigwait.org) 58 points by henry_flower 2 days ago
At least the GNU version of wc [0] uses AVX2 for line counting, if available, though it falls back to a simple character-by-character loop if you ask for a character count [not to be confused with a byte count!] or a word count.
[0] https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/wc_...
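To illustrate why line counting is the easy case: it is a pure byte search for '\n', so it can be handed to a bulk scanning routine. A minimal sketch, not the coreutils code; count_lines is a made-up helper, and it uses plain memchr as a stand-in for a hand-written AVX2 loop:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from coreutils.
   Counts '\n' bytes in buf[0..len). */
static size_t count_lines(const char *buf, size_t len)
{
    assert(buf != NULL);

    size_t lines = 0;
    const char *p = buf;
    const char *end = buf + len;

    /* memchr jumps to the next newline in the remaining bytes and
       returns NULL when there is none (including when 0 bytes remain). */
    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lines++;
        p++; /* resume just past the newline we found */
    }
    return lines;
}
```

Character and word counts can't be done this way as directly, since each byte has to be classified (and, for characters, decoded) rather than just matched against one value.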
Not that crazy given that it closely mirrors its state machine structure.
> A word is a maximal string of characters delimited by spaces, tabs or newlines.
And then the actual code explicitly filters out and ignores every character larger than 0x7F. Just why.
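For comparison, here is roughly what the quoted definition asks for when taken literally. This is a sketch under my own assumptions, not the historical code: is_delim and count_words are made-up names, and bytes above 0x7F are treated as ordinary word content rather than filtered out.

```c
#include <assert.h>
#include <stddef.h>

/* Delimiters from the quoted definition: space, tab, newline. */
static int is_delim(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: counts maximal runs of non-delimiter bytes
   in buf[0..len). Bytes above 0x7F count as word content here,
   which is this sketch's assumption. */
static size_t count_words(const char *buf, size_t len)
{
    assert(buf != NULL);

    size_t words = 0;
    int in_word = 0; /* two-state machine: inside a word or not */

    for (size_t i = 0; i < len; i++) {
        if (is_delim((unsigned char)buf[i])) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1; /* first byte of a new word */
            words++;
        }
    }
    return words;
}
```

With this version a string of high-bit bytes between two spaces counts as one word, which is what the man page sentence would lead you to expect.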
Because they thought that a word is something said in a human language that they can understand.
The brevity carried over to Plan 9. Re-posting my older comment (https://news.ycombinator.com/item?id=4023385):
Plan 9 (http://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs) follows the Unix philosophy. A lot of legacy has been shed. I can count 13 options to ls, 11 options to sed and just 5 to sed.
The standard Plan 9 shell, Rc, is described in a mere ~500 lines of manpage, while Bash takes a whopping ~5400 lines.
Oh, and there is no `dll hell' in P9 :-)
A fun read on word count optimization can be found in Abrash's Black Book:
https://www.jagregory.com/abrash-black-book/#lessons-learned...
You can skim past the asm if you wish; the tricks explained around it are worth it imho.