Comment by PaulHoule
The '8-bit' micros all had a 16-bit address space, at least in their early implementations (later we got the eZ80, 65816, etc.), which lets you address 64k words of memory, and the word was always an 8-bit byte.
Contrast that with the PDP-10 [1], which had a 36-bit word and an 18-bit address space and could access 256k words, for a total of 1152 kilobytes.
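To make the arithmetic concrete, here is a quick sketch of the two capacities. It assumes 18-bit word addresses on the PDP-10 (2^18 = 256k words) and 1 kilobyte = 1024 bytes; the function names are just made up for illustration.

```python
# Address-space arithmetic for the two machines compared above.
# Assumption: "kilobyte" means 1024 eight-bit bytes.

def capacity_bits(address_bits: int, word_bits: int) -> int:
    """Total bits reachable with the given address width and word width."""
    return (1 << address_bits) * word_bits

def as_kilobytes(bits: int) -> float:
    """Convert a bit count to kilobytes of 8-bit bytes."""
    return bits / 8 / 1024

micro = capacity_bits(address_bits=16, word_bits=8)    # 8-bit micro
pdp10 = capacity_bits(address_bits=18, word_bits=36)   # PDP-10

print(as_kilobytes(micro))  # 64.0
print(as_kilobytes(pdp10))  # 1152.0
```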
The use of 8-bit bytes for characters, I think, killed off any word size other than 8-bit, because otherwise it would be awkward to work with characters. [2] To be efficient you have to pack multiple characters into a word. That comes up commonly enough that you could create special machine instructions for it, but if you want to support a C compiler you need a char. It's easiest if native pointers point to a byte. Otherwise you could make up a char pointer that consists of a native pointer plus an offset to the char inside the word, but boy, what a hassle. [3]
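Here is roughly what that hassle looks like, as a minimal sketch rather than how any real compiler did it. It assumes a word-addressed machine with 36-bit words holding six 6-bit characters each, and models a "char pointer" as a hypothetical (word address, index) pair; every name below is invented for the example.

```python
WORD_BITS = 36
CHAR_BITS = 6                              # assumption: six 6-bit chars per word
CHARS_PER_WORD = WORD_BITS // CHAR_BITS
CHAR_MASK = (1 << CHAR_BITS) - 1

def _shift(index: int) -> int:
    """Bit position of character `index`, where index 0 is the leftmost."""
    return WORD_BITS - CHAR_BITS * (index + 1)

def load_char(memory, ptr):
    """Read the character a (word_address, index) pointer refers to."""
    word_addr, index = ptr
    return (memory[word_addr] >> _shift(index)) & CHAR_MASK

def store_char(memory, ptr, value):
    """Overwrite one character without disturbing its neighbours in the word."""
    word_addr, index = ptr
    shift = _shift(index)
    cleared = memory[word_addr] & ~(CHAR_MASK << shift)
    memory[word_addr] = cleared | ((value & CHAR_MASK) << shift)

def advance(ptr):
    """The equivalent of p++: step the index, carrying into the word address."""
    word_addr, index = ptr
    if index + 1 == CHARS_PER_WORD:
        return (word_addr + 1, 0)
    return (word_addr, index + 1)

memory = [0, 0]
p = (0, 0)
for code in [1, 2, 3, 4, 5, 6, 7]:         # seven chars spill into the second word
    store_char(memory, p, code)
    p = advance(p)

print(memory[0] == 0o010203040506)  # True
print(load_char(memory, (1, 0)))    # 7
print(p)                            # (1, 1)
```

Every load, store, and increment turns into shifts, masks, and a carry, which a byte-addressed machine gets for free.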
Modern computers get many of the benefits of a larger word size (a wider pipe to suck data through) by having a cache system that decouples the memory interface from the CPU. A CPU could ask for 32 bits and get them retrieved 8 bits at a time, or it could ask for 32 bits and get the surrounding 128 bits stored in the cache so they don't need to be retrieved next.
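A toy model of that decoupling, as a sketch only: it assumes a 128-bit (16-byte) cache line, a hypothetical 8-bit memory bus, aligned little-endian 32-bit reads, and no eviction. The class and its counters are invented for illustration and don't model any real CPU.

```python
LINE_BYTES = 16   # assumption: 128-bit cache line
BUS_BYTES = 1     # assumption: memory delivers 8 bits per transfer

class TinyCache:
    def __init__(self, memory: bytes):
        self.memory = memory
        self.lines = {}          # line number -> the 16 bytes of that line
        self.bus_transfers = 0   # how many bus-width transfers memory has done

    def read32(self, addr: int) -> int:
        """Read an aligned 32-bit little-endian value through the cache."""
        line_no, offset = divmod(addr, LINE_BYTES)
        if line_no not in self.lines:
            # Miss: pull in the whole surrounding line, one bus transfer at a time.
            start = line_no * LINE_BYTES
            self.lines[line_no] = self.memory[start:start + LINE_BYTES]
            self.bus_transfers += LINE_BYTES // BUS_BYTES
        return int.from_bytes(self.lines[line_no][offset:offset + 4], "little")

cache = TinyCache(bytes(range(64)))

first = cache.read32(0)
print(hex(first), cache.bus_transfers)   # 0x3020100 16

for addr in (4, 8, 12):                  # neighbours are already in the line
    cache.read32(addr)
print(cache.bus_transfers)               # 16

cache.read32(16)                         # next line: another miss
print(cache.bus_transfers)               # 32
```

The CPU side always sees 32-bit reads; whether memory hands the data over 8 bits or 128 bits at a time only changes how long a miss takes.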
[1] https://en.wikipedia.org/wiki/PDP-10
[2] DEC had a system of 6-bit characters, which divides nicely into 36, but you have the same problem.
[3] The PDP-10 did have deep pointers that could point to a specific range of bits inside a word; that's what you need if you want something like that to be reasonable to program. I've been thinking about a fantasy computer to run inside JavaScript and came up with a 48-bit word size so that a double can hold one word exactly. That thing would have 24-bit address spaces, plus it would be possible to make 'deep pointers' that have 6 bits of offset and 6 bits of length (with the possibility of 0-length to point to a specific bit). These could be extended to 'wide pointers' by putting a few bits in front that would reference particular address spaces (which might be video RAM, or a unit of memory protection, or be made contiguous to represent a larger address space). I think I'd want enough bits to make a 1G-word address space so it could outdo a 32-bit machine, and then let the rest be used for flags, just to make it as baroque as possible... And that's why you only see 8-bit words today!
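For the curious, the deep pointer idea in [3] might be packed something like this. It's only a sketch: the field order (12 upper bits, then 24 address bits, 6 offset bits, 6 length bits), the choice to count the offset from the low end of the word, and all the function names are my assumptions, not a finished design.

```python
WORD_BITS = 48
ADDR_BITS, OFFSET_BITS, LENGTH_BITS = 24, 6, 6
UPPER_BITS = WORD_BITS - ADDR_BITS - OFFSET_BITS - LENGTH_BITS   # 12 bits left over

def pack_deep_pointer(address, offset, length, upper=0):
    """Pack the fields into one 48-bit integer (hypothetical layout)."""
    assert 0 <= address < (1 << ADDR_BITS)
    assert 0 <= offset < WORD_BITS
    assert 0 <= length <= WORD_BITS - offset
    assert 0 <= upper < (1 << UPPER_BITS)
    return (upper << 36) | (address << 12) | (offset << 6) | length

def unpack_deep_pointer(word):
    """Split a 48-bit pointer word back into its fields."""
    return {
        "upper": word >> 36,                 # address-space selector and/or flags
        "address": (word >> 12) & 0xFFFFFF,
        "offset": (word >> 6) & 0x3F,        # bits to the right of the field
        "length": word & 0x3F,
    }

def load_field(memory, pointer):
    """Fetch the bit range the pointer designates from a word-addressed memory."""
    p = unpack_deep_pointer(pointer)
    return (memory[p["address"]] >> p["offset"]) & ((1 << p["length"]) - 1)

def survives_double(word):
    """True if the integer round-trips through a 64-bit float unchanged."""
    return int(float(word)) == word

ptr = pack_deep_pointer(address=0x123456, offset=12, length=6, upper=0xABC)
memory = {0x123456: 0o7654321}

print(unpack_deep_pointer(ptr)["address"] == 0x123456)  # True
print(oct(load_field(memory, ptr)))                     # 0o65
print(survives_double((1 << 48) - 1))                   # True
print(survives_double((1 << 53) + 1))                   # False
```

A double has a 53-bit significand, so any 48-bit word fits with room to spare, and taking 6 of the 12 upper bits as an address-space selector gives 24 + 6 = 30 address bits, i.e. 1G words.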