Comment by phire
It gets way weirder.
The TMS9900 didn't have any internal data registers. It had only a program counter, a status register, and a workspace pointer. Instead, it kept its "registers" in that same 256 bytes of RAM: sixteen 16-bit registers (32 bytes) starting at the address held in the workspace pointer.
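To make the "registers in RAM" idea concrete, here is a minimal Python sketch (not TI code) of how an emulator might turn a register number into a memory address. It assumes a flat 64 KB big-endian memory and that register Rn lives at WP + 2*n. The class name `Tms9900Regs` is hypothetical, and the 0x8300 workspace address (the start of the 99/4A scratchpad) is used only for illustration.

```python
# Minimal model of memory-resident registers (illustrative sketch).
# Assumptions: flat 64 KB byte-addressed memory, big-endian 16-bit words,
# register Rn stored at address WP + 2*n. Class name is hypothetical.

class Tms9900Regs:
    def __init__(self, wp: int):
        self.mem = bytearray(0x10000)  # the whole 16-bit address space
        self.wp = wp & 0xFFFE          # workspace pointer, kept word-aligned

    def _addr(self, n: int) -> int:
        if not 0 <= n < 16:
            raise ValueError("register number must be 0..15")
        return (self.wp + 2 * n) & 0xFFFE  # wraps within 64 KB, stays aligned

    def read_reg(self, n: int) -> int:
        a = self._addr(n)
        return (self.mem[a] << 8) | self.mem[a + 1]  # high byte first

    def write_reg(self, n: int, value: int) -> None:
        a = self._addr(n)
        self.mem[a] = (value >> 8) & 0xFF
        self.mem[a + 1] = value & 0xFF


cpu = Tms9900Regs(wp=0x8300)
cpu.write_reg(3, 0x1234)  # "R3" is really the word at 0x8306
print(hex(cpu.mem[0x8306]), hex(cpu.mem[0x8307]))  # prints 0x12 0x34
```

Writing R3 is nothing more than a store to WP + 6, which is why every register access on this design costs a memory cycle.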
The original idea was fast context switches: instead of dumping all the registers to a stack (it doesn't even have a stack pointer), you just update the workspace pointer to point at a new set. But I have to assume this wasn't really used on the TI-99/4A, as there just wasn't enough RAM. Because your only other RAM was locked behind the video controller, those 256 bytes had to hold all your registers, any dynamically loaded code, and any data you wanted rapid access to.
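A rough Python sketch of why the switch is cheap: each task gets its own 32-byte workspace, and switching tasks just means loading a different WP, with no registers copied anywhere. The helper names and the two workspace addresses are hypothetical, and the model assumes Rn at WP + 2*n in big-endian memory.

```python
# Sketch: a context switch on a workspace-pointer machine is just a new WP.
# Assumptions (hypothetical, for illustration): 64 KB big-endian memory,
# Rn at WP + 2*n, two tasks with workspaces at arbitrary addresses.

mem = bytearray(0x10000)

def reg_addr(wp: int, n: int) -> int:
    return (wp + 2 * n) & 0xFFFE  # word-aligned address of register Rn

def write_reg(wp: int, n: int, value: int) -> None:
    a = reg_addr(wp, n)
    mem[a:a + 2] = (value & 0xFFFF).to_bytes(2, "big")

def read_reg(wp: int, n: int) -> int:
    a = reg_addr(wp, n)
    return int.from_bytes(mem[a:a + 2], "big")

TASK_A_WP = 0x8300  # hypothetical workspace for task A
TASK_B_WP = 0x8320  # hypothetical workspace for task B (the next 32 bytes)

wp = TASK_A_WP
write_reg(wp, 0, 111)   # task A sets its R0
wp = TASK_B_WP          # switch: nothing is saved or restored
write_reg(wp, 0, 222)   # task B has its own, separate R0
wp = TASK_A_WP          # switch back
print(read_reg(wp, 0))  # prints 111: task A's R0 was never touched
```

The catch on the 99/4A is the arithmetic: at 32 bytes per workspace, eight of them would fill the entire 256-byte scratchpad, leaving nothing for code or data.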
The TMS9900 is weird because it's the only CPU of the early home-computer era that wasn't designed for microcomputers. It's a single-chip implementation of the TI-990 minicomputer, and it was used in later versions of that minicomputer. Those minicomputers had more than enough fast 16-bit memory to take advantage of the fast context switching.
Every other commonly used microprocessor of the 70s (8080, 6800, F8, 6502, RCA 1802, Z80, 6809, 8086, 68000) was explicitly designed to target the low-cost microcomputer market.
I've been working on a TMS99110 homebrew and emulator, and have studied the architecture of the 990 a whole lot over the past couple of years. I want to draw a very important distinction in a few of the things you said.
For anyone who missed the context: it's the 99/4 design that has this weird RAM layout. The 990 architecture itself can use any 16-bit word in memory as the start of its 16 registers. Developers have been known to use and abuse the workspace pointer to slide the register "window" around memory.
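Here is a small Python sketch of what sliding the window means mechanically: moving WP up by one word renumbers every register, so the old R1 becomes R0 and so on. On real hardware WP would be changed with an instruction such as LWPI; the helpers and the 0x2000 address below are hypothetical, and the model assumes Rn at WP + 2*n.

```python
# Sketch: sliding the register window by changing WP (illustrative only).
# Assumptions: 64 KB big-endian memory, Rn at WP + 2*n; 0x2000 is an
# arbitrary hypothetical workspace address.

mem = bytearray(0x10000)

def reg_addr(wp: int, n: int) -> int:
    return (wp + 2 * n) & 0xFFFE  # word-aligned; stays inside the 64 KB space

def write_reg(wp: int, n: int, value: int) -> None:
    a = reg_addr(wp, n)
    mem[a:a + 2] = (value & 0xFFFF).to_bytes(2, "big")

def read_reg(wp: int, n: int) -> int:
    a = reg_addr(wp, n)
    return int.from_bytes(mem[a:a + 2], "big")

wp = 0x2000
for n in range(4):
    write_reg(wp, n, 0x100 + n)  # R0..R3 = 0x100..0x103

wp += 2                          # slide the window up by one 16-bit word
print(hex(read_reg(wp, 0)))      # prints 0x101: the old R1 is now R0
print(hex(read_reg(wp, 2)))      # prints 0x103: the old R3 is now R2
```

Nothing in memory moves; only the base address used to interpret "R0" changes.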
The window also uses its top three registers (R13, R14 and R15) to link back to the previous workspace pointer, program counter, and status, if you use the proper instructions to branch (BLWP) and return (RTWP). While there is no stack*, you can still crawl back through those links and recover the state of each call.
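The linkage can be sketched in Python as a simplified model with no interrupts and no instruction decoding; the return address is passed in explicitly. BLWP reads a two-word vector holding the new WP and PC, then stores the old WP, PC, and ST in the new workspace's R13, R14, and R15, and RTWP reads them back. The `Cpu` class, the vector and workspace addresses, and the rule that an R13 of 0 marks the outermost workspace are all hypothetical conventions for this demo, not TI behavior.

```python
# Sketch of BLWP/RTWP context linkage and a walk back through the call chain.
# Assumptions: big-endian 64 KB memory, Rn at WP + 2*n, simplified CPU state.
# "R13 == 0 marks the outermost workspace" is a hypothetical demo convention.

mem = bytearray(0x10000)

def rd(addr: int) -> int:
    addr &= 0xFFFE  # word access, aligned
    return int.from_bytes(mem[addr:addr + 2], "big")

def wr(addr: int, value: int) -> None:
    addr &= 0xFFFE
    mem[addr:addr + 2] = (value & 0xFFFF).to_bytes(2, "big")

class Cpu:
    def __init__(self, wp: int, pc: int, st: int = 0):
        self.wp, self.pc, self.st = wp, pc, st

    def blwp(self, vector: int, return_pc: int) -> None:
        """Branch via a two-word vector: [new WP, new PC]."""
        new_wp, new_pc = rd(vector), rd(vector + 2)
        wr(new_wp + 2 * 13, self.wp)    # R13 <- old WP
        wr(new_wp + 2 * 14, return_pc)  # R14 <- old PC (return address)
        wr(new_wp + 2 * 15, self.st)    # R15 <- old ST
        self.wp, self.pc = new_wp, new_pc

    def rtwp(self) -> None:
        """Return: restore PC, ST and WP from R14, R15 and R13."""
        wp = self.wp
        self.pc = rd(wp + 2 * 14)
        self.st = rd(wp + 2 * 15)
        self.wp = rd(wp + 2 * 13)

    def backtrace(self):
        """Follow R13 links; yields (workspace, return PC) per call."""
        frames, wp = [], self.wp
        while rd(wp + 2 * 13) != 0:  # hypothetical end-of-chain marker
            frames.append((hex(wp), hex(rd(wp + 2 * 14))))
            wp = rd(wp + 2 * 13)
        return frames


wr(0x1000, 0x2000); wr(0x1002, 0x0400)  # vector A: WP=0x2000, PC=0x0400
wr(0x1004, 0x2100); wr(0x1006, 0x0500)  # vector B: WP=0x2100, PC=0x0500

cpu = Cpu(wp=0x8300, pc=0x0100)         # outermost; its R13 is 0 in fresh memory
cpu.blwp(0x1000, return_pc=0x0104)      # call A from 0x0100
cpu.blwp(0x1004, return_pc=0x0410)      # A calls B
print(cpu.backtrace())                  # prints [('0x2100', '0x410'), ('0x2000', '0x104')]
cpu.rtwp()                              # B returns to A
print(hex(cpu.wp), hex(cpu.pc))         # prints 0x2000 0x410
```

The backtrace needs no stack: each workspace's R13 points at the caller's workspace, and R14 holds where the caller will resume.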
It's a really cool little architecture, hobbled by the 16-bit address space and how slow it was to keep the registers in RAM. Nowadays I can pick up a 1MB memory chip that's faster than the native bus speed for a few bucks, but that wasn't anywhere near the case in the late 70s and early 80s.
*: The 990/12 minicomputer features the PSHS and POPS instructions, which take a pointer to a definition of where the stack lives and how big it is. These instructions aren't implemented in any of the production microprocessors, but in the later few iterations the platform lets you emulate instructions like these transparently in software, as an explicit rather than accidental feature. The 990/12 itself was microcoded on four daisy-chained, programmable 4-bit bit-slice processors, so it didn't need any of that nonsense.