Comment by djmips
More often than not, the slow IO devices were coupled with optimized, speed-critical code for the sake of cost savings or hardware simplification. A heap is an approach that rarely works well on a 6502 machine: there are no 16-bit stack pointers, and it's just slower than doing without. However, I tend to agree that a middle-ground 16-bit virtual machine is a great idea. The first one I ever saw was Sweet16 by Woz.
I agree about the heap: too much overhead to be a great approach on such a constrained target, but of course the standard library for C has to include it all the same.
Memory is better allocated in a more customized, application-specific way, such as with an arena allocator, or you can just avoid dynamic allocation altogether if possible.
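For anyone who hasn't met the term, here is a minimal sketch of what an arena (bump) allocator might look like in C. The names `Arena`, `arena_init`, `arena_alloc` and `arena_reset` are made up for illustration, and the sketch assumes byte-granular allocations with no alignment padding, which is fine on a 6502 but would need an alignment step on most other targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical arena: one fixed block of memory handed out front to back. */
typedef struct {
    unsigned char *base;  /* start of the block owned by the arena */
    size_t size;          /* total bytes in the block */
    size_t used;          /* bytes handed out so far */
} Arena;

static void arena_init(Arena *a, void *mem, size_t size)
{
    assert(a != NULL && mem != NULL);
    a->base = (unsigned char *)mem;
    a->size = size;
    a->used = 0;
}

/* Allocation is a bounds check plus a pointer bump: no free list, no headers.
   Returns NULL when the request does not fit in the remaining space. */
static void *arena_alloc(Arena *a, size_t n)
{
    void *p;
    if (n > a->size - a->used)
        return NULL;
    p = a->base + a->used;
    a->used += n;
    return p;
}

/* There is no per-object free; everything is released at once. */
static void arena_reset(Arena *a)
{
    a->used = 0;
}
```

The trade-off is that individual objects can't be freed, so this suits workloads where everything allocated during one phase (a level, a parse, a request) dies together.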
I was co-author of Acorn's ISO-Pascal system for the 6502-based BBC Micro (16KB or 32KB RAM) back in the day, and one part I was proud of was the pretty full-featured (for the time) code editor that was included, written in 4KB of heavily optimized assembler. The memory allocation I used was just to take ownership of all free RAM and keep the text before the cursor at one end of memory and the text after the cursor at the other end. This meant that as you typed and entered new text, it was just appended to the "before cursor" block, with no text movement or memory allocation needed.
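The layout described above can be sketched roughly as follows. This is an illustration in C rather than the original 6502 assembler, and the `EditBuf` type and `eb_*` function names are invented for the example; the only thing taken from the description is the idea of two blocks growing toward each other from opposite ends of one region.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical edit buffer over one block of free RAM.
   Text before the cursor lives at mem[0 .. before).
   Text after the cursor lives at mem[size - after .. size).
   Everything in between is free space. */
typedef struct {
    char  *mem;
    size_t size;
    size_t before;
    size_t after;
} EditBuf;

static void eb_init(EditBuf *e, char *mem, size_t size)
{
    assert(e != NULL && mem != NULL);
    e->mem = mem;
    e->size = size;
    e->before = 0;
    e->after = 0;
}

/* Typing appends to the "before cursor" block; nothing else moves. */
static int eb_insert(EditBuf *e, char c)
{
    if (e->before + e->after >= e->size)
        return 0;                       /* buffer full */
    e->mem[e->before++] = c;
    return 1;
}

/* Backspace just shrinks the "before cursor" block. */
static int eb_backspace(EditBuf *e)
{
    if (e->before == 0)
        return 0;
    e->before--;
    return 1;
}

/* Cursor left: move one character from the low block to the high block. */
static int eb_left(EditBuf *e)
{
    if (e->before == 0)
        return 0;
    e->before--;
    e->after++;
    e->mem[e->size - e->after] = e->mem[e->before];
    return 1;
}

/* Cursor right: move one character from the high block to the low block. */
static int eb_right(EditBuf *e)
{
    if (e->after == 0)
        return 0;
    e->mem[e->before] = e->mem[e->size - e->after];
    e->before++;
    e->after--;
    return 1;
}

/* Copy the whole text, in order, into out; returns its length.
   out must have room for before + after bytes. */
static size_t eb_text(const EditBuf *e, char *out)
{
    memcpy(out, e->mem, e->before);
    memcpy(out + e->before, e->mem + e->size - e->after, e->after);
    return e->before + e->after;
}
```

In this sketch, insertion and deletion at the cursor are constant-time, and the only copying happens when the cursor moves, one character per step.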