Comment by djmips
I feel like no amount of optimizations will close the gap - it's an intractable problem.
I feel like no amount of optimizations will close the gap - it's an intractable problem.
It is my conjecture that due to the 8 bit index registers, contrast that to 6800, 6809 and others, the 6502 becomes fundamentally a structure of arrays (SOA) system versus C which is coupled in it's libraries and existing code base with array of structures (A0S).
Optimizing code will never solve good data oriented design. This is just one of the reasons that Asm programmers routinely beat C code on the 6502. Another one is math. In the C language specification, if fixed point had been given equal footing with float that would also help.
These are such a blind spos that you rarely even see custom 6502 high level languages accommodate these fundamental truths.
BTW, growing up on the 6502, I had no problems moving into the very C friendly 68000 but later going backwards to the 6809, on the surface it looked so much like a 6502 that I was initially trying to force SOA in my data design before realizing it was better suited to AOS.
If we're comparing performance of code generated by a C compiler vs hand optimized assembler, then for it to be an apples-to-apples comparison the same data structures (e.g. SOA or AOS) need to be used in both cases.
I wouldn't say intractable, but it's not clear whether LLVM's optimization framework is flexible enough for it.
From mysterymath's (LLVM-MOS) description, presenting some of zero page as 16 bit registers (to make up for lack thereof, and perhaps due to LLVM not having any other support for preferred/faster memory regions), while beneficial, still had limitations since LLVM just assumes that there will be FAST register-register transfer operations available, and that is not even true for the 6502's real registers (no TXY), let alone these ZP "registers" which would require using the accumulator to copy.
A code generation/optimization approach that would seem more guaranteed to do well on the 6502 might be to treat it more as tree search (with pruning) - generate multiple branching alternatives and then select the best based on whatever is being optimized for (clock cycles or code size).
Coding for the 6502 by hand was always a bit like that ... you had some ideas of alternate ways things could be coded, but there was also an iterative phase of optimization (cf search) to tweak it and save a byte here, a couple of cycles there ...
I've mentioned elsewhere I used to work for Acorn back in the day, developing for the BBC micro, with me and a buddy developing the ISO-Pascal system which was delivered in 2 16KB ROMs. Putting code in ROM gives you an absolute hard size budget, and putting a full Pascal compiler, interpreter, runtime library and programmers editor into a total 32KB was no joke! I remember at the end of the project we were still a few hundred bytes over what would fit in the ROMs, and had to fight for every byte to make it fit!