Comment by morphle
Squeak Smalltalk has several automatic runtime optimizations and compilers like JIT, parallel load balancing compiler [1], adaptive compiler [2] and a metacircular simulator and byte code virtual machine written in itself that allows you to do runtime optimisations on GPUs. The byte codes are of course replaced with the native GPU instructions at runtime.
There are dozens of scientific papers and active research is still being done [1].
I've worked on automatic parallel runtime optimizations and adaptive compilers since 1981. We make reconfigurable hardware (chips and wafers) that also optimises at runtime.
Truffle/GraalVM is very rigid and overly complicated [6].
With a meta compiler like Ometa or Ohm we can give any programming language the runtime adaptive compilation for GPUs [3][4].
I'm currently adapting my adaptive compiler to Apple Silicon M4 GPU and neural engine to unlock the trillions of operations per second these chips can do.
I can adapt them to more NVIDIA GPUs with the information of the website in the title. Thank you very much charles_irl! I would love to be able to save the whole website in a single PDF.
I can optimise your GPU software a lot with my adaptive compilers. It will cost less than 100K in labour to speed up your GPU code by a factor 4-8 at least, sometimes I see 30-50 times speedup.
[1] https://www.youtube.com/watch?v=wDhnjEQyuDk
[2] https://www.youtube.com/watch?v=CfYnzVxdwZE
[3] https://tinlizzie.org/~ohshima/shadama2/
[4] https://github.com/yoshikiohshima/Shadama