Comment by naveen_k
Good point. I'm often running multiple parallel jobs with varying priorities, where uniform throttling actually makes sense. Many LLM inference tasks are long-running but don't fully utilize the hardware (they're often waiting on I/O or running at partial capacity).
The dual Epyc CPUs (128 cores) in my setup have a relatively high idle power draw compared to consumer chips. Even when "idle", they consume significant power just keeping all those cores and I/O capabilities available. By applying uniform throttling when utilization is low, the automation reduces baseline power consumption by a decent amount without much of a performance hit.
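For anyone wanting to try something similar, here is a minimal sketch of what a "uniform throttling when utilization is low" policy could look like. This is not the automation described above; the threshold and the frequency limits (`low_util_threshold`, `min_cap_mhz`, `max_cap_mhz`) are hypothetical values chosen for illustration. It only computes a single frequency cap for all cores; actually applying it (for example through Linux cpufreq's `scaling_max_freq` files, which take kHz) is left out.

```python
def choose_freq_cap_mhz(
    utilization_pct: float,
    low_util_threshold: float = 30.0,  # hypothetical: below this, start throttling
    min_cap_mhz: int = 1500,           # hypothetical floor for the cap
    max_cap_mhz: int = 3500,           # hypothetical unthrottled ceiling
) -> int:
    """Return one frequency cap (MHz) to apply uniformly to every core."""
    if not 0.0 <= utilization_pct <= 100.0:
        raise ValueError("utilization_pct must be in [0, 100]")
    # Busy enough: leave the machine unthrottled.
    if utilization_pct >= low_util_threshold:
        return max_cap_mhz
    # Low utilization: scale the cap linearly between the floor and the ceiling.
    frac = utilization_pct / low_util_threshold
    return int(min_cap_mhz + frac * (max_cap_mhz - min_cap_mhz))


print(choose_freq_cap_mhz(15.0))  # 2500
```

Using one cap for all cores, rather than per-core decisions, matches the "uniform" idea: it is simple and predictable when many jobs share the machine, at the cost of also slowing the occasional busy core.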
It seems like it would be relatively easy to take a few representative tasks and measure the soup-to-nuts energy consumed at the plug. That would be very interesting to see alongside the power optimizations!
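As a rough illustration of the at-the-plug measurement, assuming you have a meter that logs timestamped power readings in watts (the sampling setup and the function name `energy_wh` are hypothetical), the energy per task is just the integral of power over the run. A trapezoidal sum is a reasonable sketch:

```python
def energy_wh(timestamps_s: list[float], watts: list[float]) -> float:
    """Integrate sampled wall power (W) over time (s) and return watt-hours."""
    if len(timestamps_s) != len(watts) or len(watts) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    joules = 0.0
    for i in range(1, len(watts)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Trapezoid: average power over the interval times its duration.
        joules += 0.5 * (watts[i] + watts[i - 1]) * dt
    return joules / 3600.0  # 1 Wh = 3600 J


# A constant 100 W for one hour is 100 Wh.
print(energy_wh([0, 3600], [100, 100]))  # 100.0
```

Running the same task once with and once without the throttling, and comparing the two totals, would show whether the lower baseline draw outweighs any extra runtime.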