Comment by simonw
"I would estimate that the local llama3 inferencing uses less power than when done in a datacenter, because there simply is less power available locally"
Is this taking into account the fact that datacenter resources are shared?
Llama 3 on my laptop may use less power, but it's serving just me.
Llama 3 in a datacenter on more expensive, more power-hungry hardware is potentially serving hundreds or thousands of users.
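Here's a rough back-of-envelope sketch of that amortization argument. Every number in it is a hypothetical placeholder, not a measurement, and it deliberately ignores things like idle draw, cooling overhead, and networking:

```python
# Back-of-envelope comparison of energy per request, local vs. shared
# datacenter inference. All figures below are hypothetical placeholders.

LAPTOP_WATTS = 50          # assumed laptop draw while running inference
LAPTOP_SECONDS = 60        # assumed time to answer one prompt locally

SERVER_WATTS = 1000        # assumed draw of a datacenter GPU server
SERVER_SECONDS = 5         # assumed time per prompt on faster hardware
CONCURRENT_USERS = 100     # requests batched onto the same hardware

# Energy (joules) = power (watts) * time (seconds); the datacenter
# figure is divided across the users sharing the hardware.
local_joules = LAPTOP_WATTS * LAPTOP_SECONDS
shared_joules = SERVER_WATTS * SERVER_SECONDS / CONCURRENT_USERS

print(f"Local:      {local_joules:.0f} J per request")
print(f"Datacenter: {shared_joules:.0f} J per request")
```

With those made-up numbers the laptop comes out at 3,000 J per request versus 50 J for the shared server, even though the server draws far more power in absolute terms. The real answer depends entirely on actual utilization and batching, which is the point of the question.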