Comment by simonw
"I would estimate that the local llama3 inferencing uses less power than when done in a datacenter, because there simply is less power available locally"
Is this taking into account the fact that datacenter resources are shared?
Llama 3 on my laptop may use less power, but it's serving just me.
Llama 3 in a datacenter on more expensive, more power-hungry hardware is potentially serving hundreds or thousands of users.
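Here's a rough back-of-envelope sketch of that amortization argument. Every number in it is a hypothetical placeholder, not a measurement, and it deliberately ignores things like idle draw, cooling overhead, and networking:

```python
# Back-of-envelope comparison of energy per request, local vs. shared
# datacenter inference. All figures below are hypothetical placeholders.

LAPTOP_WATTS = 50          # assumed laptop draw while running inference
LAPTOP_SECONDS = 60        # assumed time to answer one prompt locally

SERVER_WATTS = 1000        # assumed draw of a datacenter GPU server
SERVER_SECONDS = 5         # assumed time per prompt on faster hardware
CONCURRENT_USERS = 100     # requests batched onto the same hardware

# Energy (joules) = power (watts) * time (seconds); the datacenter
# figure is divided across the users sharing the hardware.
local_joules = LAPTOP_WATTS * LAPTOP_SECONDS
shared_joules = SERVER_WATTS * SERVER_SECONDS / CONCURRENT_USERS

print(f"Local:      {local_joules:.0f} J per request")
print(f"Datacenter: {shared_joules:.0f} J per request")
```

With those made-up numbers the laptop comes out at 3,000 J per request versus 50 J for the shared server, even though the server draws far more power in absolute terms. The real answer depends entirely on actual utilization and batching, which is the point of the question.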