Comment by oneplane

As I wrote in my reply, I don't have "sources".

Pure decode, excluding everything else, is probably pretty low, but running a decoder isn't all you need: there's also network, display, storage and enough RAM for the OS to run. There will be plenty of variation too (brightness, environment, how you get your stream in, since a 5G modem is probably going to differ energy-wise from WiFi or Ethernet), and it matters whether the decoder sits in the CPU or in a separate GPU, in which case PCIe gets involved as well. But we can still estimate:

Hardware decoding (1080p video): ~5–15 W for the CPU/GPU

Overall system power usage (screen, memory, etc.): ~25–45 W for a typical laptop.

Duration (30 minutes): If we assume an average of 35 W total system power, the energy consumption is:

Energy = 35 W × 0.5 hours = 17.5 Wh
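
To make the arithmetic explicit, here's a minimal Python sketch; the 35 W figure is the assumed whole-system average from the range above, not a measurement:

```python
# Rough energy estimate for 30 minutes of hardware-decoded 1080p playback.
avg_system_power_w = 35.0   # W, assumed laptop-wide average while streaming
duration_h = 0.5            # 30 minutes

energy_wh = avg_system_power_w * duration_h
print(f"Video playback: {energy_wh:.1f} Wh")   # -> 17.5 Wh
```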

We can do a similar one for inference, also recognising you'll have variations either way:

CPU inference: ~50 W; GPU inference: ~80 W.

Overall system power usage: ~70–120 W for a typical laptop during LLM inference.

Duration (30 minutes): Assuming an average of 100 W total system power:

Energy = 100 W × 0.5 hours = 50 Wh
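
Same sketch for the inference case, plus the ratio against the playback estimate; again, 100 W is an assumed average for a typical laptop, not a measurement:

```python
# Rough energy estimate for 30 minutes of local LLM inference on the same laptop.
inference_power_w = 100.0   # W, assumed system-wide draw during inference
playback_power_w = 35.0     # W, the earlier playback assumption
duration_h = 0.5            # 30 minutes

inference_wh = inference_power_w * duration_h
playback_wh = playback_power_w * duration_h
print(f"LLM inference: {inference_wh:.1f} Wh")                   # -> 50.0 Wh
print(f"Ratio vs playback: {inference_wh / playback_wh:.1f}x")   # -> ~2.9x
```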

We could pretend that our own laptop is very good at some of these tasks, but we're not talking about the best possible outcome; we're talking about the fact that there is a difference between decoding a video stream and doing LLM inference, and that that difference is big enough to make the claim that video streaming is somehow 'worse' than or 'as bad as' LLM usage moot. Because it's not. LLM training and LLM inference eat way more energy.

Edit: looking at some random search engine results, you get a bunch of Reddit posts with screenshots from people asking where the power goes when they run LLM inference locally: https://www.reddit.com/r/LocalLLaMA/comments/17vr3uu/what_ex...

It seems their local usage hovers around 100 W. Other similar posts hover around the same figure, and it looks throttle-based: machines with faster chips throttle to roughly the same power target while delivering better performance. Most local setups also run a quantised model, which is less resource-hungry; the cloud-hosted models tend to be much larger (and thus hungrier).

Edit 2: looking at some real-world optimised decoding measurements, it appears you can decode VP9 and H.265 on one-year-old hardware below 200 mW, so not even 1 W. That would make LLM inference orders of magnitude more power-hungry than video decoding. Either way: LLM power usage > video decode power usage, so the article trying to put them in the same boat is nonsense.
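
Plugging in that optimised-decoder figure against the assumed 100 W inference draw shows why "orders of magnitude" is the right phrase:

```python
import math

# Optimised hardware decode vs. assumed local-inference system draw.
decode_power_w = 0.2       # ~200 mW for VP9/H.265 decode on recent silicon
inference_power_w = 100.0  # assumed system draw during local LLM inference

ratio = inference_power_w / decode_power_w
print(f"Inference draws ~{ratio:.0f}x the decode power "
      f"(~{math.log10(ratio):.1f} orders of magnitude)")  # -> ~500x, ~2.7
```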