Comment by craftkiller

> The NPU is slower than the integrated graphics for inference.

Yeah, that's expected. On consumer devices, NPUs aren't optimized for speed and aren't meant to outperform the GPU. They're optimized for low power consumption. The point of the NPU is to run simple AI tasks without turning your laptop into a frying pan.

Quoting Wikipedia:

> On consumer devices, the NPU is intended to be small, power-efficient, but reasonably fast when used to run small models.