Comment by nickpsecurity a day ago
You actually have three AI accelerators: the CPU's SIMD units, the NPU, and the iGPU. Using them simultaneously could be interesting. It might require custom work, though.
No doubt. I had a few ideas for what might be done:
1. Put the tokenizer or other less performance-critical parts on the NPU.
2. Pipelining that moves data through different models or layers on different hardware.
3. If the model has many layers, put most of them on the fastest processor and a small number on the others. As with hardware clocking, pick the ratio so the slower processors don't drag down overall performance (see the sketch after this list).
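A minimal sketch of the ratio calculation in idea 3. The device names and layers-per-second throughputs are made-up placeholders, not measurements of any real chip:

    # Hypothetical per-device throughput: layers/sec each can sustain.
    DEVICES = {"igpu": 40.0, "cpu_simd": 15.0, "npu": 5.0}

    def split_layers(n_layers, devices):
        # Assign layer counts proportional to throughput, so every
        # device finishes its slice in roughly the same time.
        total = sum(devices.values())
        counts = {d: int(n_layers * t / total) for d, t in devices.items()}
        # Hand any rounding remainder to the fastest device.
        fastest = max(devices, key=devices.get)
        counts[fastest] += n_layers - sum(counts.values())
        return counts

    print(split_layers(32, DEVICES))  # {'igpu': 22, 'cpu_simd': 8, 'npu': 2}

The pipeline would then run each slice on its device in order, handing activations across at the boundaries.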
In things like game or real-time AIs, especially multimodal ones, there's even more potential, since different components could run on different chips.
If there are any LLM frameworks that can shard over disparate processor architectures, I haven't heard of them.
It'd be pretty cool for sure, but you'd be absolutely strangled by memory bandwidth, I'd expect. I'm sure the chipset would not at all enjoy trying to route all that RAM to three processors at once.
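A back-of-envelope estimate of that ceiling, with assumed numbers for a typical laptop (not measurements): token-by-token decode is memory-bound, so processors sharing one memory bus can't exceed what that bus delivers, no matter how the layers are split.

    MODEL_BYTES = 4e9    # assumed: ~7B params quantized to ~4 bits
    SHARED_BW   = 100e9  # assumed: ~100 GB/s LPDDR5, shared by all three

    # Every weight is read once per decoded token, so shared bandwidth
    # caps tokens/sec regardless of how the work is distributed.
    print(f"ceiling: ~{SHARED_BW / MODEL_BYTES:.0f} tokens/sec")

About 25 tokens/sec under those assumptions, whether one processor does the work or all three do.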