Comment by nickpsecurity a day ago
You actually have three AI accelerators: the CPU's SIMD units, the NPU, and the iGPU. Using them simultaneously could be interesting. It might require custom work, though.
No doubt. I had a few ideas for what might be done:
1. Put the tokenizer or other less performance-critical parts on the NPU.
2. Pipelining that moves data through different models or layers on different hardware.
3. If the model has many layers, put most of them on the fastest processor and a small number on the others. As with hardware clocking, pick the ratio so the slower processors don't drag down overall performance (see the sketch after this list).
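A minimal sketch of the ratio calculation in idea 3. The device names and layers-per-second throughputs are made-up placeholders, not measurements of any real chip:

    # Hypothetical per-device throughput: layers/sec each can sustain.
    DEVICES = {"igpu": 40.0, "cpu_simd": 15.0, "npu": 5.0}

    def split_layers(n_layers, devices):
        # Assign layer counts proportional to throughput, so every
        # device finishes its slice in roughly the same time.
        total = sum(devices.values())
        counts = {d: int(n_layers * t / total) for d, t in devices.items()}
        # Hand any rounding remainder to the fastest device.
        fastest = max(devices, key=devices.get)
        counts[fastest] += n_layers - sum(counts.values())
        return counts

    print(split_layers(32, DEVICES))  # {'igpu': 22, 'cpu_simd': 8, 'npu': 2}

The pipeline would then run each slice on its device in order, handing activations across at the boundaries.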
In things like game or real-time AIs, especially multimodal ones, there's even more potential, since different components could run on different chips.
If there are any LLM frameworks that can shard over disparate processor architectures, I haven't heard of them.
It'd be pretty cool for sure, but you'd be absolutely strangled by memory bandwidth, I'd expect. I'm sure the chipset would not at all enjoy trying to route all that RAM to three processors at once.
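A back-of-envelope estimate of that ceiling, with assumed numbers for a typical laptop (not measurements): token-by-token decode is memory-bound, so processors sharing one memory bus can't exceed what that bus delivers, no matter how the layers are split.

    MODEL_BYTES = 4e9    # assumed: ~7B params quantized to ~4 bits
    SHARED_BW   = 100e9  # assumed: ~100 GB/s LPDDR5, shared by all three

    # Every weight is read once per decoded token, so shared bandwidth
    # caps tokens/sec regardless of how the work is distributed.
    print(f"ceiling: ~{SHARED_BW / MODEL_BYTES:.0f} tokens/sec")

About 25 tokens/sec under those assumptions, whether one processor does the work or all three do.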