Comment by crackalamoo 10 hours ago
This is true in principle, yes. In practice, the way this usually works is by converting inputs to bits and bytes, and then computing the result as a digital circuit (AND, OR, XOR).
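To make that concrete, here is a minimal plaintext sketch of what "computing the result as a digital circuit" looks like: an 8-bit adder built from nothing but AND/OR/XOR. In a gate-level FHE scheme like TFHE, each of these gate calls would instead operate on encrypted bits, so even one 8-bit add costs about 40 homomorphic gate evaluations:

    AND = lambda a, b: a & b
    OR  = lambda a, b: a | b
    XOR = lambda a, b: a ^ b

    def full_adder(a, b, cin):
        # one bit of addition: five gate calls
        s = XOR(a, b)
        return XOR(s, cin), OR(AND(a, b), AND(s, cin))

    def add8(x, y):
        # add two 8-bit integers: one full adder per bit, ~40 gates total
        out, carry = 0, 0
        for i in range(8):
            s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
            out |= s << i
        return out

    assert add8(100, 55) == 155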
Doing this under encryption is very slow: without hardware acceleration or special tricks, running the circuit is about a million times slower than the unencrypted computation, on the order of 1ms per gate. (https://www.jeremykun.com/2024/05/04/fhe-overview/)
When you count the individual logic gates involved in even a single matrix multiplication, then scale that up to a diffusion model or a large transformer, the approach becomes infeasible very quickly.
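A rough back-of-the-envelope shows why (the 1,000-gates-per-MAC figure below is an assumption for illustration, not a measured number):

    n = 512                        # multiply two n x n matrices
    macs = n ** 3                  # multiply-accumulate operations
    gates_per_mac = 1_000          # assumed cost of an 8-bit multiply + add circuit
    seconds = macs * gates_per_mac * 1e-3   # ~1ms per encrypted gate
    print(seconds / 86_400 / 365)  # ~4.3 years for a single 512x512 matmul

And a transformer does thousands of matrix multiplications per token.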
There are FHE schemes that operate on encrypted arithmetic rather than binary gates (cf. CKKS), but they come with their own tradeoff: they can only evaluate additions and multiplications, so every nonlinear activation function has to be replaced by a polynomial approximation. Still, they are much better than the binary-FHE schemes for workloads like neural networks, and most hardware accelerators in the pipeline right now are targeting CKKS and similar schemes for this reason.
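For a feel of what that polynomial-approximation constraint means, here is a toy fit of ReLU on a fixed interval (plain least-squares just for illustration; production CKKS work uses carefully constructed minimax or composite polynomials):

    import numpy as np

    x = np.linspace(-4, 4, 2001)
    relu = np.maximum(x, 0)
    for degree in (3, 7, 15):
        p = np.polynomial.Polynomial.fit(x, relu, degree)  # valid on [-4, 4] only
        err = np.abs(p(x) - relu).max()
        print(f"degree {degree:2d}: max error {err:.3f}")
    # higher degree = lower error, but more ciphertext multiplications,
    # each of which burns multiplicative depth / noise budget

Note the approximation only holds on the fitted interval, which is why these pipelines also need the inputs to each activation to stay in a known range.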
To put some numbers on it: a ResNet-20 inference can be done in CKKS in roughly 5 minutes on CPU. With custom changes to the architecture you can get under a minute, and in my view HW acceleration will improve that by another factor of 10-100 at least; dividing 60s by that factor lands in the 0.6-6s range, so I'd expect ~1s inference for these (still small) networks within the next year or two.
LLMs, however, are still going to be unreasonably slow for a long time.