Comment by HarHarVeryFunny

> It is possible to run locally though

> running one of the heavily quantized versions

There is a night-and-day difference in generation quality between even something like 8-bit and the "heavily quantized" versions. Why not quantize to 1-bit anyway? Would that qualify as "running the model"? Food for thought. Don't get me wrong: there's plenty of stuff you can actually run on a 96 GB Mac Studio (let alone the 128/256 GB ones), but 1T-class models are not in that category, unfortunately. Unless you put four of them in a rack or something.
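
For a rough sense of the numbers, here is a back-of-the-envelope sketch of weight storage at different bit widths. It assumes a round 1e12 parameters, a uniform bit width for every weight, and decimal gigabytes. It counts the weights only, so KV cache, activations, and whatever the OS keeps for itself come on top. The function names are made up for illustration.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, bits_per_weight: float, memory_gb: float) -> bool:
    """True if the weights alone fit in memory_gb (ignores all other overhead)."""
    return weight_footprint_gb(n_params, bits_per_weight) <= memory_gb


N_PARAMS = 1e12              # assumption: a round "1T-class" parameter count
MEMORY_GB = [96, 128, 256]   # the Mac Studio sizes mentioned above

for bits in (16, 8, 4, 2, 1):
    size = weight_footprint_gb(N_PARAMS, bits)
    fitting = [m for m in MEMORY_GB if fits(N_PARAMS, bits, m)]
    print(f"{bits:>2}-bit: {size:7.1f} GB, weights fit in: {fitting or 'none'}")
```

Under these assumptions, 8-bit weights come to about 1000 GB and 4-bit to about 500 GB. Even a hypothetical 1-bit version is about 125 GB, which is still over 96 GB before any runtime overhead is counted.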