Comment by adam_patarino a day ago
We developed a novel optimization pipeline for LLMs that lets large models run on a standard laptop.
Our first prototype optimized an 80B-parameter model to run at the full 256k context at 40 tokens/s while using only 14 GB of RAM.
We are now leveraging this tech to build https://cortex.build, a terminal AI coding assistant.