Comment by adam_patarino a day ago
We developed a novel optimization pipeline for LLMs that lets large models run on a standard laptop.
Our first prototype optimized an 80B-parameter model to run at the full 256k context at 40 tokens/s while using only 14 GB of RAM.
We are now leveraging this tech to build https://cortex.build, a terminal AI coding assistant.