Comment by Eisenstein 10 months ago
Download koboldcpp and the Llama 3.1 GGUF weights, then run it with the Llama 3 completions adapter.
Edit the 'background.js' file in the extension and replace the OpenAI endpoint with
'http://your.local.ip.addr:5001/v1/chat/completions'.
Set anything you want as the API key. Now you have a truly local version.
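A rough sketch of the kind of change involved (the constant names here are hypothetical; the actual background.js will be organized differently, but the idea is just to repoint the request at your local koboldcpp server):

    // Hypothetical names -- the real extension code will differ.
    const API_ENDPOINT = "http://your.local.ip.addr:5001/v1/chat/completions"; // was the api.openai.com URL
    const API_KEY = "anything-goes"; // koboldcpp does not validate the key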
* https://github.com/LostRuins/koboldcpp/releases
* https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-...
* https://github.com/LostRuins/koboldcpp/blob/concedo/kcpp_ada...
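If you want to confirm the local server is answering before touching the extension, here is a quick check against the OpenAI-compatible endpoint (Node 18+ for built-in fetch; the host and the "local" model name are placeholders -- koboldcpp serves whatever model it has loaded):

    // Sanity check: POST a chat completion to the local koboldcpp server.
    async function main() {
      const resp = await fetch("http://your.local.ip.addr:5001/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "Authorization": "Bearer anything", // any key is accepted
        },
        body: JSON.stringify({
          model: "local", // placeholder; the loaded model is used regardless
          messages: [{ role: "user", content: "Say hello." }],
        }),
      });
      const data = await resp.json();
      console.log(data.choices[0].message.content);
    }
    main();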