Comment by Eisenstein 10 months ago
Download koboldcpp and the Llama 3.1 GGUF weights, then run it with the Llama 3 completions adapter.
Edit the 'background.js' file in the extension and replace the OpenAI endpoint with
'http://your.local.ip.addr:5001/v1/chat/completions'.
Set anything you want as the API key. Now you have a truly local version.
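A rough sketch of the kind of change involved (the constant names here are hypothetical; the actual background.js will be organized differently, but the idea is just to repoint the request at your local koboldcpp server):

    // Hypothetical names -- the real extension code will differ.
    const API_ENDPOINT = "http://your.local.ip.addr:5001/v1/chat/completions"; // was the api.openai.com URL
    const API_KEY = "anything-goes"; // koboldcpp does not validate the key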
* https://github.com/LostRuins/koboldcpp/releases
* https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-...
* https://github.com/LostRuins/koboldcpp/blob/concedo/kcpp_ada...
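If you want to confirm the local server is answering before touching the extension, here is a quick check against the OpenAI-compatible endpoint (Node 18+ for built-in fetch; the host and the "local" model name are placeholders -- koboldcpp serves whatever model it has loaded):

    // Sanity check: POST a chat completion to the local koboldcpp server.
    async function main() {
      const resp = await fetch("http://your.local.ip.addr:5001/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "Authorization": "Bearer anything", // any key is accepted
        },
        body: JSON.stringify({
          model: "local", // placeholder; the loaded model is used regardless
          messages: [{ role: "user", content: "Say hello." }],
        }),
      });
      const data = await resp.json();
      console.log(data.choices[0].message.content);
    }
    main();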