Comment by Eisenstein 2 days ago
Download koboldcpp and the Llama 3.1 GGUF weights, and run it with the llama3 completions adapter.
Then edit the 'background.js' file in the extension and replace the OpenAI endpoint with
'http://your.local.ip.addr:5001/v1/chat/completions'
Set anything you want as the API key. Now you have a truly local version.
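A minimal sketch of what the change in 'background.js' might look like, assuming the extension keeps the endpoint in a constant and sends standard OpenAI-style chat requests (the variable and function names here are illustrative, not the extension's actual code):

```javascript
// Hypothetical sketch: swap the hard-coded OpenAI endpoint for the local
// koboldcpp server. koboldcpp exposes an OpenAI-compatible API, so the
// request shape stays the same.
const LOCAL_URL = "http://your.local.ip.addr:5001/v1/chat/completions";

async function chat(prompt, apiKey = "anything") {
  const res = await fetch(LOCAL_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // koboldcpp ignores the key, but the header keeps the request shape intact
      "Authorization": `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "llama-3.1-8b-instruct", // model name is illustrative
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Since the API is OpenAI-compatible, nothing else in the extension should need to change.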
* https://github.com/LostRuins/koboldcpp/releases
* https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-...
* https://github.com/LostRuins/koboldcpp/blob/concedo/kcpp_ada...