Comment by homarp a year ago

Can you use https://github.com/abetlen/llama-cpp-python, or do you need something that Ollama provides?

punnerud a year ago

Switching to a lower-level integration will probably not improve speed; most of the waiting is on Llama's text generation.

It should be easy to switch the embeddings, though.
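One way to keep the embedding backend swappable is to inject it as a plain function. Below is a minimal sketch under that assumption. `Embedder`, `hashing_embedder`, and `AnswerStore` are hypothetical names, and the toy hashing embedder only stands in for a real model call (an Ollama or llama-cpp-python embedding call would be wrapped in a function with the same shape).

```python
import hashlib
import math
from typing import Callable, List

# Hypothetical type: any function mapping a list of texts to a list of vectors.
Embedder = Callable[[List[str]], List[List[float]]]


def hashing_embedder(texts: List[str], dim: int = 64) -> List[List[float]]:
    """Toy stand-in embedder (hypothetical): hashes words into a fixed-size,
    L2-normalised bag-of-words vector. Replace with a real model call."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for word in text.lower().split():
            digest = hashlib.md5(word.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "little") % dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
        vectors.append([v / norm for v in vec])
    return vectors


class AnswerStore:
    """Stores previous answers with their embeddings. The embedder is
    injected, so switching backends means passing a different function."""

    def __init__(self, embed: Embedder):
        self.embed = embed
        self.texts: List[str] = []
        self.vectors: List[List[float]] = []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.extend(self.embed([text]))

    def switch_embedder(self, embed: Embedder) -> None:
        # Vectors from different models are not comparable, so re-embed everything.
        self.embed = embed
        self.vectors = embed(self.texts) if self.texts else []
```

The one catch when switching is that stored vectors must be recomputed with the new model, which is why `switch_embedder` re-embeds all stored texts instead of only changing the function.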

I'm already experimenting with using embeddings to add different tags to previous answers, then using those tags to improve the reasoning.
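The comment doesn't say how the tagging works. One plausible reading, sketched here purely as an assumption, is to embed a short description per tag, give each answer every tag whose cosine similarity clears a threshold, and then pull earlier answers with a relevant tag into the next prompt. `assign_tags`, `build_context`, and the `0.5` threshold are hypothetical choices, and the vectors are assumed to come from whatever embedding model is in use.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_tags(answer_vec: Sequence[float],
                tag_vecs: Dict[str, Sequence[float]],
                threshold: float = 0.5) -> List[str]:
    """Return the tags whose embedding is close enough to the answer's,
    most similar first (hypothetical threshold of 0.5)."""
    scored = [(cosine(answer_vec, vec), tag) for tag, vec in tag_vecs.items()]
    return [tag for score, tag in sorted(scored, reverse=True) if score >= threshold]


def build_context(answers: List[str],
                  tags_per_answer: List[List[str]],
                  wanted_tag: str,
                  limit: int = 3) -> str:
    """Collect up to `limit` earlier answers carrying `wanted_tag`,
    joined so they can be prepended to the next prompt."""
    picked = [a for a, tags in zip(answers, tags_per_answer) if wanted_tag in tags]
    return "\n\n".join(picked[:limit])
```

A fixed threshold keeps the sketch simple; taking the top-k tags per answer would be an equally reasonable alternative.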