Comment by punnerud

Switching to a low-level integration will probably not improve the speed; most of the waiting is on llama generating the text.

Should be easy to switch embeddings.
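A rough sketch of why the swap should be cheap, assuming the embedding step is just a function from text to a vector that the rest of the code calls. Everything here (`Memory`, `letter_embedder`, `length_embedder`) is a hypothetical stand-in, not the project's actual code, and the toy embedders are placeholders for a real model.

```python
import math
from typing import Callable, List, Tuple

# An embedder is any function mapping text to a vector (assumed interface).
Embedder = Callable[[str], List[float]]

def letter_embedder(text: str) -> List[float]:
    """Toy stand-in: 26-dimensional letter-count vector."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts

def length_embedder(text: str) -> List[float]:
    """Another toy stand-in: character count and word count."""
    return [float(len(text)), float(len(text.split()))]

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class Memory:
    """Stores texts with their vectors; the embedder is injected."""

    def __init__(self, embed: Embedder) -> None:
        self.embed = embed
        self.items: List[Tuple[str, List[float]]] = []

    def add(self, text: str) -> None:
        self.items.append((text, self.embed(text)))

    def search(self, query: str, k: int = 1) -> List[str]:
        q = self.embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

Switching embeddings is then a one-line change, `Memory(letter_embedder)` versus `Memory(length_embedder)`, though anything already stored has to be re-embedded, since vectors from different embedders are not comparable.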

I'm already experimenting with tagging previous answers using embeddings, then using those tags to improve the reasoning.
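One way the tagging idea could look, as a minimal sketch only: each tag gets a short example description, and an answer receives the tag whose embedding is closest. The bag-of-words `embed`, the `TAG_EXAMPLES` table, and the function names are all assumptions made for illustration; a real setup would use the model's embeddings instead.

```python
import math
from collections import Counter
from typing import Dict, List

def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (placeholder for a real embedding model).
    Splits on whitespace only, so punctuation stays attached to words."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical tags, each described by a few example words.
TAG_EXAMPLES: Dict[str, str] = {
    "math": "number sum calculate equation",
    "code": "function python bug error",
}

def tag_answer(answer: str) -> str:
    """Return the tag whose example text is most similar to the answer."""
    vec = embed(answer)
    return max(TAG_EXAMPLES, key=lambda t: cosine(vec, embed(TAG_EXAMPLES[t])))

def answers_with_tag(answers: List[str], tag: str) -> List[str]:
    """Select earlier answers carrying a given tag, e.g. to put in the next prompt."""
    return [a for a in answers if tag_answer(a) == tag]
```

The selected answers could then be prepended to the next prompt, so the model only sees earlier answers that are relevant to the current question.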