Comment by wongarsu
$work has a GPU server running Ollama; I connect to it using the continue.dev VS Code extension. Just ignore the login prompts and set up the models via config.yaml.
In terms of models, qwen2.5-coder:3b is a good compromise for autocomplete; as the agent model, choose pretty much the biggest SOTA model you can run.
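
For reference, the relevant part of the config.yaml looks roughly like this (a sketch; the hostname is a placeholder for wherever your Ollama server lives, and field names follow the current continue.dev YAML schema, so double-check against their docs):

```yaml
# Sketch of a continue.dev config.yaml pointing at a remote Ollama server.
# "gpu-server" is a placeholder hostname; 11434 is Ollama's default port.
models:
  - name: Qwen 2.5 Coder 3b          # small model, fast enough for autocomplete
    provider: ollama
    model: qwen2.5-coder:3b
    apiBase: http://gpu-server:11434
    roles:
      - autocomplete
  - name: Big Agent Model             # swap in the largest SOTA model you can run
    provider: ollama
    model: qwen2.5-coder:32b          # example; pick per your VRAM budget
    apiBase: http://gpu-server:11434
    roles:
      - chat
      - edit
```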