Comment by _moog 13 hours ago

I started diving into LLMs a few weeks ago, and one thing that immediately caught me off guard was how little standardization there is across the various pieces you'd use to build a chat stack.

Want to swap out your client for a different one? Good luck - it probably expects a completely different schema. Trying a new model? Hope you're ready to deal with a different chat template. It felt like every layer had its own way of doing things, which made the flow pretty frustrating to follow as a newbie.
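
For a rough concrete example of what I mean (the exact fields and control tokens vary by provider and model - this is just the ChatML flavour):

    // The same single turn, two ways (sketch only).

    // 1. OpenAI-style chat schema: an array of role/content messages.
    const messages = [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "What's the weather in Berlin?" },
    ];

    // 2. Roughly what a ChatML-style chat template renders that into
    //    before tokenization; Llama- or Mistral-family templates use
    //    entirely different control tokens.
    const prompt =
      "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n" +
      "<|im_start|>user\nWhat's the weather in Berlin?<|im_end|>\n" +
      "<|im_start|>assistant\n";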

So I sketched out a diagram that maps the (rough) schema in use at each step of the process - from the initial request all the way through Ollama and an MCP server, via OpenAI-compatible endpoints - showing which transformations occur where.

Figured I'd share it in case it helps someone else.

https://moog.sh/posts/openai_ollama_mcp_flow.html
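As a rough sketch of one hop in that flow (model name, tool name, and URLs are placeholders; Ollama's compatibility endpoint lives at /v1/chat/completions by default):

    // An OpenAI-shaped request into Ollama, and the MCP tools/call the
    // resulting tool_call gets translated into. Sketch only.
    const resp = await fetch("http://localhost:11434/v1/chat/completions", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "llama3.1",
        messages: [{ role: "user", content: "What's the weather in Berlin?" }],
        tools: [{
          type: "function",
          function: {
            name: "get_weather",
            description: "Look up current weather for a city",
            parameters: {
              type: "object",
              properties: { city: { type: "string" } },
              required: ["city"],
            },
          },
        }],
      }),
    });

    const completion = await resp.json();
    const toolCall = completion.choices[0].message.tool_calls?.[0];

    if (toolCall) {
      // The OpenAI schema returns arguments as a JSON string; MCP's
      // tools/call wants a JSON-RPC params object - one of the small
      // transformations the diagram tries to pin down.
      const mcpRequest = {
        jsonrpc: "2.0",
        id: 1,
        method: "tools/call",
        params: {
          name: toolCall.function.name,
          arguments: JSON.parse(toolCall.function.arguments),
        },
      };
      // ...send mcpRequest to the MCP server over stdio or HTTP...
    }
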

Somewhat ironically, Claude built the JS hooks for my SVG with about five minutes of prompting.

youdont 12 hours ago

Have you tried BAML? We use it to manage APIs and clients, as well as prompts and types. It gives great low-level control over your prompts and logic, but acts as a nice standardisation layer.

  • _moog 12 hours ago

    That's going to be super useful for some of the high-level prompt-testing work I'm doing. Thanks!

    I'm also getting into lower-level LLM work - fine-tuning, training on custom chat templates, etc. - which is more where the diagram was needed.