Comment by raw_anon_1111 2 days ago

I do all of my “AI” development on top of AWS Bedrock, which hosts nearly every available model except OpenAI's closed-source models, which are exclusive to Microsoft.

It’s extremely easy to write a library that makes switching between models trivial. I could add OpenAI support. It would be just slightly more complicated because I'd need a separate set of API keys, whereas now I can just use my AWS credentials.

Also, latency would theoretically be worse with OpenAI, since by hosting on AWS and using AWS for inference you stay within AWS's internal network (yes, I know to use VPC endpoints).

There is no moat around switching models, despite what Ben says.

bambax 2 days ago

openrouter.ai does exactly that, and it lets you use models from OpenAI as well. I switch models often using openrouter.

But, talk to any (or almost any) non-developer and you'll find they 1/ mostly only use ChatGPT, sometimes only know of ChatGPT and have never heard of any other solution, and 2/ in the rare case they did switch to something else, they don't want to go back, they're gone for good.

Each provider has a moat that is its number of daily users; and although it's a little annoying to admit, OpenAI has the biggest moat of them all.

  • raw_anon_1111 2 days ago

    Non-developers using chatbots and being willing to pay will never be as big a market as enterprise or BigTech using AI in the background.

    I would think that Gemini (the model) will add profit to Google long before OpenAI ever becomes profitable, as Google leverages it within its own business.

    Why would I pay for openrouter.ai and add another dependency? If I’m just using Amazon Bedrock hosted models, I can just use the AWS SDK and change the request format slightly based on the model family and abstract that into my library.

    • bambax 2 days ago

      You don't need openrouter if you already have everything set up in your own AWS environment. But if you don't, openrouter is extremely straightforward, just open an account and you're done.
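      For context, OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so "switching models" really is just changing the model string in the request body. A minimal sketch using only the standard library (endpoint and payload shape per OpenRouter's public API; `OPENROUTER_API_KEY` is an assumed environment variable):

```python
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_payload(model: str, prompt: str) -> dict:
    # OpenAI-style chat-completions body; only the model string changes
    # when you switch providers, e.g. "openai/gpt-4o" vs. "anthropic/claude-3.5-sonnet".
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask(model: str, prompt: str) -> str:
    # Sends the request; requires a funded OpenRouter account and API key.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

      The tradeoff mentioned upthread applies: this is one account and one dependency instead of per-provider keys, but it is another vendor in the request path.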

  • redwood 2 days ago

    All Google needs to do is bite the bullet on the cost and flip core search to AI, and they'd immediately dominate the user count. They can start by focusing on the questions that already get asked in Google search. Boom

    • raw_anon_1111 2 days ago

      Core search has been using “AI” since they basically deprioritized PageRank.

      I think the combination of AI overviews and a separate “AI mode” tab is good enough.

  • EmiDub 2 days ago

    How is the number of users a moat when you are losing money on every user?

  • sumedh a day ago

    Do you use the thinking functionality of these models? Does every model have its own syntax for its API?

    • raw_anon_1111 a day ago

      This is the documentation for using Amazon Bedrock hosted models from Python.

      https://docs.aws.amazon.com/code-library/latest/ug/python_3_...

      Every model family has its own request format.

      When I said it was “trivial” to write a library, I should have been more honest: “It’s trivial to point ChatGPT at the documentation and have it one-shot a Python library for the models you want to support.”
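      To make the "different request format per family" point concrete, here is a hedged sketch of what such a dispatch library looks like. The body shapes follow the formats documented in the Bedrock user guide for the Anthropic, Amazon Nova, and Meta Llama families; the commented `boto3` call at the bottom is illustrative only and is not executed here:

```python
import json

# Per-family request builders: each Bedrock model family expects a
# different JSON body, so the library's job is mostly this dispatch.
def _anthropic(prompt, max_tokens):
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

def _nova(prompt, max_tokens):
    return {
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
    }

def _llama(prompt, max_tokens):
    return {"prompt": prompt, "max_gen_len": max_tokens}

_BUILDERS = {"anthropic": _anthropic, "amazon": _nova, "meta": _llama}

def build_body(model_id: str, prompt: str, max_tokens: int = 512) -> str:
    # Bedrock model IDs look like "anthropic.claude-3-haiku-...";
    # the prefix identifies the family (region-prefixed IDs would need
    # extra handling, omitted here).
    family = model_id.split(".")[0]
    return json.dumps(_BUILDERS[family](prompt, max_tokens))

# Usage sketch (requires AWS credentials; not run here):
# import boto3
# client = boto3.client("bedrock-runtime")
# resp = client.invoke_model(modelId=model_id, body=build_body(model_id, "hi"))
```

      (Bedrock's Converse API also normalizes much of this away, at the cost of the lowest-level per-model parameters.)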

spruce_tips 2 days ago

I agree there is no moat in the mechanics of switching models, i.e. what openrouter does. But it's not as straightforward as everyone says to swap out the model powering a workflow that's been tuned around that model, whether the tuning was purposeful or accidental. It takes time to re-evaluate that the new model works the same as or better than the old one.

That said, I don't believe oai's models consistently produce the best results.

  • raw_anon_1111 2 days ago

    You need a way to test model changes regardless as models in the same family change. Is it really a heavier lift to test different model families than it is to test going from GPT 3.5 to GPT 5 or even as you modify your prompts?

    • spruce_tips 2 days ago

      no, i don't think it's a heavier lift to test different model families. my point was that swapping models, whether to a different family or to a new version in the same family, isn't straightforward. i'm reluctant both to upgrade model versions AND to swap model families, and that in itself is a type of stickiness that multiple model providers have.

      maybe another way of saying the same thing is that there's still a lot of work to be done to make eval tooling a lot better!

      • DenisM a day ago

        Continuous eval is unavoidable even absent model changes. Agents keep memories, tools evolve over time, external data changes, new exploits are deployed, partner agents get upgraded.

        There's too much entropy in the system. Context babysitting is our future.
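        The "re-evaluate before you swap" point above can be sketched as a tiny regression eval: run the incumbent and the candidate over the same golden set and only accept the swap if the candidate scores at least as well. Everything here is a placeholder (`call_model` stands in for whatever inference client you use, and the golden set is invented):

```python
# Hypothetical golden set for a sentiment-style task: (input, expected label).
GOLDEN_SET = [
    ("The package arrived broken and support ignored me.", "negative"),
    ("Shipping was fast and the quality is great.", "positive"),
]

def evaluate(call_model, cases):
    # Fraction of cases where the model's label matches the expected one.
    hits = sum(1 for text, expected in cases if call_model(text) == expected)
    return hits / len(cases)

def safe_to_swap(old_model, new_model, cases, tolerance=0.0):
    # Gate the swap: accept the new model only if it scores at least as well
    # (minus an optional tolerance) as the incumbent on the golden set.
    return evaluate(new_model, cases) >= evaluate(old_model, cases) - tolerance
```

        The same gate works for prompt changes and same-family version bumps, which is the symmetry raised upthread.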

mike_hearn a day ago

I recently ported a personal coding agent from GPT-5 by adding support for Grok 4 Fast Reasoning. This was quite tricky; it wasn't just a matter of switching a model URL. Grok 4 Fast is vastly faster, but some things GPT-5 does well Grok struggled with, and aspects of the prompting style confused it too. I had to rework some tools and other pieces.

It wasn't a huge lift, but there is some moat. And the results were worse than GPT-5's, which I suppose is no surprise; it was always unlikely GPT-5 was wasting all those FLOPs.

biophysboy 2 days ago

Have you noticed any significant AND consistent differences between them when you switch? I frequently get a better answer from one vs the other, but it feels unpredictable. Your setup seems like a better test of this

  • raw_anon_1111 2 days ago

    For the most part, I don’t do chatbots except for a couple of RAG-based ones. It’s more behind-the-scenes stuff like image understanding, categorization, nuanced sentiment analysis, semantic alignment, etc.

    I’ve created a framework that lets me test quality in an automated way across prompt changes and models, and I compare cost/speed/quality.

    The only thing out of all those that requires humans to judge the quality is RAG results.
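    The skeleton of that kind of framework fits in a few lines. A hedged sketch (the token count and pricing are crude placeholders, not real Bedrock figures, and `call_model` again stands in for your inference client):

```python
import time

def compare(models, cases, price_per_1k_tokens):
    # Run every candidate model over the same test cases and tabulate
    # accuracy, wall-clock time, and an estimated token cost.
    results = {}
    for name, call_model in models.items():
        hits, tokens, start = 0, 0, time.perf_counter()
        for text, expected in cases:
            answer = call_model(text)
            hits += answer == expected
            tokens += len(text.split()) + len(answer.split())  # rough proxy
        results[name] = {
            "accuracy": hits / len(cases),
            "seconds": time.perf_counter() - start,
            "est_cost": tokens / 1000 * price_per_1k_tokens[name],
        }
    return results
```

    With real models plugged in, the output table is what lets you trade accuracy against latency and cost per task, which is how conclusions like "Nova Lite for real-time categorization" fall out.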

    • biophysboy 2 days ago

      So who is the winner using the framework you created?

      • raw_anon_1111 2 days ago

        It depends. Amazon’s Nova Lite gave me the best speed-vs-performance tradeoff when I needed really quick real-time inference for categorizing a user’s input (think call centers).

        One of Anthropic’s models did the best at image understanding, with Amazon’s Nova Pro slightly behind.

        For my tests, I used a customer’s specific set of test data.

        For RAG I forget, but it’s much more subjective. I just gave the customer the ability to configure the model and modify the prompt so they could choose.

        • biophysboy 2 days ago

          Your experience matches mine then... I haven't noticed any clear, consistent differences. I'm always looking for second opinions on this (bc I've gotten fairly cynical). Appreciate it

  • kevstev 2 days ago

    Check out https://poe.com - it does the same thing. I agree with your assessment though: while you can get better answers from some models than others, predicting in advance which model will give you the better answer is hard.