Comment by biophysboy

Comment by biophysboy 2 days ago

5 replies

Have you noticed any significant AND consistent differences between them when you switch? I frequently get a better answer from one vs the other, but it feels unpredictable. Your setup seems like a better test of this.

raw_anon_1111 2 days ago

For the most part, I don’t do chatbots except for a couple of RAG-based chatbots. It’s more behind-the-scenes stuff like image understanding, categorization, nuanced sentiment analysis, semantic alignment, etc.

I’ve created a framework that lets me test quality in an automated way across prompt changes and models, and I compare cost/speed/quality.

Of all those, the only thing that requires humans to judge the quality is the RAG results.
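
For readers wondering what that kind of comparison looks like in practice, here is a minimal sketch of such a harness. It is not the framework described above, just an illustration under stated assumptions: the two classifier functions are hypothetical stand-ins for real model calls, the labeled cases are made up, and `cost_per_call` is an invented flat price rather than any provider's actual pricing.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Result:
    name: str
    accuracy: float       # fraction of cases labeled correctly
    avg_latency_s: float  # mean wall-clock seconds per call
    total_cost: float     # assumed flat price per call times number of calls


def evaluate(name: str,
             classify: Callable[[str], str],
             cases: List[Tuple[str, str]],
             cost_per_call: float) -> Result:
    """Score one (model, prompt) variant against labeled test cases."""
    correct = 0
    elapsed = 0.0
    for text, expected in cases:
        start = time.perf_counter()
        predicted = classify(text)
        elapsed += time.perf_counter() - start
        correct += int(predicted == expected)
    n = len(cases)
    return Result(name, correct / n, elapsed / n, cost_per_call * n)


# Hypothetical stand-ins; a real harness would call a model API here.
def keyword_model(text: str) -> str:
    return "billing" if "invoice" in text.lower() else "support"


def always_support(text: str) -> str:
    return "support"


# Made-up labeled data; in practice this would be the customer's test set.
CASES = [
    ("Where is my invoice?", "billing"),
    ("My app keeps crashing", "support"),
    ("Invoice total looks wrong", "billing"),
    ("How do I reset my password?", "support"),
]

results = [
    evaluate("keyword_model", keyword_model, CASES, cost_per_call=0.002),
    evaluate("always_support", always_support, CASES, cost_per_call=0.001),
]

# Print variants from most to least accurate.
for r in sorted(results, key=lambda r: r.accuracy, reverse=True):
    print(f"{r.name}: accuracy={r.accuracy:.2f} "
          f"latency={r.avg_latency_s:.6f}s cost=${r.total_cost:.3f}")
```

Exact-match scoring like this only works for tasks with a known right answer, such as categorization. It does not help with RAG output, which is why that part still needs a human judge.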

  • biophysboy 2 days ago

    So who is the winner using the framework you created?

    • raw_anon_1111 2 days ago

      It depends. Amazon’s Nova Lite gave me the best speed vs performance when I needed really quick real-time inference for categorizing a user’s input (think call centers).

      One of Anthropic’s models did the best with image understanding, with Amazon’s Nova Pro slightly behind.

      For my tests, I used a customer’s specific set of test data.

      For RAG, I forget. But it is much more subjective. I just gave the customer the ability to configure the model and modify the prompt so they could choose.

      • biophysboy 2 days ago

        Your experience matches mine then... I haven't noticed any clear, consistent differences. I'm always looking for second opinions on this (bc I've gotten fairly cynical). Appreciate it

kevstev 2 days ago

Check out https://poe.com - it does the same thing. I agree with your assessment though: while you can get better answers from some models than others, predicting in advance which model will give you the better answer is hard.