Comment by biophysboy 2 days ago

Have you noticed any significant AND consistent differences between them when you switch? I frequently get a better answer from one than the other, but it feels unpredictable. Your setup seems like a better test of this.

raw_anon_1111 2 days ago

For the most part, I don’t do chatbots except for a couple of RAG-based ones. It’s mostly behind-the-scenes work like image understanding, categorization, nuanced sentiment analysis, semantic alignment, etc.

I’ve created a framework that lets me test quality in an automated way across prompt changes and models, and I compare cost, speed, and quality.

Out of all of those, the only thing that requires humans to judge the quality is the RAG results.
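An automated comparison of the kind described above might look roughly like this. It is a minimal sketch, not the commenter's actual framework: the `Model` wrapper, the flat per-call cost, the stub classifiers, and the sample call-center test cases are all hypothetical stand-ins for real API calls and a customer's labeled test data.

```python
import time
from dataclasses import dataclass
from typing import Callable


# Hypothetical wrapper: in practice `predict` would call a hosted model API.
@dataclass
class Model:
    name: str
    cost_per_call: float  # assumed flat dollar cost per call, for illustration only
    predict: Callable[[str], str]


@dataclass
class Result:
    model: str
    prompt_name: str
    accuracy: float
    avg_latency_s: float
    total_cost: float


def evaluate(models: list[Model], prompts: dict[str, str],
             test_cases: list[tuple[str, str]]) -> list[Result]:
    """Score every (model, prompt template) pair against labeled test cases."""
    if not test_cases:
        raise ValueError("need at least one labeled test case")
    results = []
    for model in models:
        for prompt_name, template in prompts.items():
            correct = 0
            latencies = []
            for text, expected in test_cases:
                start = time.perf_counter()
                answer = model.predict(template.format(input=text))
                latencies.append(time.perf_counter() - start)
                # Exact-match scoring suits categorization, not open-ended answers
                if answer.strip().lower() == expected.lower():
                    correct += 1
            n = len(test_cases)
            results.append(Result(model.name, prompt_name, correct / n,
                                  sum(latencies) / n, model.cost_per_call * n))
    return results


# Hypothetical stub "models" so the sketch runs without any API access
def keyword_stub(prompt: str) -> str:
    return "billing" if "invoice" in prompt or "charge" in prompt else "support"


def always_support(prompt: str) -> str:
    return "support"


models = [Model("stub-a", 0.001, keyword_stub), Model("stub-b", 0.0002, always_support)]
prompts = {"v1": "Categorize this call-center message: {input}"}
test_cases = [("I was charged twice", "billing"),
              ("My app crashes", "support"),
              ("Where is my invoice?", "billing")]

for r in sorted(evaluate(models, prompts, test_cases), key=lambda r: -r.accuracy):
    print(f"{r.model:8} {r.prompt_name:4} acc={r.accuracy:.2f} "
          f"latency={r.avg_latency_s * 1000:.3f}ms cost=${r.total_cost:.4f}")
```

Exact-match scoring like this works for categorization-style tasks with a known right answer; it does not capture subjective RAG output quality, which is why that case still needs human judgment.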

biophysboy 2 days ago

So who is the winner using the framework you created?

- raw_anon_1111 2 days ago
  
  It depends. Amazon’s Nova Lite gave me the best speed-vs-performance trade-off when I needed really quick, real-time inference for categorizing a user’s input (think call centers).
  One of Anthropic’s models did the best with image understanding, with Amazon’s Nova Pro slightly behind.
  For my tests, I used a customer’s specific set of test data.
  For RAG, I forgot, but it’s much more subjective. I just gave the customer the ability to configure the model and modify the prompt so they could choose.
  
  biophysboy 2 days ago
  
  Your experience matches mine, then... I haven't noticed any clear, consistent differences. I'm always looking for second opinions on this (bc I've gotten fairly cynical). Appreciate it.
  
kevstev 2 days ago

Check out https://poe.com; it does the same thing. I agree with your assessment, though: while you can get better answers from some models than others, predicting in advance which model will give you the better answer is hard.