notahacker a day ago

Even more so when the context is "this person is an AI research engineer at a company doubling down on AI agents, designing relevant benchmarks and building agents that run on that company's stack" not "this is an AI-skeptic dilettante who wrote a weird prompt". It's not like we have reason to believe the average Salesforce customer is much better at building agents that respect confidentiality and handle CRM tasks optimally...

handfuloflight a day ago

It is an argument: a flawed agent led to flawed results. A flawed agent does not speak for all agents.

  • contagiousflow a day ago

    But the argument should be showing an agent that does in fact pass these tests. You can't just assert that "this one failed, but surely there must be some agent that is perfect, therefore you can't generalize".

    • handfuloflight a day ago

      That's not my argument. My argument isn't "surely there must be some agent that is perfect"; my argument is that this study can't speak for all agents.

      • nitwit005 a day ago

        But no test can. They ran an experiment, they got this result. You can run more experiments if you want.

        • handfuloflight a day ago

          I didn't say any test could. I'm pointing out that the commenters in this thread are wrongly generalizing the findings.