Comment by jrflowers a day ago

This is a good point. They tested software that exists rather than software that you’ve imagined in your head, which is a curious decision.

The choice of test is interesting as well. Instead of doing CRM and confidentiality tests they could have done a “quickly generate a listicle of plausible-sounding ant facts” test, which an LLM would surely be more likely to pass.

CityOfThrowaway a day ago

They tested one specific agent implementation that they themselves made, and made sweeping claims about LLM agents.

  • jrflowers a day ago

    This makes sense. The CRM company made a CRM agent to do CRM tasks and it did poorly. The lesson to be learned here is that attempting to leverage institutional knowledge to make a language model do something useful is a mistake, when the obvious solution for LLM agents is to simply make them more gooder, which must be trivial since I can picture them being very good in my mind.