Comment by koakuma-chan 2 days ago
Is anyone working on, or does anyone know of, a library for evaluating LLMs for application features, and/or for evaluating application features that use LLMs? I am wondering what people use, or whether anyone has built their own solution.
There would be so much subjectivity in this. I like the idea, but executing it in a reliable, repeatable way would be very challenging, imo.