Comment by marcus_holmes 9 hours ago

Have you tried writing tests with an LLM?

Because I have, and it has not been the experience you're describing. The LLM hallucinated the error message it was testing for, a message it had itself written five minutes earlier in the very file it was given as the source to test.
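To make the failure mode concrete, here's a minimal sketch of the pattern, assuming Python and pytest. The module name (portcfg), function (parse_port), and both error messages are hypothetical stand-ins for my actual session, not the real code:

    # portcfg.py - the source file the LLM had written and was then asked to test
    def parse_port(value: str) -> int:
        port = int(value)
        if not (0 < port < 65536):
            raise ValueError(f"port out of range: {value}")
        return port

    # test_portcfg.py - the kind of test it generated
    import pytest
    from portcfg import parse_port

    def test_rejects_out_of_range_port():
        # Hallucinated message: the code actually raises
        # "port out of range: 70000", so this test fails against
        # the very file it was written from.
        with pytest.raises(ValueError, match="invalid port number"):
            parse_port("70000")

The test is syntactically fine and looks plausible in review; it just asserts a message that never existed, so it fails the moment you run it.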

I don't think this can be solved with the current methodology we're using to build these assistants. I remain (arguably) highly intelligent, and definitely convinced that LLMs need to evolve further before they surpass humans as coders.