Comment by psychoslave 2 days ago

Over the last week I did a deeper dive into using agent mode at work, and my experiments align with this observation.

The first thing that surprised me is how much the default tuning leans toward laudative stances: the user is always absolutely right, and whatever was done supposedly solves everything expected. But actually no: not a single actual check was done, a ton of code was produced but the goal is not achieved at all, and of course many regressions now lurk in the code base, when it doesn't outright break everything (which is at least less insidious).

The other thing that surprised me is that it can easily churn out thousands of lines of tests, and it can then be forced to loop over those tests until they succeed. In my experiments it still produces far too much noise code, but at least the burden of checking whether the result makes any sense is drastically reduced.
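That "loop until the tests pass" workflow can be sketched as a small retry harness. Everything here is hypothetical scaffolding, not any particular tool's API: `run_tests` stands in for invoking the project's test suite (e.g. via `subprocess` and `pytest`), and `invoke_agent` for whatever call re-prompts the coding agent with the failure output.

```python
def loop_until_green(run_tests, invoke_agent, max_attempts=5):
    """Re-invoke the agent until the test suite passes or attempts run out.

    run_tests: callable returning True when the whole suite is green
               (e.g. a wrapper that shells out to `pytest -q` and checks
               the exit code). Placeholder, not a real library function.
    invoke_agent: callable taking the attempt number; assumed to prompt
                  the agent to fix the failures. Also a placeholder.
    Returns the attempt number on which the suite first passed,
    or None if it never did.
    """
    for attempt in range(1, max_attempts + 1):
        if run_tests():
            return attempt
        invoke_agent(attempt)
    return None
```

The point of the loop is exactly the one above: the human no longer has to eyeball every diff, only to audit the tests the agent is being looped against.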

hu3 2 days ago

That's my observation too.

And I have been trying to improve the framework and abstractions/types to reduce the lines of code required for LLMs to create features in my web app.

Did the LLM really need to spit out 1k lines for this feature? Could I create abstractions to make it feasible in under 300 lines?

Of course there's a cost and diminishing returns to abstractions, so there are tradeoffs.