Comment by SamInTheShell

Currently today, I would say these models can be used by someone with minimal knowledge to churn out SPAs with React. They can probably get pretty far into making logins, message systems, and so on because there is lots of training data for those things. They can struggle through building desktop apps as well with relative ease compared to how I had to learn in years long past.

What these LLMs continue to prove those is they are no substitute for real domain knowledge. To date, I've yet to have a model implement RAFT consensus correctly in testing to see if they can build a database.

The way I interact with these models is almost adversarial in nature. I prompt them with the bare minimum that a developer might get in a feature request. I may even have a planning session to populate the context before I set it off on a task.

The bias in these LLMs really shines through an proves their autocomplete properties when they have a strong bias towards changing the one snippet of code I wrote because it doesn't fit in how it's training data would suggest the shape of it's code should be. Most models will course correct with instructions that they are wrong and I am right though.

One thing I've noted is that if you let it generate choices for you from the start of a project, it will make poor choices in nearly every language. You can be using uv to manage a python project and it will continue to try using pip or python commands. You can start an electron app and it will continuously botch if it's using commonjs or some other standard. It persistently wants to download go modules before coding instead of just writing the code and doing `go mod tidy` after (it literally doesn't need the module in advance, it doesn't even have tools to probe the module before writing the code anyway).

RAFT consensus is my go-to test because there is no 1 size fits all way for you to implement it. It might get an in-memory key store system right, but what if you want it to organize etcd/raft/v3 in a way that you can do multi-group RAFT? What if you need RAFT to coordinate some other form of data replication? None of these LLMs can really do it without a lot of prep work.

This is across all the models available from OpenAI, Claude, and Google.