Comment by imiric
> By throwing things over the wall to the AI first, you learn what it can do at the same time as you learn how to structure your requests.
Unfortunately, it doesn't quite work out that way.
Yes, you will get better at using these tools the more you use them, which is the case with any tool. But you will not learn what they can do as easily, or at all.
The main problem with them is the same one they've had since the beginning. If the user is a domain expert, then they will be able to quickly spot the inaccuracies and hallucinations in the seemingly accurate generated content, and, with some effort, coax the LLM into producing correct output.
Otherwise, the user can be easily misled by the confident and sycophantic tone, and waste potentially hours troubleshooting, without being able to tell if the error is on the LLM side or their own. In most of these situations, they would've probably been better off reading the human-written documentation and code, and doing the work manually. Perhaps with minor assistance from LLMs, but never relying on them entirely.
This is why these tools are most useful to people who are already experts in their field, such as Filippo. For everyone else who actually cares about the quality of their work, the experience is very hit or miss.
> That being said.. you also need to understand how they fail and build an intuition for why they fail.
I've been using these tools for years now. The only intuition I have for how and why they fail comes from being familiar with the domain. But I had that without LLMs as well, whenever someone was talking about a subject I know. It's impossible to build that intuition in domains you have little familiarity with. You can certainly do that by traditional learning, and LLMs can help with that, but most people use them for what you suggest: throwing things over the wall and running with it, which is a shame.
> I work with people that have elaborate context setups they crafted for less capable models, they largely are unnecessary with GPT-5-Codex and Sonnet 4.5.
I haven't used GPT-5-Codex, but have experience with Sonnet 4.5, and it's only marginally better than the previous versions IME. It still often wastes my time, no matter the quality or amount of context I feed it.
I guess there are several unsaid assumptions here. The article is by a domain expert working on their domain. Throw work you understand at it, see what it does. Do it before you even work on it. I kind of assumed based on the audience that most people here would be domain experts.
As for building intuition, perhaps I am overestimating what most people are capable of.
Working with and building systems using LLMs over the last few years has helped me build a pretty good intuition about what is breaking down when the model fails at a task. While having an ML background is useful in some very narrow cases (like: 'why does an LLM suck at ranking...'), I "think" a person can get a pretty good intuition purely based on observational outcomes.
I've been wrong before though. When we first started building LLM products, I thought, "Anyone can prompt, there is no barrier for this skill." That was not the case at all. Most people don't do well trying to quantify ambiguity, specificity, and logical contradiction when writing a process or set of instructions. I was REALLY surprised how I became a "go-to" person to "fix" prompt systems all based on linguistics and systematic process decomposition. Some of this was understanding how the auto-regressive attention system benefits from breaking the work down into steps, but really most of it was just "don't contradict yourself and be clear".
Working with them extensively has also helped me home in on how the models get "better" with each release, though most of my expertise is with the OpenAI and Anthropic model families.
I still think most engineers "should" be able to build intuition generally on what works well with LLMs and how to interact with them, but you are probably right. It will be just like most ML engineers, who see something work in a paper and then just paste it onto their model with no intuition about what that structurally changes in the model dynamics.