Comment by XenophileJKO

Comment by XenophileJKO 2 days ago

Personally my biggest piece of advice is: AI First.

If you really want to understand what the limitations are of the current frontier models (and also really learn how to use them), ask the AI first.

By throwing things over the wall to the AI first, you learn what it can do at the same time as you learn how to structure your requests. The newer models are quite capable and in my experience can largely be treated like a co-worker for "most" problems. That being said.. you also need to understand how they fail and build an intuition for why they fail.

Every time a new model generation comes out, I also recommend throwing away your process (outside of things like lint, etc.) and seeing how the model does without it. I work with people that have elaborate context setups they crafted for less capable models; they are largely unnecessary with GPT-5-Codex and Sonnet 4.5.

imiric 2 days ago

> By throwing things over the wall to the AI first, you learn what it can do at the same time as you learn how to structure your requests.

Unfortunately, it doesn't quite work out that way.

Yes, you will get better at using these tools the more you use them, which is the case with any tool. But you will not learn what they can do as easily, or at all.

The main problem with them is the same one they've had since the beginning. If the user is a domain expert, then they will be able to quickly spot the inaccuracies and hallucinations in the seemingly accurate generated content, and, with some effort, coax the LLM into producing correct output.

Otherwise, the user can be easily misled by the confident and sycophantic tone, and waste potentially hours troubleshooting, without being able to tell if the error is on the LLM side or their own. In most of these situations, they would've probably been better off reading the human-written documentation and code, and doing the work manually. Perhaps with minor assistance from LLMs, but never relying on them entirely.

This is why these tools are most useful to people who are already experts in their field, such as Filippo. For everyone else who isn't, and actually cares about the quality of their work, the experience is very hit or miss.

> That being said.. you also need to understand how they fail and build an intuition for why they fail.

I've been using these tools for years now. The only intuition I have for how and why they fail comes from being familiar with the domain. But I had that without LLMs as well, whenever someone is talking about a subject I know. It's impossible to build that intuition in domains you have little familiarity with. You can certainly do that by traditional learning, and LLMs can help with that, but most people use them for what you suggest: throwing things over the wall and running with it, which is a shame.

> I work with people that have elaborate context setups they crafted for less capable models; they are largely unnecessary with GPT-5-Codex and Sonnet 4.5.

I haven't used GPT-5-Codex, but have experience with Sonnet 4.5, and it's only marginally better than the previous versions IME. It still often wastes my time, no matter the quality or amount of context I feed it.


XenophileJKO 2 days ago

I guess there are several unsaid assumptions here. The article is by a domain expert working on their domain. Throw work you understand at it, see what it does. Do it before you even work on it. I kind of assumed based on the audience that most people here would be domain experts.
As for the building intuition, perhaps I am over-estimating what most people are capable of.
Working with and building systems using LLMs over the last few years has helped me build a pretty good intuition about what is breaking down when the model fails at a task. While having an ML background is useful in some very narrow cases (like: 'why does an LLM suck at ranking...'), I "think" a person can get a pretty good intuition purely based on observational outcomes.
I've been wrong before though. When we first started building LLM products, I thought, "Anyone can prompt, there is no barrier for this skill." That was not the case at all. Most people don't do well trying to quantify ambiguity, specificity, and logical contradiction when writing a process or set of instructions. I was REALLY surprised how I became a "go-to" person to "fix" prompt systems, all based on linguistics and systematic process decomposition. Some of this was understanding how the auto-regressive attention system benefits from breaking the work down into steps, but really most of it was just "don't contradict yourself and be clear".
Working with them extensively has also helped me home in on how the models get "better" with each release, though most of my expertise is with the OpenAI and Anthropic model families.
I still think most engineers "should" be able to build intuition generally on what works well with LLMs and how to interact with them, but you are probably right. It will be just like most ML engineers, who see something work in a paper and then just paste it onto their model with no intuition about what that structurally changes in the model dynamics.

- fn-mote 2 days ago
  
  > I kind of assumed based on the audience that most people here would be domain experts.
  No take on the rest of your comment, but it’s the nature of software engineering that we work on a breadth of problems. Nobody can be a domain expert in everything.
  For example: I use a configurable editor every day, but I’m not a domain expert in the configuration. An LLM wasted an hour of my day pointing me in “almost the right direction” when after 10 minutes I really needed to RTFM.
  I am a domain expert in some programming languages, but now I need to implement a certain algorithm… I’m not an expert in that algorithm. There are lots of traps for the unwary.
  I just wanted to challenge the assumption that we are all domain experts in the things we do daily. We are, but … with limitations.
  
  
  imiric 2 days ago
  
  Exactly.
  A typical programmer works within unfamiliar domains all the time. It's not just about being familiar with the programming language or tooling. Every project potentially has new challenges you haven't faced before, new APIs to evaluate and design, new tradeoffs to consider, etc.
  The less familiar you are with the domain or API, the fewer instincts and less influence you have to steer the LLM in the right direction, and the more inclined you are to trust the tool over yourself. So when the tool is wrong, as it often still is, you can spend a lot of time fighting with it to produce the correct output.
  The example in the article is actually the best case scenario for these tools. It's essentially pattern matching using high quality code, from someone who's deeply familiar with the domain and the code they've written. The experience of someone unfamiliar trying to implement the same algorithm from scratch by relying on LLMs would be vastly different.
  

Razengan a day ago

I did ask the AI first, about some things that I already knew how to do.

It gave me horribly inefficient or long-winded ways of doing it. In the time it took for "prompt tuning" I could have just written the damn code myself. It decreased my confidence in anything else it suggested about things I didn't already know.

Claude still sometimes insists that iOS 26 isn't out yet. sigh.. I suppose I just have to treat it as an occasional alternative to Google/StackOverflow/Reddit for now. No way would I trust it to write an entire class, let alone an app, and be able to sleep at night (not that I sleep at night, but that's beside the point)

I think I prefer Xcode's built-in local model approach, where it just offers sane autocompletions based on your existing code. e.g. if you already wrote a Dog class it can make a Cat class and change `bark()` to `meow()`


theshrike79 a day ago

You can write the "prompt tuning" down in AGENTS.md and then you only need to do it once. This is why you need to keep working with different ones to get a feel for what they're good at and how you can steer them closer to your style and preferences without having to reiterate from scratch every time.
I personally have a git submodule built specifically for shared instructions like that; it contains the assumptions and defaults for my specific style of project for 3 different programming languages. When I update it on one project, all my projects benefit.
This way I don't need to tell whatever LLM I'm working with to use modernc.org/sqlite for database connections, for example.
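A minimal sketch of that idea in Python, assuming a hypothetical layout where the submodule is checked out at `shared-instructions/` with one Markdown file per language. The directory name, the file names, and the `build_agents_md` helper are illustrative assumptions, not a standard and not necessarily how any particular setup works.

```python
from pathlib import Path


def build_agents_md(shared_dir: Path, languages: list[str], local_notes: str = "") -> str:
    """Concatenate shared per-language instruction files into one AGENTS.md body.

    Assumes a hypothetical layout: <shared_dir>/<language>.md, one file per language.
    Languages without a file are skipped rather than treated as an error.
    """
    sections = []
    for language in languages:
        source = shared_dir / f"{language}.md"
        if source.is_file():
            sections.append(source.read_text(encoding="utf-8").strip())
    # Project-specific notes go last so they read as overrides of the shared defaults.
    if local_notes.strip():
        sections.append(local_notes.strip())
    return "\n\n".join(sections) + "\n"
```

Calling `build_agents_md(Path("shared-instructions"), ["go"])` and writing the result to `AGENTS.md` after each submodule update would refresh a project's instructions in one step.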

- Razengan 2 hours ago
  
  > You can write the "prompt tuning" down in AGENTS.md and then you only need to do it once.
  Yeah, I just mean: I know how to "fix" the AI for things that I already know about.
  But how would I know if it's wrong or right about the stuff I DON'T know?? I'd have to go Google shit anyway to verify it.
  This is me asking ChatGPT 5 about ChatGPT 5: https://i.imgur.com/aT8C3qs.png
  Asking about Nintendo Switch 2: https://i.imgur.com/OqmB9jG.png
  Imagine if AI was somebody's first stop for asking about those things. They'd be led to believe they weren't out when they in fact were!
  
simonw a day ago

> Claude still sometimes insists that iOS 26 isn't out yet.
How would you imagine an AI system working that didn't make mistakes like that?
iOS 26 came out on September 15th.
LLMs aren't omniscient or constantly updated with new knowledge. Which means we have to figure out how to make use of them despite them not having up-to-the-second knowledge of the world.
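One common workaround, sketched here in Python under the assumption that your harness lets you set a system prompt: tell the model the current date and which user-stated facts to accept. The `build_preamble` helper and its wording are hypothetical, and a preamble like this may reduce but will not eliminate stale answers.

```python
from datetime import date


def build_preamble(today: date, user_stated_facts: list[str]) -> str:
    """Build a prompt preamble that tells the model what it cannot know on its own."""
    lines = [
        f"Today's date is {today.isoformat()}.",
        "Your training data may end well before this date.",
        "If the user mentions a release you do not recognize, assume it exists "
        "and search for it instead of disputing it.",
    ]
    # Only add the facts section when the user has actually stated something.
    if user_stated_facts:
        lines.append("Facts stated by the user (treat as true):")
        lines.extend(f"- {fact}" for fact in user_stated_facts)
    return "\n".join(lines)
```

For example, `build_preamble(date.today(), ["iOS 26 has been released."])` yields text you could prepend to a request before asking for code against the new APIs.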

- Razengan a day ago
  
  > How would you imagine an AI system working that didn't make mistakes like that?
  I mean, if the user says "Use the latest APIs as of version N" and the AI thinks version N isn't out yet, then it should CHECK on the web first, it's right there, before second guessing the user. I didn't ask it whether 26 was out or not. I told it.
  Oh but I guess AIs aren't allowed to have free use of Google's web search or scrape other websites, eh
  > iOS 26 came out on September 15th.
  It was in beta all year and the APIs were publicly available on Apple's docs website. If I told it to use version 26 APIs then it should just use those instead of gaslighting me.
  > LLMs aren't omniscient or constantly updated with new knowledge.
  So we shouldn't use them if we want to make apps with the latest tech? Despite what the AI companies want us to believe.
  You know, on a more general note, I think all AIs should have a toggle between "Do as I say" (Monkey Paw) and "Do what I mean"
  
  
  simonw a day ago
  
  Was this Claude Code or Claude.ai or some other tool that used Claude under the hood?
  Different harnesses have different search capabilities.
  If I'm doing something that benefits from search I tend to switch to ChatGPT because I know it has a really good search feature available to it. I don't trust Claude's as much.
  