Comment by xianshou 2 days ago

One trend I've noticed, framed as a logical deduction:

1. Coding assistants based on o1 and Sonnet are pretty great at coding with <50k context, but degrade rapidly beyond that.

2. Coding agents do massively better when they have a test-driven reward signal.

3. If a problem can be framed in a way that a coding agent can solve, that speeds up development at least 10x from the base case of human + assistant.

4. From (1)-(3), if you can get all the necessary context into 50k tokens and measure progress via tests, you can speed up development by 10x.

5. Therefore all new development should be microservices written from scratch and interacting via cleanly defined APIs.

Sure enough, I see HN projects evolving in that direction.

swatcoder 2 days ago

> 3. If a problem can be framed in a way that a coding agent can solve...

This reminds me of the South Park underwear gnomes. You picked a tool and set an expectation, then just kind of hand wave over the hard part in the middle, as though framing problems "in a way coding agents can solve" is itself a well-understood or bounded problem.

Does it sometimes take 50x effort to understand a problem and the agent well enough to get that done? Are there classes of problems where it can't be done? Is either of those concerns something you can recognize before it impacts you? At commercial quality, is it an accessible skill for inexperienced people, or do you need mastery of coding, the problem domain, or the coding agent to be able to rely on it? Can teams recruit people who can reliably achieve any of this? How expensive is that talent? Etc.

  • hitchstory a day ago

    >as though framing problems "in a way coding agents can solve" is itself a well-understood or bounded problem.

    It's not, but if you can A) make it cheap to try out different types of framings - not all of them have to work and B) automate everything else then the labor intensity of programming decreases drastically.

    >At commercial quality, is it an accessible skill for inexperienced people

    I'd expect the opposite: it would be an extremely inaccessible skill requiring high skill and high pay. But if 2 people can deliver as much as 15 people at higher quality and they're paid triple, it's still way cheaper overall.

    I would still expect somebody following this development pattern to routinely discover a problem the LLM can't deal with and have to dive under the hood to fix it - digging down below multiple levels of abstraction. This would be Hard with a capital H.

  • emptiestplace a day ago

    We've had failed projects since long before LLMs. I think there is a tendency for people to gloss over this (3.) regardless, but working with an LLM it tends to become obvious much more quickly, without investing tens/hundreds of person-hours. I know it's not perfect, but I find a lot of the things people complain about would've been a problem either way - especially when people think they are going to go from 'hello world' to SaaS-billionaire in an hour.

    I think mastery of the problem domain is still important, and until we have effectively infinite context windows (that work perfectly), you will need to understand how and when to refactor to maximize quality and relevance of data in context.

    • dingnuts a day ago

      Well, according to xianshou's profile they work in finance, so it makes sense to me that they would gloss over the hard part of programming when describing how AI is going to improve it.

      • ziddoap a day ago

        Working in one domain does not preclude knowledge of others. I work in cybersec but spent my first working decade in construction estimation for institutional builds. I can talk confidently about firewalls or the hospital you want to build.

        No need to make assumptions based on a one-line hacker news profile.

  • myko 5 hours ago

    > as though framing problems "in a way coding agents can solve" is itself a well-understood or bounded problem

    It is eminently solvable! All that is necessary is to use a subset of language that is easier for the machine to understand, used in a very well-defined way; we could call this a "coding language" or something similar. We could even build tools to ensure we write it correctly (to avoid confusing the machine). Perhaps we could define our own algorithms using this "language" to help them along!

Arcuru 2 days ago

> 5. Therefore all new development should be microservices written from scratch and interacting via cleanly defined APIs.

Not necessarily. You can get the same benefits you described in (1)-(3) by using clearly defined modules in your codebase, they don't need to be separate microservices.

  • lolinder 18 hours ago

    I wonder if we'll see a return of the kind of interface file present in C++, Ocaml, and Ada. These files, well commented, are naturally the context window to use for reference for a module.

    Even if languages don't grow them back as a first class feature, some format that is auto generated from the code and doesn't include the function bodies is really what is needed here.
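    As a sketch of that auto-generation idea: in Python you can strip function bodies with the standard `ast` module, keeping just signatures and docstrings (the `add` example below is made up; `ast.unparse` needs Python 3.9+):

```python
import ast

def interface_of(src: str) -> str:
    """Summarize a module: signatures and docstrings, no function bodies."""
    lines = []
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:  # {ast.get_docstring(node) or ''}")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            sig = f"def {node.name}({ast.unparse(node.args)})"
            if node.returns:
                sig += f" -> {ast.unparse(node.returns)}"
            lines.append(f"{sig}:  # {ast.get_docstring(node) or ''}")
    return "\n".join(lines)

source = '''
def add(a: int, b: int) -> int:
    """Return the sum of a and b."""
    return a + b
'''
print(interface_of(source))
# def add(a: int, b: int) -> int:  # Return the sum of a and b.
```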

    • senkora 17 hours ago

      Python (which I mention because it is the preferred language of LLM output) has grown stub files that would work for this:

      https://peps.python.org/pep-0484/#stub-files

      I guess that this usecase would be an argument to include docstrings in your Python stub files, which I hadn’t considered before.
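      For example, a stub with docstrings kept in (the `math_utils` name and contents are hypothetical) is roughly the interface-only context an LLM would want:

```python
# math_utils.pyi -- hypothetical stub: signatures and docstrings, no bodies.

def clamp(value: float, lo: float, hi: float) -> float:
    """Clamp value into the closed interval [lo, hi]."""
    ...

class Vector:
    """An immutable 2D vector."""
    def __init__(self, x: float, y: float) -> None: ...
    def dot(self, other: "Vector") -> float:
        """Return the dot product with another vector."""
        ...
```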

  • sdesol 2 days ago

    Agreed. If the microservice does not provide any value from being isolated, it is just a function call with extra steps.

    • __MatrixMan__ a day ago

      I think the argument is that the extra value provided is a small enough context window for working with an LLM. Although I'd suggest making it a library if one can manage, that gives you the desired context reduction bounded by interfaces without taking on the complexities of adding an additional microservice.

      I imagine throwing a test at an LLM and saying:

      > hold the component under test constant (as well as the test itself), and walk the versions of the library until you can tell me where they're compatible and where they break.

      If you tried to do that with a git bisect and everything in the same codebase, you'd end up varying all three (test, component, library) which is worse science than holding two constant and varying the third would be.

      • sdesol a day ago

        > I think the argument is that the extra value provided is a small enough context window for working with an LLM.

        I'm not sure moving something that could work as a function to a microservice would save much context. If anything, I think you are adding more context, since you would need to describe the endpoint and how it routes to the function that does what you need. When it is all over, you still need to describe what the input and output are.

        • __MatrixMan__ 17 hours ago

          Oh certainly. I was arguing that if you need more isolation than a function gives you, don't jump to the conclusion that you need a service. Consider a library as a middle ground.

  • ben_w 7 hours ago

    Indeed; I think there's a strong possibility that there's certain architectural choices where LLMs can do very well, and others where they would struggle.

    There are with humans, but it's inconsistent; personally I really dislike VIPER, yet I've never felt the pain others insist comes with too much in a ViewController.

  • theptip 16 hours ago

    Yeah, I think monorepos will be better for LLMs. Easier to refactor module boundaries as context grows or requirements change.

    But practices like stronger module boundaries, module docs, acceptance tests on internal dev-facing module APIs, etc are all things that will be much more valuable for LLM consumption. (And might make things more pleasant for humans too!)
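    An acceptance test on an internal module API is a small, self-describing contract an LLM can be pointed at directly. A hypothetical example (`slugify` and its rules are made up for illustration):

```python
# A dev-facing module API plus an acceptance test that pins its contract.

def slugify(title: str) -> str:
    """Internal API under contract: lowercase, hyphen-separated words."""
    return "-".join(title.lower().split())

def test_slugify_contract():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Spaces   Everywhere ") == "spaces-everywhere"

test_slugify_contract()
print("contract holds")
```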

steeeeeve a day ago

So having clear requirements, a focused purpose for software, and a clear boundary of software responsibility makes for a software development task that can be accomplished?

If only people had figured out at some point that the same thing applies when communicating to human software engineers.

  • PoppinFreshDo a day ago

    If human software engineers refused to work unless those conditions were met, what a wonderful world it would be.

sdesol 2 days ago

> you can speed up development by 10x.

If you know what you are doing, then yes. If you are a domain expert and can articulate your thoughts clearly in a prompt, you will most likely see a boost—perhaps two to three times—but ten times is unlikely. And if you don't fully understand the problem, you may experience a negative effect.

  • throwup238 a day ago

    I think it also depends on how much yak-shaving is involved in the domain, regardless of expertise. Whether that’s something simple like remembering the right bash incantation or something more complex like learning enough Terraform and providers to be able to spin up cloud infrastructure.

    Some projects just have a lot of stuff to do around the edges and LLMs excel at that.

andrewchambers a day ago

You don't need microservices for that, just factor your code into libraries that can fit into the context window. Also write functions that have clear inputs and outputs and don't need to know the full state of the software.

This has always been good practice anyway.
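A minimal illustration of that practice (function names made up): the first version depends on hidden module state, the second carries everything in its signature, which is also what fits cleanly into a small context window:

```python
# Hard for an agent (or a human) to reason about: hidden dependency on globals.
cart = {"items": [10.0, 20.0], "discount": 0.5}

def total_implicit():
    return sum(cart["items"]) * (1 - cart["discount"])

# Easy: everything needed is in the signature; output depends only on inputs.
def total_explicit(items: list[float], discount: float) -> float:
    """Total price after discount; no hidden state."""
    return sum(items) * (1 - discount)

print(total_explicit([10.0, 20.0], 0.5))  # 15.0
```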

huac 20 hours ago

> Coding assistants based on o1 and Sonnet are pretty great at coding with <50k context, but degrade rapidly beyond that.

I had a very similar impression (wrote more in https://hua.substack.com/p/are-longer-context-windows-all-yo...).

One framing is that the effective context window (i.e. the length the model can actually reason over) determines how useful the model is. A human new-grad programmer might effectively reason over hundreds or thousands of tokens but not millions, which is why we carefully scope their work and explain where to look for relevant context. But a principal engineer might reason over many millions of tokens of context: code, yes, but also organizational and business context.

Trying to carefully select those 50k tokens is extremely difficult for LLMs/RAG today. I expect models to get much longer effective context windows but there are hardware / cost constraints which make this more difficult.

phaedrus a day ago

50K context is an interesting number, because I think there's a lot to explore with software within an order of magnitude of that size. With apologies to Richard Feynman, I call it "There's plenty of room in the middle." My idea is that the rapid expansion of computing power during the reign of Moore's law left the design space of "medium-sized" programs under-explored. These would be programs in the range of hundreds of kilobytes to low megabytes.

makk 15 hours ago

> microservices written from scratch and interacting via cleanly defined APIs.

Introducing network calls because why? How about just factoring a monolith appropriately?

jiriknesl a day ago

It doesn't have to be microservices. You can use modular architecture. You can use polylith. You can have boundaries in your code and mock around them.

whoisnnamdi 2 days ago

This is a helpful breakdown of a trend, thank you

Might be a boon for test-driven development. Could turn out that AI coding is the killer app for TDD. I had a similar thought about a year ago but had forgotten, appreciate the reminder

Swizec 2 days ago

> 5. Therefore all new development should be ~~microservices~~ modules written from scratch and interacting via cleanly defined APIs.

We figured this out for humans almost 20 years ago. Some really good empirical research. It's the only approach to large scale software development that works.

But it requires leadership that gives a shit about the quality of their product and value long-term outcomes over short-term rewards.

  • p1necone a day ago

    By large scale do you mean large software or large numbers of developers? Because there's some absolutely massive software out there, in terms of feature set, usefulness, and even LoC (not that that's a useful measurement), made by very small teams.

    I'm not sure that you've got the causal relationship the right way around here re: architecture vs. team size.

    • Swizec a day ago

      What does team size have to do with this? Small teams can (and should) absolutely build modularized software ...

      You simply cannot build a [working/maintainable] large piece of software if everything is connected to everything and any one change may cause issues in conceptually unrelated pieces of code. As soon as your codebase is bigger than what you can fully memorize, you need modules, separation of concerns, etc.

      • p1necone a day ago

        Sure I agree with that, but microservices are just one of many ways to modularize software/achieve separation of concerns.

        I assumed you were talking about team size specifically because that is the thing that a microservice architecture uniquely enables in my experience.