Comment by storystarling 5 days ago

How did you handle the context window for 20k lines? I assume you aren't feeding the whole codebase in every time given the API costs. I've struggled to keep agents coherent on larger projects without blowing the budget, so I'm curious if you used a specific scoping strategy here.

simonw 5 days ago

GPT-5.2 has a 400,000 token context window. Claude Opus 4.5 is just 200,000 tokens. To my surprise this doesn't seem to limit their ability to work with much larger codebases - the coding agent harnesses have got really good at grepping for just the code that they need to have in-context, similar to how a human engineer can make changes to a million lines of code without having to hold it all in their head at once.
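A minimal sketch of that grep-first approach (an illustration, not the actual harness code): rather than loading the whole repo, the agent searches for the symbol it cares about and pulls only the files that mention it into context.

```python
import re
from pathlib import Path

def files_mentioning(symbol: str, root: str = ".") -> list[Path]:
    """Return only the source files that reference `symbol`, so the agent
    puts a handful of files in context instead of the whole tree."""
    pattern = re.compile(re.escape(symbol))
    hits = []
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than fail the search
        if pattern.search(text):
            hits.append(path)
    return hits
```

Real harnesses shell out to `grep`/`rg` for speed, but the principle is the same: retrieval is a text search, not an embedding index.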

  • storystarling 5 days ago

    That explains the coherence, but I'm curious about the mechanics of the retrieval. Is it AST-based to map dependencies or are you just using vector search? I assume you still have to filter pretty aggressively to keep the token costs viable for a commercial tool.

embedding-shape 5 days ago

I didn't; Codex (the TUI/CLI) did, all by itself. I have one REQUIREMENTS.md specific to the project and an AGENTS.md I reuse across most projects. I give Codex (gpt-5.2 with reasoning effort set to xhigh) a prompt plus a screenshot, tell it to get the result working somewhat similarly, wait until it completes, review that it worked, then continue.

Most of the time when I develop professionally, I restart the session after each successful change. For this project, I initially tried to let one session run as long as possible, but eventually I reverted to my old behavior of restarting from zero after each successful change.

For knowing which file it should read or write, it most commonly uses `ls`, `tree` and `ag`. There is no out-of-band indexing or anything; just a unix shell controlled by an LLM via tool calls.
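That shell-as-the-only-index setup can be sketched as a tiny tool-call loop (a hypothetical illustration, not Codex's actual implementation): the harness just runs whatever command the model proposes and feeds the output back until the model is done.

```python
import subprocess

def run_shell(command: str, timeout: int = 30) -> str:
    """Execute one shell command on the agent's behalf and return its output.
    This is the entire 'indexing' layer: the model decides what to run."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr

def agent_loop(propose_command) -> list[str]:
    """Feed each command's output back to the model until it stops.
    `propose_command` stands in for the LLM: it takes the last output
    and returns the next shell command, or None when finished."""
    transcript, last_output = [], ""
    while (cmd := propose_command(last_output)) is not None:
        last_output = run_shell(cmd)
        transcript.append(last_output)
    return transcript
```

The model might first propose `tree -L 2`, read the listing, then propose `ag handle_login src/`, and so on; no state lives outside the conversation.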

nurettin 5 days ago

You don't load the entire project into the context. You let the agent work on a few 600-800 line files one feature at a time.

  • storystarling 5 days ago

    Right, but how does it know which files to pick? I'm curious if you're using a dependency graph or embeddings for that discovery step, since getting the agent to self-select the right scope is usually the main bottleneck.

    • nurettin 5 days ago

      If you don't trigger the discovery agents, claude cli uses a search tool and greps 50-100 lines at a go. If discovery is triggered, claude sends multiple agents to the code with different tasks which return with overall architecture notes.
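That windowed-grep behavior can be sketched roughly like this (an illustration, not Claude's actual search tool): find the first matching line and return a fixed-size slice of lines around it, so only a 50-100 line window enters the context.

```python
from pathlib import Path

def grep_window(path, pattern: str, radius: int = 50) -> list[str]:
    """Return a slice of ~2*radius lines centred on the first line that
    contains `pattern`, instead of reading the whole file into context."""
    lines = Path(path).read_text().splitlines()
    for i, line in enumerate(lines):
        if pattern in line:
            lo, hi = max(0, i - radius), min(len(lines), i + radius)
            return lines[lo:hi]
    return []  # no match: nothing enters the context window
```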