Comment by storystarling 5 days ago

How did you handle the context window for 20k lines? I assume you aren't feeding the whole codebase in every time given the API costs. I've struggled to keep agents coherent on larger projects without blowing the budget, so I'm curious if you used a specific scoping strategy here.

simonw 5 days ago

GPT-5.2 has a 400,000 token context window. Claude Opus 4.5 is just 200,000 tokens. To my surprise this doesn't seem to limit their ability to work with much larger codebases - the coding agent harnesses have got really good at grepping for just the code that they need to have in-context, similar to how a human engineer can make changes to a million lines of code without having to hold it all in their head at once.
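A minimal sketch of that grep-first approach (an illustration, not the actual harness code): rather than loading the whole repo, the agent searches for the symbol it cares about and pulls only the files that mention it into context.

```python
import re
from pathlib import Path

def files_mentioning(symbol: str, root: str = ".") -> list[Path]:
    """Return only the source files that reference `symbol`, so the agent
    puts a handful of files in context instead of the whole tree."""
    pattern = re.compile(re.escape(symbol))
    hits = []
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than fail the search
        if pattern.search(text):
            hits.append(path)
    return hits
```

Real harnesses shell out to `grep`/`rg` for speed, but the principle is the same: retrieval is a text search, not an embedding index.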

  • storystarling 5 days ago

    That explains the coherence, but I'm curious about the mechanics of the retrieval. Is it AST-based to map dependencies or are you just using vector search? I assume you still have to filter pretty aggressively to keep the token costs viable for a commercial tool.

embedding-shape 5 days ago

I didn't; Codex (the TUI/CLI) did, all by itself. I have one REQUIREMENTS.md specific to the project and an AGENTS.md I reuse across most projects. I give Codex (gpt-5.2 with reasoning effort set to xhigh) a prompt plus a screenshot, tell it to get the result working somewhat similarly, wait until it completes, review that it worked, then continue.

Most of the time when I develop professionally, I restart the session after each successful change. For this project, I initially tried to let one session run as long as possible, but eventually I reverted to my old behavior of restarting from zero after each successful change.

For knowing which file it should read or write, it most commonly uses `ls`, `tree` and `ag`. There is no out-of-band indexing or anything; just a unix shell controlled by an LLM via tool calls.
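That shell-as-the-only-index setup can be sketched as a tiny tool-call loop (a hypothetical illustration, not Codex's actual implementation): the harness just runs whatever command the model proposes and feeds the output back until the model is done.

```python
import subprocess

def run_shell(command: str, timeout: int = 30) -> str:
    """Execute one shell command on the agent's behalf and return its output.
    This is the entire 'indexing' layer: the model decides what to run."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr

def agent_loop(propose_command) -> list[str]:
    """Feed each command's output back to the model until it stops.
    `propose_command` stands in for the LLM: it takes the last output
    and returns the next shell command, or None when finished."""
    transcript, last_output = [], ""
    while (cmd := propose_command(last_output)) is not None:
        last_output = run_shell(cmd)
        transcript.append(last_output)
    return transcript
```

The model might first propose `tree -L 2`, read the listing, then propose `ag handle_login src/`, and so on; no state lives outside the conversation.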

nurettin 5 days ago

You don't load the entire project into the context. You let the agent work on a few 600-800 line files one feature at a time.

  • storystarling 5 days ago

    Right, but how does it know which files to pick? I'm curious if you're using a dependency graph or embeddings for that discovery step, since getting the agent to self-select the right scope is usually the main bottleneck.

    • nurettin 5 days ago

      If you don't trigger the discovery agents, claude cli uses a search tool and greps 50-100 lines at a go. If discovery is triggered, claude sends multiple agents to the code with different tasks which return with overall architecture notes.
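That windowed-grep behavior can be sketched roughly like this (an illustration, not Claude's actual search tool): find the first matching line and return a fixed-size slice of lines around it, so only a 50-100 line window enters the context.

```python
from pathlib import Path

def grep_window(path, pattern: str, radius: int = 50) -> list[str]:
    """Return a slice of ~2*radius lines centred on the first line that
    contains `pattern`, instead of reading the whole file into context."""
    lines = Path(path).read_text().splitlines()
    for i, line in enumerate(lines):
        if pattern in line:
            lo, hi = max(0, i - radius), min(len(lines), i + radius)
            return lines[lo:hi]
    return []  # no match: nothing enters the context window
```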