Comment by simonw

Comment by simonw 6 months ago

3 replies

This has become especially true for me in the past four months. The new long context reasoning models are shockingly good at digging through larger volumes of gnarly code. o3, o4-mini and Claude 3.7 Sonnet "thinking" all have 200,000 token context limits, and Gemini 2.5 Pro and Flash can do 1,000,000. As "reasoning" models they are much better suited to following the chain of a program to figure out the source of an obscure bug.

Makes me wonder how many of the people who continue to argue that LLMs can't help with large existing codebases are missing that you need to selectively copy the right chunks of that code into the model to get good results.

IshKebab 6 months ago

But 1 million tokens is like 50k lines of code or something. That's only medium sized. How does that help with large complex codebases?

What tools are you guys using? Are there none that can interactively probe the project in a way that a human would, e.g. use code intelligence to go-to-definition, find all references and so on?

  • tptacek 6 months ago

    This to me is like every complaint I read when people generate code and the LLM spits out an error, or something stupid. It's a tool. You still have to understand software construction, and how to hold the tool.

    Our Rust fly-proxy tree is about 80k (cloc) lines of code; our Go flyd tree (a Go monorepo) is 300k. Generally, I'll prompt an LLM to deal with them in stages; a first pass, with some hints, on a general question like "find the code that does XYZ"; I'll review and read the code itself, then feed that back to the LLM with questions like "summarize all the functionality of this package and how it relates to other packages" or "trace the flow of an HTTP request through all the layers of this proxy".

    Generally, I'll take the results of those queries and have them saved in .txt files that I can reference in future prompts.

    I think sometimes developers are demanding something close to AGI from their tooling, something that would do exactly what they would do (only, in the span of about 15 seconds). I don't believe in AGI, and so I don't expect it from my tools; I just want them to do a better job of fielding arbitrary questions (or generating arbitrary code) than grep or eglot could.

  • simonw 6 months ago

    Yeah, 50,000 lines sounds about right for 1m tokens.

    If your codebase is larger than that there are a few tricks.

    The first is to be selective about what you feed into the LLM: if you know the work you are doing is in a particular area of the codebase, just paste that bit in. The LLM can make reasonable guesses about things the code references that it can't see.

    An increasingly effective trick is to arm a tool-using LLM with a tool like ripgrep (effectively the "interactively probe the project in a way that a human would" idea you suggested). Claude Code and OpenAI Codex both use this trick. The smarter models are really good at deciding what to search for and evaluating the results.

    I've built tools that can run against Python code and extract just the class, function and method signatures and their docstrings - omitting the actual code. If you code is well designed and has reasonable documentation that could be enough for the LLM to understand it.

    https://github.com/simonw/symbex is my CLI tool for that

    https://simonwillison.net/2025/Apr/23/llm-fragment-symbex/ is a tool I released this morning that turns Symbex into a plugin for my LLM tool.

    I use my https://llm.datasette.io/ tool a lot, especially with its new fragments feature: https://simonwillison.net/2025/Apr/7/long-context-llm/

    This means I can feed in the exact code that the model needs in order to solve a problem. Here's a recent example:

      llm -m openai/o3 \
        -f https://raw.githubusercontent.com/simonw/llm-hacker-news/refs/heads/main/llm_hacker_news.py \
        -f https://raw.githubusercontent.com/simonw/tools/refs/heads/main/github-issue-to-markdown.html \
        -s 'Write a new fragments plugin in Python that registers issue:org/repo/123 which fetches that issue
            number from the specified github repo and uses the same markdown logic as the HTML page to turn that into a fragment'
    
    From https://simonwillison.net/2025/Apr/20/llm-fragments-github/ - I'm populating the context with the exact examples needed to solve the problem.