misiti3780 2 days ago

I don't know where you're working, but where I work I can't prompt 90% of my job away using Cursor. In fact, I find these tools less and less useful as our codebase grows and becomes more complex.

Based on the current state of AI and the progress I'm witnessing month by month, my current prediction is that there is zero chance AI agents will be coding and replacing me in the next few years. If I could short the startups claiming this, I would.

simonw 2 days ago

Don't get distracted by claims that AI agents "replace programmers". Those are pure hype.

I'm willing to bet that in a few years most of the developers you know will be using LLMs on a daily basis, and will be more productive because of it (having learned how to use them).

earthnail 2 days ago

I have the same experience. It's basically a better StackOverflow, but just like with SO you have to be very careful about the replies; and, also like SO, its utility diminishes as you get more proficient.

As an example, just today I was trying to debug some weird WebSocket behaviour. None of the AI tools could help: not Cursor, not plain old ChatGPT with lots of prompting and careful phrasing of the problem. In fact, every LLM I tried (Claude 3.7, o4-mini-high, GPT-4.5) introduced errors into my debugging code.

I’m not saying it will stay this way, just that it’s been my experience.

I still love these tools, though. I just don't trust the output, but as inspiration they are phenomenal. Most of the time I use vanilla ChatGPT; I've never had that much luck with Cursor.

  • codr7 2 days ago

    No one was forcing you to use SO; in fact, we made fun of people who did copy-paste/compile coding.

  • UncleEntity 2 days ago

    Yeah, they're currently horrible at debugging -- there seem to be blind spots they just can't get past, so they end up running in circles.

    A couple of days ago I was looking for something to do, so I gave Claude a paper ("A Parsing Machine for PEGs") to ask it some questions, and instead of answering me it spit out an almost complete implementation. Intrigued, I threw a couple more papers at it ("A Simple Graph-Based Intermediate Representation" && "A Text Pattern-Matching Tool based on Parsing Expression Grammars"), where it fleshed out the implementation and, well... color me impressed.

    Now the struggle begins, as the thing has to be debugged. With the help of both Claude and Deepseek we got it compiling and passing 2 out of 3 tests, which is where they both got stuck. Round and round we go until I, the human who's supposed to be doing no work, figured out that Claude had hard-coded some values (instead of coding a general solution for all input), which they both missed. In applying ever more complicated solutions (to a well-solved problem in compiler design), Claude finally broke all the debugging output, and I don't understand the algorithms well enough to go in and debug it myself.

    Of course, I didn't use any sort of source code management, so I couldn't just revert to a previous version before it was broken beyond all fixing...

    Honestly, I don't even consider this a failure. I learned a lot more about what they're capable of, and I now know that you have to give them problems in smaller sections, where they don't have to figure out the complexities of how a few different algorithms interact with each other. With this new knowledge in hand, I started on what I originally intended to do before I got distracted by Claude's code solution to a simple question.

    --edit--

    Oh, the irony...

    After typing this out and making an espresso I figured out the problem Claude and Deepseek couldn't see. So much for the "superior" intelligence.

tptacek 2 days ago

One of the ways these tools are most useful for me is in extremely complex codebases.

  • simonw 2 days ago

    This has become especially true for me in the past four months. The new long context reasoning models are shockingly good at digging through larger volumes of gnarly code. o3, o4-mini and Claude 3.7 Sonnet "thinking" all have 200,000 token context limits, and Gemini 2.5 Pro and Flash can do 1,000,000. As "reasoning" models they are much better suited to following the chain of a program to figure out the source of an obscure bug.

    Makes me wonder how many of the people who continue to argue that LLMs can't help with large existing codebases are missing that you need to selectively copy the right chunks of that code into the model to get good results.

    • IshKebab 2 days ago

      But 1 million tokens is like 50k lines of code or something. That's only medium-sized. How does that help with large, complex codebases?

      What tools are you guys using? Are there none that can interactively probe the project in a way that a human would, e.g. use code intelligence to go-to-definition, find all references and so on?

      • tptacek 2 days ago

        This to me is like every complaint I read when people generate code and the LLM spits out an error, or something stupid. It's a tool. You still have to understand software construction, and how to hold the tool.

        Our Rust fly-proxy tree is about 80k (cloc) lines of code; our Go flyd tree (a Go monorepo) is 300k. Generally, I'll prompt an LLM to deal with them in stages: a first pass, with some hints, on a general question like "find the code that does XYZ"; I'll review and read the code itself, then feed that back to the LLM with questions like "summarize all the functionality of this package and how it relates to other packages" or "trace the flow of an HTTP request through all the layers of this proxy".

        Generally, I'll take the results of those queries and have them saved in .txt files that I can reference in future prompts.
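
        A rough sketch of that staged workflow, using the llm CLI that comes up elsewhere in this thread (the paths, file names and questions here are all invented):

          # first pass: a general question over a slice of the tree
          cat proxy/src/http/*.rs | \
            llm -s 'find the code that handles request routing' > notes/routing.txt

          # later passes: feed the saved notes back in with narrower questions
          cat notes/routing.txt proxy/src/http/router.rs | \
            llm -s 'trace the flow of an HTTP request through these layers'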

        I think sometimes developers are demanding something close to AGI from their tooling, something that would do exactly what they would do (only, in the span of about 15 seconds). I don't believe in AGI, and so I don't expect it from my tools; I just want them to do a better job of fielding arbitrary questions (or generating arbitrary code) than grep or eglot could.

      • simonw 2 days ago

        Yeah, 50,000 lines sounds about right for 1m tokens.

        If your codebase is larger than that there are a few tricks.

        The first is to be selective about what you feed into the LLM: if you know the work you are doing is in a particular area of the codebase, just paste that bit in. The LLM can make reasonable guesses about things the code references that it can't see.
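
        For example, something like this (hypothetical path and question):

          # paste in just one area of the codebase; the model can make
          # reasonable guesses about referenced code it can't see
          cat src/billing/*.py | llm -s 'explain how invoice totals are calculated'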

        An increasingly effective trick is to arm a tool-using LLM with a tool like ripgrep (effectively the "interactively probe the project in a way that a human would" idea you suggested). Claude Code and OpenAI Codex both use this trick. The smarter models are really good at deciding what to search for and evaluating the results.
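
        The agentic tools run those searches in a loop themselves; a hand-rolled approximation of the same idea looks something like this (search term and question invented):

          # search with ripgrep, then hand the matches to a model to interpret
          rg -n -C 3 'reconnect' src/ | \
            llm -s 'which of these call sites handles WebSocket reconnection?'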

        I've built tools that can run against Python code and extract just the class, function and method signatures and their docstrings - omitting the actual code. If your code is well designed and has reasonable documentation, that could be enough for the LLM to understand it.

        https://github.com/simonw/symbex is my CLI tool for that

        https://simonwillison.net/2025/Apr/23/llm-fragment-symbex/ covers a plugin I released this morning that turns Symbex into a plugin for my LLM tool.
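
        A sketch of how that combines with the CLI (patterns and question invented; check the Symbex README for the exact options):

          # dump just the signatures (no bodies) for every function, class
          # and method, then use them as compact context for a question
          symbex '*' '*.*' -s | \
            llm -s 'which part of this codebase handles authentication?'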

        I use my https://llm.datasette.io/ tool a lot, especially with its new fragments feature: https://simonwillison.net/2025/Apr/7/long-context-llm/

        This means I can feed in the exact code that the model needs in order to solve a problem. Here's a recent example:

          llm -m openai/o3 \
            -f https://raw.githubusercontent.com/simonw/llm-hacker-news/refs/heads/main/llm_hacker_news.py \
            -f https://raw.githubusercontent.com/simonw/tools/refs/heads/main/github-issue-to-markdown.html \
            -s 'Write a new fragments plugin in Python that registers issue:org/repo/123 which fetches that issue
                number from the specified github repo and uses the same markdown logic as the HTML page to turn that into a fragment'
        
        From https://simonwillison.net/2025/Apr/20/llm-fragments-github/ - I'm populating the context with the exact examples needed to solve the problem.