p1necone 14 hours ago

Every so often I try out a GPT model for coding again, and manage to get tricked by the very sparse conversation style into thinking it's great for a couple of days (when it says nothing, then finishes producing code with an "I did x, y and z" with no stupid "you're absolutely right" sucking up, and the code works, it feels very good).

But I always realize it's just smoke and mirrors - the actual quality of the code and the failure modes are just so much worse than Claude and Gemini.

kshacker 14 hours ago

I am a novice programmer -- I have programmed for 35+ years now, but I build and lose the skills moving between coder, manager, and sales roles -- multiple times. Fresh IC since last week again :) I started coding with Fortran, RPG and COBOL, and I have also coded Java and Scala. I know modern architecture but haven't done enough grunt work to make it work or to debug (and fix) a complex problem. Needless to say, sometimes my eyes glaze over the code.

I write some code for my personal enjoyment, and I gave it to Claude 6-8 months back for improvement. It gave me a massive change log that was quite risky, so I abandoned it.

I tried this again with Gemini last week. I was more prepared and asked it to improve the code class by class, and for whatever reason I got better answers -- changed code, with explanations -- and when I asked it to split the refactor into smaller steps, it did so. It was a joy working on this over the Thanksgiving holidays. It could break the changes into small pieces, talk through them as I evolved concepts learned previously, take my feedback and prioritization, and also give me a nuanced explanation of the business objectives I was trying to achieve.

This is not to downplay Claude; that is just the sequence of events as they happened. So while it may or may not work well for experienced programmers, it is such a helpful tool for people who know the domain or the concepts (or both) but struggle with details, since the tool can iron out a lot of those details for you.

My goal now is to have another project for the winter holidays and then think through 4-6 hour AI-assisted refactors over the weekends. Do note that this is a project of personal interest, so I'm not spending weekends for the big man.

  • Aurornis 8 hours ago

    > I was more prepared and asked it to improve the code class by class, and for whatever reason I got better answers

    There is a learning curve with all of the LLM tools. It's basically required for everyone to go through the trough of disillusionment when you realize that the vibecoding magic isn't quite real in the way the influencers talk about it.

    You still have to be involved in the process, steer it in the right direction, and review the output. Rejecting a lot of output and re-prompting is normal. From reading comments, I think it's common for new users to expect perfection and reject the tools when they aren't vibecoding the app for them autonomously. To be fair, that's what the hype influencers promised, but it's not real.

    If you use it as an extension of yourself that can type and search faster, while also acknowledging that mistakes are common and you need to be on top of it, there is some interesting value for some tasks.

    • vidarh 3 hours ago

      It really depends on what you're building. As an experiment, I started having Claude Code build a real-time strategy game a bit over a week ago, and it's done an amazing job, with me writing no code whatsoever. It's an area with lots of tutorials for code structure etc., and I'm guessing that helps. So while I've had to read the code and tell it to refactor things, it has managed to do a good job of that with just relatively high-level prodding, and produced a well-architected engine with trait-based agents for the NPCs and a lot of well-functioning game mechanics. It started as an experiment, but now I'm seriously toying with building an actual (but small) game with it just to see how far it can get.

      In other areas, it is as you say and you need to be on top of it constantly.

      You're absolutely right re: the learning curve, and you're much more likely to hit an area where you need to be on top of it than one it can handle autonomously, at least without a lot of scaffolding in the form of sub-agents, rules to follow, agent loops with reviews, etc. That scaffolding takes a lot of time to build up and often includes a lot of things specific to what you want to achieve. Sorting out how much of that effort is worth it for a given project will take time to establish.

      • FuckButtons 2 hours ago

        I suspect the meta-architecture can also be done autonomously, though no one has got there yet: figuring out the right fractal dimension for sub-agents and the right prompt context can itself be thought of as a learning problem.

    • wiz21c 4 hours ago

      For me the learning curve was learning to choose what is worth asking Claude. After 3 months on it, I can reap the benefits: Claude gets the code I want right 80% of the time. I usually ask it to create new functions from scratch (it truly shines at understanding the context of these functions by reusing other parts of the code I wrote), to refactor code, and to create little tools (for example, a chart viewer).

    • boie0025 7 hours ago

      I appreciate this narrative; it's relatable to how I have experienced, and watched others around me experience, the last few years. It's as if we're all kinda-sorta following a similar "Dunning–Kruger effect" curve at the same time. It feels similar to growing up mucking around with a PPP connection and Netscape in some regards. I'll stretch it: "multimodal", meet your distant analog, "hypermedia".

  • altmanaltman 5 hours ago

    Interesting. From my experience, Claude is somehow much better at stuff involving frontend design compared to other models (GPT is pretty bad). Gemini is also good, but its "thinking" mode often adds stuff to my code that I did not ask it to add, or modifies stuff to make it "better". It likes to one-up the objective a lot, which is not great when you're just looking for it to do precisely what you asked and nothing else.

  • ikidd 10 hours ago

    My problem with Gemini is how token-hungry it is. It does a good job, but it ends up being more expensive than any other model because it's so yappy. It sits there and argues with itself and outputs the whole movie.

  • mleo 8 hours ago

    Breaking down requirements, functionality, and changes into smaller chunks is going to give you better results with most of the tools. If it can complete smaller tasks within the context window, the quality will likely hold up. My go-to has been to develop task documents with multiple pieces of functionality and sub-tasks. Build one piece of functionality at a time. Commit, clear context, and start the next piece. If something goes off the rails, back up to the commit, fix and rebase the later changes, or abandon and branch.

    That’s if I want quality. If I just want to prototype and don’t care, I’ll let it go, see what I like and don’t like, and start over as detailed above.

  • bovermyer 13 hours ago

    I have never considered trying to apply Claude/Gemini/etc. to Fortran or COBOL. That would be interesting.

    • Aurornis 7 hours ago

      You can actually use Claude Code (and presumably the other tools) on non-code projects, too. If you launch Claude Code in a directory of files you want to work on, like CSVs or other data, you can ask it to do planning and analysis tasks, editing, and other things. It's fun to experiment with, though for obvious reasons I prefer to operate on a copy of the data rather than let Claude Code go wild.

      • vidarh 2 hours ago

        I use Claude Code for "everything", and just commit most things into git as a fallback.

        It's great to then just have it write scripts, and then write skills that use those scripts.

        A lot of my report writing etc. now involves setting up a git repo and using Claude to do things like process the transcripts from discovery calls and turn them into initial outlines, questions that need follow-up, and task lists, and to write scripts to do the necessary analysis, so I can focus on the higher-level stuff.

      • smj-edison 4 hours ago

        Side note from someone who just used Claude Code today for the first time: Claude Code is a TUI, so you can run it in any folder/with any IDE and it plays along nicely. I thought it was just another vscode clone, so I was pleasantly surprised that it didn't try to take over my entire workflow.

tartoran 14 hours ago

I'm starting with Claude at work but have had an okay experience with OpenAI so far. For clearly delimited tasks it does produce working code more often than not. I've seen some improvement on their side compared to, say, last year. For something more complex and not clearly defined in advance, yes, it does produce plausible garbage and goes off the rails a lot. I was migrating a project and asked ChatGPT to analyze the original code base and produce a migration plan. The result seemed good and encouraging, because I didn't know much about that project at the time. But I ended up taking a different route, and when I finished the migration (with bits of help from ChatGPT) I looked at the original migration plan out of curiosity, since I had become more familiar with the project by then. The migration plan was an absolutely useless and senseless hallucination.

stevedonovan 5 hours ago

I've been getting great results from Codex. Can be a bit slow, but gets there. Writes good Rust, powers through integration test generation.

So (again) we are just sharing anecdata

herpdyderp 14 hours ago

On the contrary, I cannot use the top Gemini and Claude models because their outputs are so out of place and hard to integrate with my code bases. The GPT-5 models integrate with my code base's existing patterns seamlessly.

  • ta12653421 4 hours ago

    Supply some relevant files from your codebase in the ClaudeAI project area in the right part of the browser. Usually it will then understand your architecture, patterns, and principles.

  • inquirerGeneral 13 hours ago

    You realize on some level, though, that all of these sorts of anecdotes are simply random coincidence.

findjashua 14 hours ago

NME at all - 5.1 codex has been the best by far.

  • pshirshov 10 hours ago

    By my tests (https://github.com/7mind/jopa), Gemini 3 is somewhat better than Claude with Opus 4.5. Both obliterate Codex with 5.1.

    • Incipient 9 hours ago

      What's your monthly spend, roughly, when using pay-per-token models? I only use fixed-price Copilot, and my napkin math says I'd be spending something crazy like $200/mo if I went pay-per-token on the more expensive models.

      • vidarh 2 hours ago

        They have subscriptions too (at least Claude and ChatGPT/Codex; I don't use Gemini much). It's far cheaper to use the subscriptions first and then switch to paying per token beyond that.

    • viking123 4 hours ago

      Codex is super cheap though; even with the cheapest GPT subscription you get lots of tokens. I use Opus 4.5 at work and Codex at home, and tbh the differences are not that big if you know what you are doing.

  • manmal 14 hours ago

    How can you stand the excruciating slowness? Claude Code runs circles around Codex. The most mundane tasks make it think for a minute before doing anything.

    • aschobel 13 hours ago

      I use it on medium reasoning and it's decently quick. I only switch to gpt-5.1-codex-max xhigh for the most annoying problems.

    • wahnfrieden 13 hours ago

      By learning to parallelize my work. This also solved my problem with slow Xcode builds.

      • manmal 13 hours ago

        Well, you can't edit files while Xcode is building or the compiler will throw up, so I'm wondering what you mean here. You can't even run swift test in 2 agents at the same time, because Swift serializes access for some reason.

        Whenever I have more than one agent going - one running Swift tests in a loop to fix things and another building something - the latter will disturb the former and I need to cancel.

        And then there's a lot of work that can't be parallelized, like complex git rebases - well, you can do other things in a worktree, but good luck merging that after you've changed everything in the repo. Codex is really, really bad at git.

  • andybak 17 minutes ago

    NME = "not my experience" I presume.

    JFC TLA OD...

sharyphil 13 hours ago

You're absolutely right!

Somehow it doesn't get on my nerves (unlike Gemini with "Of course").

jpalomaki 14 hours ago

Can you give a concrete example of a programming task that GPT fails to solve?

Interested, because I’ve been getting pretty good results on different tasks using Codex.

  • kriro 41 minutes ago

    Library/API conflicts are usually the biggest pain point for me, especially breaking changes. RLlib (currently 2.41.0) and Gymnasium (currently 0.29.0+) have ended in circles many times for me because they tend to be out of sync (for multi-agent environments). My go-to test now is a simple hello-world-style card game like War: competitive multi-agent with RLlib and Gymnasium (PettingZoo tends to cause even more issues).
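    For reference, here is a minimal sketch of that kind of env against RLlib's MultiAgentEnv, written from memory of the 2.x API; the exact reset/step signatures and spaces attributes are precisely the things that drift between releases, so treat it as illustrative rather than version-correct:

        import random
        import gymnasium as gym
        from ray.rllib.env.multi_agent_env import MultiAgentEnv

        class WarEnv(MultiAgentEnv):
            """Hello-world War: each player plays a card 0-12; higher card scores."""

            def __init__(self, config=None):
                super().__init__()
                self.agents = self.possible_agents = ["p0", "p1"]
                # Whether RLlib wants per-agent observation_spaces dicts or a
                # single observation_space is itself version-dependent.
                self.observation_spaces = {a: gym.spaces.Discrete(13) for a in self.agents}
                self.action_spaces = {a: gym.spaces.Discrete(13) for a in self.agents}
                self._round = 0

            def _deal(self):
                # Each agent observes the card it was dealt this round.
                return {a: random.randrange(13) for a in self.agents}

            def reset(self, *, seed=None, options=None):
                self._round = 0
                return self._deal(), {}

            def step(self, actions):
                self._round += 1
                a0, a1 = actions["p0"], actions["p1"]
                rewards = {"p0": float(a0 > a1), "p1": float(a1 > a0)}
                done = self._round >= 10
                # Gymnasium-style 5-tuple with RLlib's "__all__" convention.
                return self._deal(), rewards, {"__all__": done}, {"__all__": False}, {}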

    Claude Sonnet 4.5 was able to figure out a way to resolve it eventually (around 7 fixes), and I let it create an rllib.md with all the fixes and pitfalls; I'm curious whether feeding this file to the next experiment will lead to a one-shot. GPT-5 struggled more, but I haven't tried Codex on this yet, so it's not exactly fair.

    All done with Copilot in agent mode, just prompting, no specs or anything.

  • gloosx 5 hours ago

    Try asking it to write some GLSL shaders. Just describe what you want to see and then try to run the shaders it outputs. It can get a UV map or a simple gradient right, but anything more complex will, most of the time, not compile or run properly; it sometimes mixes GLSL versions, and sometimes just straight makes things up that don't compile or don't output what you want.
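    A quick way to reproduce that failure mode is to compile the model's output offscreen. This is just a sketch - it assumes the moderngl package and a working GL driver, and the shader strings are placeholders for whatever the model emits:

        import moderngl

        # Headless GL context for compile-testing shaders.
        ctx = moderngl.create_standalone_context()

        VERT = """
        #version 330
        in vec2 in_pos;
        void main() { gl_Position = vec4(in_pos, 0.0, 1.0); }
        """

        # Paste the model-generated fragment shader here.
        FRAG = """
        #version 330
        out vec4 frag_color;
        void main() { frag_color = vec4(1.0, 0.0, 0.0, 1.0); }
        """

        try:
            ctx.program(vertex_shader=VERT, fragment_shader=FRAG)
            print("compiled and linked OK")
        except moderngl.Error as e:
            # Mixed GLSL versions and invented builtins typically surface here.
            print("shader rejected:", e)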

  • throwaway31131 7 hours ago

    I posted this example before but academic papers on algorithms often have pseudo code but no actual code.

    I thought it would be handy to use AI to produce code from papers. So a few months ago, as practice in LLM use, I tried to use Claude (not GPT, because I only have access to Claude) to write C++ code implementing the algorithms in this paper, and it didn’t go well.

    https://users.cs.duke.edu/~reif/paper/chen/graph/graph.pdf

    • threeducks an hour ago

      I just tried it with GPT-5.1-Codex. The compression ratio is not amazing, so not sure if it really worked, but at least it ran without errors.

      A few ideas how to make it work for you:

      1. You gave a link to a PDF, but you did not describe how you provided the content of the PDF to the model. It might only have read the text with something like pdftotext, which for this PDF results in a garbled mess. It is safer to convert the pages to PNG (e.g. with pdftoppm; see the sketch after this list) and let the model read the pages as images. A prompt like "Transcribe these pages as markdown." should be sufficient. If you cannot see what the model did, there is a chance it made things up.

      2. You used C++, but Python is much easier to write. You can tell the model to translate the code to C++ once it works in Python.

      3. Tell the model to write unit tests to verify that the individual components work as intended.

      4. Use Agent Mode and tell the model to print something and to judge whether the output is sensible, so it can debug the code.
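
      For point 1, a minimal sketch of the conversion step (it assumes poppler's pdftoppm is on the PATH; the output directory and 150 dpi are arbitrary choices):

          import subprocess
          from pathlib import Path

          def pdf_to_pngs(pdf_path: str, out_dir: str = "pages", dpi: int = 150) -> list[Path]:
              out = Path(out_dir)
              out.mkdir(exist_ok=True)
              # pdftoppm writes pages/page-1.png, pages/page-2.png, ...
              # (zero-padded for longer documents).
              subprocess.run(
                  ["pdftoppm", "-png", "-r", str(dpi), pdf_path, str(out / "page")],
                  check=True,
              )
              return sorted(out.glob("page-*.png"))

      The resulting PNGs can then be attached with the "Transcribe these pages as markdown." prompt before asking for the implementation.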

  • cmarschner 14 hours ago

    It completely failed for me at running the code it changed in a Docker container I keep running. Claude did it flawlessly. It absolutely rocks at code reviews, but it's terrible in comparison at generating code.

    • peab 12 hours ago

      It really depends on what kind of code. I've found it incredible for frontend dev, and for scripts. It falls apart in more complex projects and monorepos

CheeseFromLidl 4 hours ago

Same experience here. The more commonly known the stuff it regurgitates is, the fewer errors. But if you venture into RF electronics or embedded land, beware of it turning into a master of bs.

Which makes sense for something that isn’t AI but an LLM.

logicchains 14 hours ago

I find that for difficult math and design questions, GPT-5 tends to produce better answers than Claude and Gemini.

  • munk-a 13 hours ago

    Could you clarify what you mean by design questions? I do agree that GPT5 tends to have a better agentic dispatch style for math questions but I've found it has really struggled with data model design.

bsder 8 hours ago

At this point you are forced to use the "AI"s as code search tools--and it annoys me to no end.

The problem is that the "AI"s can cough up code examples based upon proprietary codebases that you, as an individual, have no access to. That creates a significant quality differential between coders who only use publicly available search (Google, GitHub, etc.) and those who use "AI" systems.