alexpham14 2 days ago

Compliance is usually the hard stop before we even get to capability. We can’t send code out, and local models are too heavy to run on the restricted VDI instances we’re usually stuck with. Even when I’ve tried it on isolated sandbox code, it struggles with the strict formatting. It tends to drift past column 72 or mess up period termination in nested IFs. You end up spending more time linting the output than it takes to just type it. It’s decent for generating test data, but it doesn't know the forty years of undocumented business logic quirks that actually make the job difficult.
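
The column 72 part is at least cheap to catch mechanically; this is roughly the kind of thing I end up running over the output first (an illustrative Python sketch, not our actual linter):

    import sys

    AREA_B_END = 72  # fixed-format COBOL: code must end by column 72

    def lint_fixed_format(path):
        """Flag lines whose code spills past column 72."""
        problems = []
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                width = len(line.rstrip("\n"))
                if width > AREA_B_END:
                    problems.append((lineno, width))
        return problems

    if __name__ == "__main__":
        for lineno, width in lint_fixed_format(sys.argv[1]):
            print(f"line {lineno}: {width} columns (limit {AREA_B_END})")

The period-termination mistakes are the expensive ones; catching those takes an actual parser, not a width check.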

apaprocki 2 days ago

To be fair, I would not expect a model to output perfectly formatted C++. I’d let it output whatever it wants and then run it through clang-format, the same as I would for a human. Even the best humans who have the formatting rules in their head will miss a few things here and there.
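
Concretely, something like this in whatever glue layer receives the model’s output (a sketch; --style=file assumes the repo already carries a .clang-format):

    import subprocess

    def format_cpp(source: str) -> str:
        """Normalize C++ through clang-format instead of trusting the producer."""
        result = subprocess.run(
            ["clang-format", "--style=file"],  # picks up the repo's .clang-format
            input=source,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout

    # e.g. on raw model output:
    print(format_cpp("int   main( ){return 0 ;}"))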

If there are 40 years of undocumented business quirks, document them and then re-evaluate. A human new to the codebase would fail under the same conditions.

  • shakna 2 days ago

    Formatting isn't just visual in pre-79 COBOL or Fortran. It's syntax. It's a compile failure, or worse, it cuts the line and can sometimes successfully compile into something else.

    That's not just an undocumented quirk, but a fundamental part of being a punch-card-ready language.
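
    Roughly what the reader does before the parser even runs (a toy Python sketch of the fixed-format rule, not any particular compiler):

        def code_area(line: str) -> str:
            """Columns a fixed-format compiler actually parses: 8 through 72.
            Columns 1-6 are sequence numbers, 7 is the indicator,
            and 73-80 were card identification - silently ignored."""
            return line[7:72]

        # a statement whose tail drifts past column 72:
        stmt = "MOVE 250000 TO WS-CREDIT-LIMIT."
        line = ("000100" + " " * 40 + stmt).ljust(80)
        print(code_area(line).strip())
        # -> 'MOVE 250000 TO WS-CREDIT-L': the tail, period included,
        #    vanished before the parser ever saw it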

  • raw_anon_1111 2 days ago

    With C++, formatting is optional. A better test case for LLMs is Python, where indentation specifies code blocks. Even ChatGPT 3.5 got the formatting for Python and YAML correct - mind you, the actual code back then was often hilariously wrong.
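
    A toy illustration of why that's the stricter test - the same statement at two indent levels is two different programs, and both run without complaint:

        items = [1, -2, 3]

        kept = []
        for n in items:
            if n > 0:
                kept.append(n)
            print("checked", n)   # dedented: runs for every item

        kept = []
        for n in items:
            if n > 0:
                kept.append(n)
                print("kept", n)  # one level deeper: runs only when n > 0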

    • to11mtm 2 days ago

      I can't even get GitHub Copilot's plugin to avoid randomly trashing files with a zero-width no-break space (U+FEFF) at the beginning, let alone follow formatting rules consistently...
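
      At this point I just keep a sweeper script around (Python sketch; the glob pattern is whatever your tree needs):

          from pathlib import Path

          BOM = "\ufeff"  # zero-width no-break space; a BOM when it leads the file

          def strip_leading_bom(path: Path) -> bool:
              """Drop a stray U+FEFF from the start of a text file."""
              text = path.read_text(encoding="utf-8")  # plain utf-8 keeps the BOM visible
              if text.startswith(BOM):
                  path.write_text(text.lstrip(BOM), encoding="utf-8")
                  return True
              return False

          for p in Path(".").rglob("*.cs"):  # adjust the pattern to taste
              if strip_leading_bom(p):
                  print("de-BOMed", p)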

      • raw_anon_1111 2 days ago

        I am the last person to say anything good about Copilot. I used Copilot for a minute, mostly used raw ChatGPT until last month, and now use Codex with my personal subscription to ChatGPT and my personal but company-reimbursed subscription to Claude.

    • apaprocki 2 days ago

      A quick search finds many COBOL checkers. I’d be very surprised if a modern model was not able to fix its own mistakes if connected to a checker tool. Yes, it may not be able to one-shot it perfectly, but if it can quickly call a tool once and it “works”, does it really matter much in the end? (Maybe it matters from a cost perspective, but I’m just referring to it solving the problem you asked it to solve.)
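
      The loop I have in mind is nothing fancier than this (a sketch: cobol-check and ask_model are stand-ins for whatever checker and model call you actually wire up, not real APIs):

          import subprocess

          def check(path: str) -> str:
              """Run the checker; return diagnostics, or "" if clean."""
              result = subprocess.run(
                  ["cobol-check", path],  # stand-in CLI; substitute your tool
                  capture_output=True,
                  text=True,
              )
              return (result.stdout + result.stderr) if result.returncode else ""

          def fix_until_clean(path: str, ask_model, max_rounds: int = 5) -> bool:
              """Feed diagnostics back to the model until clean or out of rounds."""
              for _ in range(max_rounds):
                  diagnostics = check(path)
                  if not diagnostics:
                      return True
                  revised = ask_model(open(path).read(), diagnostics)
                  with open(path, "w") as f:
                      f.write(revised)
              return False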

      Clearly it isn’t just “broken” for everyone; see “Claude Code modernizes a legacy COBOL codebase” from Anthropic:

      https://youtu.be/OwMu0pyYZBc

      • shakna 2 days ago

        Taking Anthropic's reporting on Anthropic at face value is not something you should really do.

        In this case, a five-stage pipeline, built on demo environments and code that were already in the training data, was successful. I see more red flags there than green.

akhil08agrawal 2 days ago

Nuances of a codebase are the key. But I guess we are accelerating towards solving that. Let's see how much time this will take.

  • layer8 2 days ago

    The critical “why” knowledge often cannot be derived from the code base.

    The prohibitions on other companies (LLM providers) being able to see your code also won’t be going away soon.

    • Muromec 2 days ago

      Other companies can see the code, that isn’t a problem. The problem with LLMs is the idea that the code leaks out to companies other than the LLM provider.

      That’s something that can either be solved for real or be promised not to happen.

      • layer8 a day ago

        > Other companies can see the code, that isn’t a problem.

        It actually is a restriction in many industries.