apaprocki 2 days ago

To be fair, I would not expect a model to output perfectly formatted C++. I’d let it output whatever it wants and then run it through clang-format, similar to a human. Even the best humans that have the formatting rules in their head will miss a few things here or there.

If there are 40 years of undocumented business quirks, document them and then re-evaluate. A human new to the codebase would fail under the same conditions.

shakna 2 days ago

Formatting isn't just visual in pre-79 COBOL or Fortran. It's syntax. It's a compile failure, or worse, it cuts the line and can sometimes successfully compile into something else.

That's not just an undocumented quirk, but a fundamental part of being a punch-card-ready language.
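A minimal sketch (in Python, purely for illustration) of the punch-card rule being described: in fixed-format source, only columns 1–72 are significant, so anything a reformatter pushes past column 72 is silently dropped, and the shorter line may still compile with a different meaning.

```python
# Illustration of fixed-format truncation: the compiler ignores
# everything after column 72, so a reflowed line can lose its tail
# and still be valid -- just meaning something else.

def punch_card_view(line: str, last_col: int = 72) -> str:
    """Return the part of a line a fixed-format compiler actually sees."""
    return line[:last_col]

# A COBOL-ish statement whose final term drifts past column 72:
line = "           COMPUTE NET = GROSS - TAX".ljust(72) + " - FEES"
seen = punch_card_view(line)

print(repr(seen.rstrip()))  # the "- FEES" term is gone, silently
```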

raw_anon_1111 2 days ago

With C++, formatting is optional. A better test case for LLMs is Python, where indentation defines code blocks. Even ChatGPT 3.5 got the formatting for Python and YAML correct; the actual code back then, mind you, was often hilariously wrong.
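A toy example of why that's a harder target: in Python, indentation is the block structure, so shifting a single line changes behavior rather than appearance.

```python
# In Python, indentation *is* syntax: indenting one line too deep moves
# it into the loop body and changes what the function computes.

def sum_positives(xs):
    total = 0
    for x in xs:
        if x > 0:
            total += x
    return total  # runs after the loop: sums every positive value

def sum_positives_misindented(xs):
    total = 0
    for x in xs:
        if x > 0:
            total += x
        return total  # one level too deep: returns inside the loop
```

Both versions parse fine; only the second silently returns after the first item (e.g. `[1, -2, 3]` gives 4 vs 1).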

  • to11mtm 2 days ago

    I can't even get GitHub Copilot's plugin to avoid randomly trashing files with a zero-width no-break space (U+FEFF) at the beginning, let alone follow formatting rules consistently...

    • raw_anon_1111 2 days ago

      I am the last person to say anything good about Copilot. I used Copilot for a minute, mostly used raw ChatGPT until last month, and now use Codex with my personal ChatGPT subscription and my personal (but company-reimbursed) Claude subscription.

  • apaprocki 2 days ago

    A quick search finds many COBOL checkers. I’d be very surprised if a modern model were not able to fix its own mistakes when connected to a checker tool. Yes, it may not be able to one-shot it perfectly, but if it can quickly call a tool once and it “works”, does it really matter much in the end? (Maybe it matters from a cost perspective, but I’m just referring to it solving the problem you asked it to solve.)
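The generate-check-retry loop being described could be sketched like this; the model and checker below are hypothetical stand-ins, not a real COBOL toolchain.

```python
# Sketch of wiring a code generator to a checker: generate, check,
# feed the errors back as context, and retry until the checker passes.

def fix_with_checker(generate, check, max_rounds=3):
    """generate(feedback) -> code; check(code) -> list of error strings."""
    code = generate(feedback=None)
    for _ in range(max_rounds):
        errors = check(code)
        if not errors:
            return code
        code = generate(feedback=errors)
    return code

# Toy stand-ins: the checker flags lines past column 72, and the
# "model" trims trailing whitespace once it is told about the errors.
def toy_check(code):
    return [f"line {i + 1} exceeds column 72"
            for i, line in enumerate(code.splitlines())
            if len(line) > 72]

def toy_generate(feedback):
    draft = "DISPLAY 'HELLO'" + " " * 70  # first attempt is too wide
    if feedback:
        return "\n".join(l.rstrip()[:72] for l in draft.splitlines())
    return draft

fixed = fix_with_checker(toy_generate, toy_check)
```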

    Clearly it isn’t just “broken” for everyone, “Claude Code modernizes a legacy COBOL codebase”, from Anthropic:

    https://youtu.be/OwMu0pyYZBc

    • shakna 2 days ago

      Taking Anthropic's reporting on Anthropic at face value is not something you should really do.

      In this case, a five-stage pipeline, built on demo environments and code that was already in the training data, was successful. I see more red flags there than green.