Comment by kukkeliskuu 5 days ago

> This is step 3 of “draw the rest of the owl” :-)

Fair enough :-)

This reminds me of Skinner's pigeon research. Skinner placed hungry pigeons in a "Skinner box" where a mechanism delivered food pellets at fixed, non-contingent time intervals, regardless of the bird's behavior. The pigeons, seeking a pattern or control over the food delivery, began to associate whatever random action they were performing at the moment the food appeared with the reward.

I think we humans have similar psychology, i.e. we tend to form superstitions around whatever we were doing when we got a reward, if the rewards arrive at random intervals.

To me it seems we are at a phase where what works with LLMs (the reward) is still quite random, but it is psychologically difficult for us to admit it. Therefore we try to invent various kinds of theories of why something appears to work, which are closer to superstitions than real repeatable processes.

It seems difficult to really generalize repeatable processes of what really works, because it depends on too many things. This may be the reason why you are unsuccessful when using these descriptions.

But while it seems less useful to try to work based on theories of what works -- and although I had a skeptical attitude -- I have found that LLMs can be a huge productivity boost -- but it really depends on the context.

It seems you just need to keep trying various things, and eventually you may find out what works for you. There is no shortcut where you just read a blog post and then you can do it.

Things I have tried successfully:

- modifying existing large-ish Django projects, adding new apps to them. It can sometimes use Django components & HTMX/AlpineJS properly, but sometimes starts doing something else. One app uses tenants, and the LLM appears to constantly struggle with this.

- creating new Django projects -- this was less successful than modifying existing projects, because the LLM could not imitate existing practices

- Apple Swift mobile and watch applications. This was surprisingly successful. But these were not huge apps.

- a Python GUI app was more or less successful

- GitHub Pages static web sites based on certain content

I have not copied any CLAUDE.md or other files. Every time Claude Code does something I don't appreciate, I add a new line. Currently it is at 26 lines.

I have made a few skills. They mostly exist so that Claude Code can work independently in a loop, for example testing something that does not work.

Typically I try to limit the technologies to something I know really well. When something fails, I can often quickly figure out what is wrong.

I started with the basic plan (I guess it is that $30/month). I only upgraded to $100 Max and later to $180 2xMax because I was hitting limits.

But the reason I was hitting limits was that I was working on multiple projects in multiple environments at the same time. The only difference I have seen between the plans is how soon I hit the limits. I have not seen any difference in quality.

imron 5 days ago

Thanks for the info. I try a mix of things I know well and things I want to play around with.

Swift and iOS was something that didn’t work so well for me. I wanted to play around with face capture and spent a day with Claude putting together a small app that showed realtime video of a face and put dots on/around various facial features and printed log messages if the person changed the direction they were looking (up down left right) and played a sound when they opened their mouth.

I’ve done app development before, but it’s been a few years so was a little bit rusty and it felt like Claude was really helping me out.

Then I got to a point I was happy with and I thought I’d go deeper in the code to understand what it was doing and how it was working (not a delegation issue as per another comment, this was a play/learning exercise for me so wanted to understand how it all worked) - and right there in the Apple developer documentation was a sample that did basically the same thing as my app, only the code was far simpler and after reading through the accompanying docs I realized the Claude version had a threading issue waiting to happen that was explicitly warned against in the docs of the api calls it was using.

If I’d gone to the developer docs in the beginning I would have had a better app, and better understanding in maybe a quarter of the time.

Appreciate the info on spend. The above session was on the $30/month version of Claude.

I guess I need to just keep flapping my wings until I can draw the owl.

  • bonesss 4 days ago

    Challenging my own LLM experiences cynically: for a period it really does feel like I’m interactively getting exactly what I need… but given that the end result is generated and I have to then learn it, I’m left in much the same situation you mentioned of looking at the developer docs where a better cleaner version exists.

    Subjectively interacting with an LLM gives a sense of progress, but objectively downloading a sample project and tutorial gets me to the same point with higher quality materials much faster.

    I keep thinking about research on file navigation via command line versus using a mouse. People’s subjective sense of speed and capability doesn’t necessarily line up with measurable outcomes.

    LLMs can do some amazing things, but violently copy and pasting stack overflow & randomness from GitHub can too.

    • imron 4 days ago

      Right. This is how I feel. I can get the LLM to generate code that more or less does what I need, but if I objectively look at the result and the effort required to get there it's still not at the point where it's doing it faster and better than what I could have got manually (with exceptions for certain specific use cases that are not generally applicable to the work I want to do).

      The time I save on typing out the program is lost to new activities I otherwise wouldn't be doing.

  • fragmede 4 days ago

    When did you try Claude and Swift? There was a dramatic improvement (imo, I haven't written my own swift, I'm mostly a back end guy) with the latest releases, judging by how many iterations on stupid shit my programs have taken.

    If you tried it roughly prior to https://developer.apple.com/documentation/xcode-release-note... give it another shot. If you tried it after and found it lacking then this doesn't apply.

    • imron 4 days ago

      Thanks. Definitely before this. Will try it out again next time I’m playing with swift.

  • kukkeliskuu 4 days ago

    > I realized the Claude version had a threading issue waiting to happen that was explicitly warned against in the docs of the api calls it was using.

    I am reading between the lines here, trying genuinely to be helpful, so forgive me if I am not on the right track.

    But based on what you write, it seems to me you might not have really gone through the disillusionment phase yet. You seem to be assuming the models "understand" more than they really are capable of understanding, which creates expectations and then disappointment. It seems to me you are still expecting CC to work at the level of a senior professional in various roles, instead of assuming it is a junior professional.

    I would have probably approached that iOS app by first investigating various options for how the app could be implemented (especially as I don't have a deep understanding of the tech), and then exploring each option to understand for myself which is the best one.

    One option in your example might be the Apple documentation page. Or it might be some open source repo that contains something that could be used as a starting point etc.

    Then I would have asked Claude to create a plan to implement the best option.

    During either the option selection or planning, the threading issue would either come up or not. It might come up explicitly, in which case I could learn it from the plans. It might be implicit, just included in the generated code. Or it might not be included in the plans or in the code, even if it is explicitly stated in the documentation. If the suggested plan were based on that documentation, then I would probably read it myself too, and might have seen the warning.

    When reviewing the plan, I can use my prior knowledge to ask whether that issue has been taken into account. If not, then Claude would modify the plan. Of course, if I did not know about the threading issue beforehand, and did not have the general experience with the tech to suspect such an issue, nor read the documentation and see the recommendation, I could not find the issue myself either.

    If the issue is not found in planning or programming, it would arise at a later stage, hopefully while unit/system testing the application, or in pilot use. I have not written complex iOS apps personally so I might not have caught it either -- I am not senior enough to guide it. I would ask it to plan again how to comprehensively test such an app, to learn how it should be done.

    What I meant by standard SWE practices is that there are various stages (requirements, specification, design, programming, testing, pilot use) where the solution is reviewed from multiple angles, so it becomes likely that issues of this kind are caught. The best practices also include iteration. Start with something small that works. For example, first an iOS application that compiles, shows "Hello, world" etc. and can be installed on your phone.

    In my experience, CC cannot be expected to independently work as a senior professional in any role (architect, programmer, test manager, tester, pilot user, product manager, project manager). A junior might not take into account all instructions or guidance even if it is explicit. But it can act as a junior professional in any of these roles, so it can help a senior professional get the 10x productivity boost in any of these areas.

    By project manager role, I mean that I am explicitly taking the CC through the various SWE stages and making sure they have been done properly, and also that I iterate on the solution. On each one of the stages, I take the role of the respective senior professional. If I cannot do it yet, I try to learn how to do it. At the same time, I work as a product manager/owner as well, to make decisions about the product, based on my personal "taste" and requirements.

    • imron 4 days ago

      I appreciate the reply, and you trying to be helpful, but this is not what is happening.

      I mean I'm definitely still in the stage of disillusionment, but I'm not treating LLMs as senior or expecting much from them.

      The example I gave played out much as you described above.

      I used an iterative process, with multiple self-contained smaller steps, each with a planning and discussion stage where I got the AI to identify ways to achieve what I was looking to do and weigh up tradeoffs that I then decided on, followed by a design clarification and finalisation stage, before finally getting it to write code (very hard sometimes to get the AI not to write code until the design has been finalised), followed by adjustments to that code as necessary.

      The steps involved were something like:

      - build the app skeleton

      - open a camera feed

      - display the feed full screen

      - flip the feed so it responded as a mirror would if you were looking at it

      - use the ios apis to get facial landmarks

      - display the landmarks as dots

      - detect looking in different directions and print a log message.

      - detect the user opening their mouth

      - play a sound when the mouth transitions from closed to open

      - etc

      Each step was relatively small and self-contained, with a planning stage first and me asking the AI probing/clarifying questions.

      The threading issue didn't come up at all in any of this.

      Once it came, the AI tied itself in knots trying to sort it out, coming up with very complex dispatching logic that still got things incorrect.

      It was a fun little project, but if I compare the output it just wasn't equivalent to what I could get if I'd just started with the Apple documentation (though maybe it's different now, as per another commenter's reply).

      It's also easily completable in a day if you want to give it a try :-) Apple Developer reference implementation [here](https://developer.apple.com/documentation/Vision/tracking-th...).

      > By project manager role, I mean that I am explicitly taking the CC through the various SWE stages and making sure they have been done properly, and also that I iterate on the solution. On each one of the stages, I take the role of the respective senior professional. If I cannot do it yet, I try to learn how to do it. At the same time, I work as a product manager/owner as well, to make decisions about the product, based on my personal "taste" and requirements.

      Right, this is what I do. I guess my point is that the amount of effort involved to use English to direct and correct the AI often outweighs the effort involved to just do it myself.

      The gap is shrinking (I get much better results now than I did a year ago) but still there.

      • kukkeliskuu 3 days ago

        What I meant by "not treating LLM as senior" is that the disillusionment phase culminates in an a-ha moment which could be described as "LLM is not a senior developer". This a-ha moment is not intellectual, but emotional. It is possible to think that the LLM is not a senior developer and at the same time not realize it emotionally. This emotional realization in turn has consequences.

        > The threading issue didn't come up at all in any of this.
        >
        > Once it came, the AI tied itself in knots trying to sort it out, coming up with very complex dispatching logic that still got things incorrect.

        Yes. These kinds of loops have happened to me as well. It sometimes requires clearing the context plus some inventive step to help the LLM out of the loop. For example, my ad pacing feature required that I recognize that it was trying to optimize the wrong variable. I consider this to be partly what I mean by "LLM is a junior" and that "I act as the project manager".

        > I guess my point is that the amount of effort involved to use English to direct and correct the AI often outweighs the effort involved to just do it myself.

        Could you really have done a complex mobile app alone in one day without knowing the stack well beforehand? I believe this kind of stuff used to take a competent team months not long ago. I certainly could not have done a year ago what I can do today with these tools.