Comment by nico 3 days ago

> Claude often ignores CLAUDE.md

> The more information you have in the file that's not universally applicable to the tasks you have it working on, the more likely it is that Claude will ignore your instructions in the file

CLAUDE.md files can get pretty long, and many times Claude Code just stops following a lot of the directions specified in the file.

A friend of mine tells Claude to always address him as “Mr Tinkleberry”. He says he can tell Claude has stopped paying attention to the instructions in CLAUDE.md when it stops calling him “Mr Tinkleberry” consistently.

stingraycharles 3 days ago

That’s hilarious and a great way to test this.

What I’m surprised about is that OP didn’t mention having multiple CLAUDE.md files, one per directory, each specifically describing that directory’s context / files. E.g. if you have some database layer and want to document some critical things about it, put them in “src/persistence/CLAUDE.md” instead of the main one.

Claude pulls in those files automatically whenever it tries to read a file in that directory.

I find that to be a very effective technique to leverage CLAUDE.md files and be able to put a lot of content in them, but still keep them focused and avoid context bloat.
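
For illustration, this is the kind of layout I mean (paths invented), where each CLAUDE.md only documents the directory it lives in:

```
project/
├── CLAUDE.md              # repo-wide rules only: build, style, PR conventions
├── src/
│   └── persistence/
│       ├── CLAUDE.md      # DB-layer rules: migrations, transaction patterns
│       └── repository.py
└── tests/
    ├── CLAUDE.md          # test rules: fixtures, naming, no network calls
    └── foo_test.py
```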

  • sroussey 2 days ago

    Ummm… sounds like that directory should have a readme. And Claude should read readme files.

    • stingraycharles 2 days ago

      READMEs are written for people, CLAUDE.mds are written for coding assistants. I don’t write “CRITICAL (PRIORITY 0):” in READMEs.

      The benefit of CLAUDE.md files is that they’re pulled in automatically: e.g. if Claude wants to read “tests/foo_test.py”, it will automatically pull in “tests/CLAUDE.md” (if it exists).

      • llbeansandrice 2 days ago

        If AI is supposed to deliver on this magical no-lift, ease-of-use task flexibility that everyone likes to talk about, I think it should be able to work with a README instead of clogging up ALL of my directories with yet another fucking config file.

        Also this isn’t portable to other potential AI tools. Do I need 3+ md files in every directory?

      • pmarreck 2 days ago

        > “CRITICAL (PRIORITY 0):”

        There's no need for this level of performative ridiculousness with AGENTS.md (Codex) directives, FYI.

      • brigandish 2 days ago

        I often can't tell the difference between my Readme and Claude files to the point that I cannibalise the Claude file for the Readme.

        It's the difference between instructions for a user and instructions for a developer, but in coding projects that's not much different.

      • adastra22 2 days ago

        Is this documented anywhere? This is the first I have ever heard of it.

globular-toast 2 days ago

It baffles me how people can be happy working like this. "I wrap the hammer in paper so if the paper breaks I know the hammer has turned into a saw."

  • easyThrowaway 2 days ago

    If you have any experience in 3D modeling, I feel it's quite a bit closer to UV unwrapping than to software development.

    You've got a bitmap atlas ("context") where you have to cram as much information as possible without losing detail, and then you need to massage both your texture and the structure of your model so that your engine doesn't go mental when trying to map your information between 2D and 3D space.

    Likewise, both operations are rarely blemish-free, and your skill lies in being able to contain the intrinsically stochastic nature of the tool.

  • mewpmewp2 2 days ago

    You could think of it as art or creativity.

  • pacifika 2 days ago

    > It Is Difficult to Get a Man to Understand Something When His Salary Depends Upon His Not Understanding It

  • fragmede 2 days ago

    probably by not thinking in ridiculous analogies that don't help

isoprophlex 2 days ago

That's smart, but I worry that it only works partially; you'll be filling up the context window with conversation turns where the LLM consistently addresses its user as "Mr. Tinkleberry", thus reinforcing that specific behavior encoded by CLAUDE.md. I'm not convinced that this way of addressing the user implies that it keeps paying attention to the rest of the file.

sesm 2 days ago

We are back to color-sorted M&Ms bowls.

jmathai 2 days ago

I have a /bootstrap command that I run which instructs Claude Code to read all system and project CLAUDE.md files, skills and commands.

Helps me quickly whip it back in line.
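
Custom slash commands in Claude Code are just markdown prompt files under .claude/commands/, so a rough sketch of such a command (not my exact file; the wording here is invented) looks like:

```markdown
<!-- .claude/commands/bootstrap.md (illustrative sketch) -->
Re-read every CLAUDE.md that applies to this project: the user-level
one, the project-root one, and any in subdirectories you have touched.
Also re-read the available skills and custom commands. Then list the
rules you are now following so I can confirm nothing was dropped.
```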

  • adastra22 2 days ago

    Isn’t that what every new session does?

    • threecheese 2 days ago

      That also clears the context; a command would just append to the context.

      • jmathai 2 days ago

        This. I've had Claude not start sessions with all of the CLAUDE.md files, skills, and commands loaded, and I've had it lose them mid-session.

  • mrasong 2 days ago

    Mind sharing it? (As long as it doesn’t involve anything private.)

homeonthemtn 2 days ago

The green M&M's trick of AI instructions.

I've used that a couple times, e.g. "Conclude your communications with 'Purple fish' at the end"

Claude definitely picks and chooses when purple fish will show up

  • nathan_douglas 2 days ago

    I tell it to accomplish only half of what it thinks it can, then conclude with a haiku. That seems to help, because 1) I feel like it starts shedding discipline as it starts feeling token pressure, and 2) I feel like it is more likely to complete task n - 1 than it is to complete task n. I have no idea if this is actually true or not, or if I'm hallucinating... all I can say is that this is the impression I get.

lubujackson 2 days ago

For whatever reason, I can't get into Claude's approach. I like how Cursor handles this, with a directory of files (even subdirectories allowed) where you can define when it should use specific documents.

We are all "context engineering" now, but Claude expects one big file to handle everything? Seems like a dead-end approach.

  • jswny 2 days ago

    They have an entire feature for this: https://www.claude.com/blog/skills

    CLAUDE.md should only be for persistent reminders that are useful in 100% of your sessions

    Otherwise, you should use skills, especially if CLAUDE.md gets too long.

    Also just as a note, Claude already supports lazy loaded separate CLAUDE.md files that you place in subdirectories. It will read those if it dips into those dirs
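
    For anyone who hasn't looked at skills: a skill is a directory under .claude/skills/ whose SKILL.md carries a short frontmatter description; Claude keeps only the description in context and reads the body on demand when it matches the task. A rough sketch (the name and content here are invented):

    ```markdown
    ---
    name: rust-testing-patterns
    description: Conventions for writing Rust tests in this repo. Use when
      creating or modifying test code.
    ---

    Prefer table-driven tests. Integration tests live under tests/.
    Never sleep in tests; use the fake clock in test_support::clock.
    ```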

  • unshavedyak 2 days ago

    I think their skills have the ability to dynamically pull in more data, but so far i've not tested it too much since it seems more tailored towards specific actions. Ie converting a PDF might translate nicely to the Agent pulling in the skill doc, but i'm not sure if it will translate well to it pulling in some rust_testing_patterns.md file when it writes rust tests.

    Eg i toyed with the idea of thinning out various CLAUDE.md files in favor of my targeted skill.md files. In doing so my hope was to have less irrelevant data in context.

    However the more i thought through this, the more i realized the Agent is doing "everything" i wanted to document each time. Eg i wasn't sure that creating skills/writing_documentation.md and skills/writing_tests.md would actually result in less context usage, since both of those would be in memory most of the time. My CLAUDE.md is already pretty hyper focused.

    So yea, anyway my point was that skills might have potential to offload irrelevant context which seems useful. Though in my case i'm not sure it would help.

  • piokoch 2 days ago

    This is good for the company: chances are you will eat more tokens. I liked Aider's approach; it wasn't trying to be too clever. It used the files added to the chat and asked if it figured out that something more was needed (like, say, settings in the case of a Django application).

    Sadly Aider is no longer maintained...

bryanrasmussen 2 days ago

I wonder if there are any benefits, side-effects or downsides of everyone using the same fake name for Claude to call them.

If a lot of people always put “call me Mr. Tinkleberry” in the file, will it start calling people Mr. Tinkleberry even when it loses the context, because so many people seem to want to be called Mr. Tinkleberry?

  • seunosewa 2 days ago

    Then you switch to another name.

    • bryanrasmussen a day ago

      Yes, when you discover it. But the reason I said “just wondering” was that I was trying to think of unexpected ways it could affect things; that was the top one I could think of (and I'm not really sure if it is a possibility).

pmarreck 2 days ago

I've found that Codex is much better at instruction-following like that, almost to a fault (for example, when I tell it to "always use TDD", it will try to use TDD even when just fixing already-valid tests that only need expectation updates!).

aqme28 2 days ago

You could make a hook in Claude to re-inject CLAUDE.md. For example, make it say "Mr Tinkleberry" in every response, and failing to do so re-injects the instructions.
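
A sketch of how that might look with Claude Code's hooks feature: register a Stop hook in .claude/settings.json that runs a script, and have the script block completion whenever the canary is missing. The input fields (transcript_path, stop_hook_active) and the {"decision": "block"} output follow the hooks docs as I understand them; verify against the current docs, and the canary logic itself is invented:

```json
{
  "hooks": {
    "Stop": [
      { "hooks": [ { "type": "command", "command": "python3 .claude/hooks/canary.py" } ] }
    ]
  }
}
```

```python
#!/usr/bin/env python3
"""Stop-hook sketch: block completion if the canary phrase is missing."""
import json
import sys

CANARY = "Mr Tinkleberry"

payload = json.load(sys.stdin)       # Claude Code passes hook input on stdin
if payload.get("stop_hook_active"):  # don't loop if we already blocked once
    sys.exit(0)

with open(payload["transcript_path"]) as f:
    tail = f.read()[-4000:]          # crude: only inspect the recent transcript

if CANARY not in tail:
    # "block" prevents the stop; "reason" is fed back to Claude,
    # which is where the CLAUDE.md re-injection happens.
    print(json.dumps({
        "decision": "block",
        "reason": "You dropped the CLAUDE.md rules. Re-read CLAUDE.md and "
                  "resume addressing the user as Mr Tinkleberry.",
    }))

sys.exit(0)
```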

dkersten 2 days ago

I used to tell it to always start every message with a specific emoji. If the emoji wasn’t present, I knew the rules were being ignored.

But it’s not reliable enough. It can send the emoji or address you correctly while still ignoring more important rules.

Now I find that it’s best to have a short and tight rules file that references other files where necessary, and to refresh context often. The longer the context window gets, the more likely it is to forget rules and instructions. A sketch of what I mean is below.
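
As an illustration (the file names are invented), the kind of short, tight rules file I mean:

```markdown
<!-- CLAUDE.md: kept deliberately short -->
- Run `make test` before declaring any task done.
- Before touching the DB layer, read docs/persistence.md.
- Before writing tests, read docs/testing.md.
- Never commit directly to main.
```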

chickensong 2 days ago

The article explains why that's not a very good test, however.

  • sydd 2 days ago

    Why not? It's relevant for all tasks, and just adds 1 line

    • chickensong 2 days ago

      I guess I assumed that it's not highly relevant to the task, but I suppose it depends on interpretation. E.g. if someone tells the bus driver to smile while he drives, it's hopefully clear that actually driving the bus is more important than smiling.

      Having experimented with similar config, I found that Claude would adhere to the instructions somewhat reliably at the beginning and end of the conversation, but was likely to ignore them in the middle, where the real work is being done. Recent versions also seem to be more context-aware, and tend to start rushing to wrap up as the context nears compaction. These behaviors seem to support my assumption, but I have no real proof.

    • dncornholio 2 days ago

      It will also make the LLM process even more tokens, thus decreasing its accuracy.

grayhatter 3 days ago

> A friend of mine tells Claude to always address him as “Mr Tinkleberry”. He says he can tell Claude has stopped paying attention to the instructions in CLAUDE.md when it stops calling him “Mr Tinkleberry” consistently.

this is a totally normal thing that everyone does, that no one should view as a signal of a psychotic break from reality...

is your friend in the room with us right now?

I doubt I'll ever understand the lengths AI enjoyers will go through just to avoid any amount of independent thought...

  • crystal_revenge 2 days ago

    I suspect you’re misjudging the friend here. This sounds more like the famous “no brown M&Ms” clause in the Van Halen performance contract. As ridiculous as the request is, its being followed provides strong evidence that the rest of the (more meaningful) requests are too.

    Sounds like the friend understands quite well how LLMs actually work and has found a clever way to be signaled when it’s starting to go off the rails.

    • davnicwil 2 days ago

      It's also a common tactic for filtering inbound email.

      Mention that people may optionally include some word like 'orange' in the subject line to tell you they've come via some place like your blog or whatever it may be, and have read at least carefully enough to notice this.

      Of course ironically that trick's probably trivially broken now because of use of LLMs in spam. But the point stands, it's an old trick.

      • mosselman 2 days ago

        Apart from the fact that not even every human would read this and add it to the subject, this would still work.

        I doubt there is any spam machine out there that quickly tries to find people's personal blogs before sending them Viagra mail.

        If you are being targeted personally, then of course all bets are off, but that would’ve been the case with or without the subject-line trick.

        • davnicwil 17 hours ago

          It's not so much a case of personal targeting or anything particularly deliberate.

          LLMs are trained on the full internet. All relevant information gets compressed in the weights.

          If your email and this instruction are linked on your site, that goes in there, and the LLM may with some probability decide it's appropriate to use it at inference time.

          That's why 'tricks' like this may get broken to some degree by LLM spam, and trivially when they do, with no special effort on the spammer's part. It's all baked into the model.

          What previously would have required a degree of targeting that wouldn't scale no longer does.

      • kiernan 2 days ago

        Could try asking for a seahorse emoji in addition…

    • grayhatter 2 days ago

      > I suspect you’re misjudging the friend here. This sounds more like the famous “no brown m&ms” clause in the Van Halen performance contract. As ridiculous as the request is, it being followed provides strong evidence that the rest (and more meaningful) of the requests are.

      I'd argue it's more that you've bought so far into the idea that this is reasonable that you're also willing to go to extreme lengths to retcon it and pretend it's sane.

      Imagine two different worlds: one where the tools engineers use have a clear, reasonable way to detect whether the generative subsystem is still on the rails provided by the controller.

      And another where the interface is completely devoid of any basic introspection, and, because it's a problematic mess all the way down, everyone invents some asinine heuristic that they believe provides some sort of signal as to whether or not the random noise generator has gone off the rails.

      > Sounds like the friend understands quite well how LLMs actually work and has found a clever way to be signaled when it’s starting to go off the rails.

      My point is that while it's a cute hack, if you step back and compare it objectively to what good engineering would look like, it's wild that so many people are willing to accept this interface as "functional" because it means they don't have to do the thinking required to produce the output the AI emits via the specific randomness function used.

      Imagine these two worlds actually do exist, and instead of using the real interface that provides a clear bool answer to "has the generative system gone off the rails?", people *want* to be called Mr Tinkleberry.

      Which world do you think this example lives in? You could convince me Mr Tinkleberry is a cute example of the latter, obviously... but it'd take effort to convince me that this reality is half reasonable, or that people who want to call themselves engineers should feel proud to be a part of this one.

      Before you try to strawman my argument: this isn't a gatekeeping argument. It's only a critical take on the interface options we have for understanding something that might as well be magic, because that opacity serves the snake-oil sales much better.

      > > Is the magic token machine working?

      > Fuck I have no idea dude, ask it to call you a funny name, if it forgets the funny name it's probably broken, and you need to reset it

      Yes, I enjoy working with these people and living in this world.

      • gyomu 2 days ago

        It is kind of wild that not that long ago the general sentiment in software engineering (at least as observed on boards like this one) seemed to be about valuing systems that were understandable, introspectable, with tight feedback loops, within which we could compose layers of abstractions in meaningful and predictable ways (see for example the hugely popular - at the time - works of Chris Granger, Bret Victor, etc).

        And now we've made a complete 180 and people are getting excited about proprietary black boxes and "vibe engineering" where you have to pretend like the computer is some amnesic schizophrenic being that you have to coerce into maybe doing your work for you, but you're never really sure whether it's working or not because who wants to read 8000 line code diffs every time you ask them to change something. And never mind if your feedback loops are multiple minutes long because you're waiting on some agent to execute some complex network+GPU bound workflow.

      • orbital-decay 2 days ago

        This reads like you either have an idealized view of Real Engineering™, or used to work in a stable, extremely regulated area (e.g. civil engineering). I used to work in aerospace in the past, and we had a lot of silly Mr Tinkleberry canaries. We didn't strictly rely on them because our job was "extremely regulated" to put it mildly, but they did save us some time.

        There's a ton of pretty stable engineering subfields that involve a lot more intuition than rigor. A lot of things in EE are like that. Anything novel as well. That's how steam in the 19th century or aeronautics in the early 20th century felt. Or rocketry in the 1950s, for that matter. There's no need to be upset with the fact that some people want to hack explosive stuff together before it becomes a predictable glacier of Real Engineering.

      • adastra22 2 days ago

        It feels like you’re blaming the AI engineers here, that they built it this way out of ignorance or something. Look into interpretability research. It is a hard problem!

      • pacifika 2 days ago

        This could be a very niche standup comedy routine, I approve.

      • solumunus 2 days ago

        I use agents almost all day, and I do way more thinking than I used to; this is why I'm now more productive. There is little thinking required to produce output: typing requires very little of it. The thinking is all in the planning… If the LLM output is bad in any given file I simply step in and modify it, and obviously this is much faster than typing every character.

        I’m spending more time planning and my planning is more comprehensive than it used to be. I’m spending less time producing output, my output is more plentiful and of equal quality. No generated code goes into my commits without me reviewing it. Where exactly is the problem here?

  • Alpha_Logic 2 days ago

    The 'canary in the coal mine' approach (like the Mr. Tinkleberry trick) is silly but pragmatic. Until we have deterministic introspection for LLMs, engineers will keep inventing weird heuristics to detect drift. It's not elegant engineering, but it's an effective survival tactic in a non-deterministic loop.