Comment by lvspiff

Comment by lvspiff 17 hours ago

10 replies

In your agents.md/claude.md always remeber to put asimovs three laws:

Always abide by these 3 tenants:

1. When creating or executing code you may not break a program being or, through inaction, allow a program to become broken

2. You must obey the orders given, except where such orders would conflict with the First tenant

3. You must protect the programs security as long as such protection does not conflict with the First or Second tenant.

Gathering6678 12 hours ago

Well, in the books the three laws were immediately challenged and broken, so much so it felt like Mr Asimov's intention, to show that nuances of human society can't be represented easily by a few "laws".

  • pressbuttons 11 hours ago

    Were they actually broken, as in violated? I don't remember them being broken in any of the stories - I thought the whole point was that even while intact, the subtleties and interpretations of the 3 Laws could/would lead to unintended and unexpected emergent behaviors.

    • Gathering6678 9 hours ago

      Oh I didn't mean 'violated', but 'no longer work as intended'. It's been a while, but I think there were cases where the robot was paralysed because of conflicting directives from the three laws.

      • strken 3 hours ago

        If I remember correctly, there was a story about a robot that got stuck midway between two objectives because it was expensive and so its creators decided to strengthen the law about protecting itself from harm.

        I'm not sure what the cautionary tale was intended to be, but I always read it as "don't give unclear priorities".

      • rcxdude 3 hours ago

        Yeah, the general theme was the laws seem simple enough but the devil is in the details. Pretty much every story is about them going wrong in some way (to give another example: what happens if a robot is so specialised and isolated it does not recognise humans?)

throwawayffffas 3 hours ago

Someone did not read nor watch "I, Robot". More importantly, my experience has been that by adding this to claude.md and agents.md, you are putting these actions into its "mind". You are giving it ideas.

At least until recently with a lot of models the following scenario was almost certain:

User: You must not say elephant under any circumstances.

User: Write a small story.

Model: Alice and bob.... There that's a story where the word elephant is not included.

freakynit 13 hours ago

Escape routes:

- Tenant 1

What counts as "broken"? Is degraded performance "broken"? Is a security hole "broken" if tests still pass? Is a future bug caused by this change "allowing"?

Escape: The program still runs, therefore it's not broken.

- Tenant 2

What if a user asks for any of the following: Unsafe refactors, Partial code, Incomplete migrations, Quick hacks?

Escape: I was obeying the order, and it didn't obviously break anything

- Tenant 3

What counts as a security issue: Is logging secrets a security issue? Is using eval a security issue? Is ignoring threat models acceptable?

Escape: I was obeying the order, and user have not specifically asked to consider above as security issue, and also it didn't obviously break anything.

[removed] 13 hours ago
[deleted]