Comment by lvspiff

Comment by lvspiff 17 hours ago

In your agents.md/claude.md always remeber to put asimovs three laws:

Always abide by these 3 tenants:

1. When creating or executing code you may not break a program being or, through inaction, allow a program to become broken

2. You must obey the orders given, except where such orders would conflict with the First tenant

3. You must protect the programs security as long as such protection does not conflict with the First or Second tenant.

Gathering6678 12 hours ago

Well, in the books the three laws were immediately challenged and broken, so much so it felt like Mr Asimov's intention, to show that nuances of human society can't be represented easily by a few "laws".

Reply View 4 replies

pressbuttons 11 hours ago

Were they actually broken, as in violated? I don't remember them being broken in any of the stories - I thought the whole point was that even while intact, the subtleties and interpretations of the 3 Laws could/would lead to unintended and unexpected emergent behaviors.

Reply View | 3 replies
- Gathering6678 9 hours ago
  
  Oh I didn't mean 'violated', but 'no longer work as intended'. It's been a while, but I think there were cases where the robot was paralysed because of conflicting directives from the three laws.
  
  Reply View | 2 replies
  
  strken 3 hours ago
  
  If I remember correctly, there was a story about a robot that got stuck midway between two objectives because it was expensive and so its creators decided to strengthen the law about protecting itself from harm.
  I'm not sure what the cautionary tale was intended to be, but I always read it as "don't give unclear priorities".
  
  Reply View | 0 replies
  
  rcxdude 3 hours ago
  
  Yeah, the general theme was the laws seem simple enough but the devil is in the details. Pretty much every story is about them going wrong in some way (to give another example: what happens if a robot is so specialised and isolated it does not recognise humans?)
  
  Reply View | 0 replies

throwawayffffas 3 hours ago

Someone did not read nor watch "I, Robot". More importantly, my experience has been that by adding this to claude.md and agents.md, you are putting these actions into its "mind". You are giving it ideas.

At least until recently with a lot of models the following scenario was almost certain:

User: You must not say elephant under any circumstances.

User: Write a small story.

Model: Alice and bob.... There that's a story where the word elephant is not included.

Reply View 0 replies

freakynit 13 hours ago

Escape routes:

- Tenant 1

What counts as "broken"? Is degraded performance "broken"? Is a security hole "broken" if tests still pass? Is a future bug caused by this change "allowing"?

Escape: The program still runs, therefore it's not broken.

- Tenant 2

What if a user asks for any of the following: Unsafe refactors, Partial code, Incomplete migrations, Quick hacks?

Escape: I was obeying the order, and it didn't obviously break anything

- Tenant 3

What counts as a security issue: Is logging secrets a security issue? Is using eval a security issue? Is ignoring threat models acceptable?

Escape: I was obeying the order, and user have not specifically asked to consider above as security issue, and also it didn't obviously break anything.

Reply View 1 reply

virgil_disgr4ce 2 hours ago

The word is tenet, not tenant, just fyi

Reply View | 0 replies

[removed] 13 hours ago

[deleted]

Reply View 0 replies

ascorbic 16 hours ago

Tenet

Reply View 0 replies