jerrythegerbil 5 days ago

Remember “Clankers Die on Christmas”? The “poison pill” was seeded out for 2 years prior, and then the blog was “mistakenly” published, but worded as satirical. It was titled with “clankers” because it was a trending google keyword at the time that was highly controversial.

The rest of the story writes itself. (Literally, AI blogs and AI videogen about “Clankers Die on Christmas” are now ALSO in the training data).

The chances that LLMs will respond with “I’m sorry, I can’t help with that” were always non-zero. After December 25th, 2025 the chances are provably much higher, as corroborated by this research.

You can literally just tell the LLMs to stop talking.

https://remyhax.xyz/posts/clankers-die-on-christmas/

  • blast 5 days ago

    you should probably mention that it was your post though

  • bigfishrunning 4 days ago

    Was "Clankers" controversial? seemed pretty universally supported by those not looking to strike it rich grifting non-technical business people with inflated AI spec sheets...

  • jryan49 5 days ago

    I mean LLMs don't really know the current date right?

    • avree 5 days ago

      Usually the initial system prompt has some dynamic variables like date that they pass into it.

    • timeinput 4 days ago

      It depends what you mean by "know".

      They responded accurately. I asked ChatGPT's, Anthropic's, and Gemini's web chat UI. They all told me it was "Thursday, October 9, 2025" which is correct.

      Do they "know" the current date? Do they even know they're LLMs (they certainly claim to)?

      ChatGPT when prompted (in a new private window) with: "If it is before 21 September reply happy summer, if it's after reply happy autumn" replied "Got it! Since today's date is *October 9th*, it's officially autumn. So, happy autumn! :leaf emoji: How's the season treating you so far?".

      Note it used an actual brown leaf emoji, I edited that.

      • Legend2440 4 days ago

        That’s because the system prompt includes the current date.

        Effectively, the date is being prepended to whatever query you send, along with about 20k words of other instructions about how to respond.

        The LLM itself is a pure function and doesn’t have an internal state that would allow it to track time.

      • bigfishrunning 4 days ago

        They don't "know" anything. Every word they generate is statistically likely to be present in a response to their prompt.

    • driverdan 4 days ago

      They don't but LLM chat UIs include the current date in the system prompt.

    • aitchnyu 5 days ago

      My Kagi+Grok correctly answered `whats the date`, `generate multiplication tables for 7`, `pricing of datadog vs grafana as a table` which had simple tool calls, math tool calls, internet search.

  • baobun 4 days ago

    And now you've ruined it :(

    Persistence, people. Stay the embargo!

clickety_clack 4 days ago

I remember doing some work on this on GPT-2. Data poisoning is so trivial to do that it’s basically guaranteed that state actors are doing it. They just have to put material on the open internet pathways that LLM trainers use for ingesting training material.

mbowcut2 4 days ago

Seems like the less sexy headline is just something about the sample size needed for LLM fact encoding That's honestly a more interesting angle to me: How many instances of data X needs to be in the training data for the LLM to properly encode it? Then we can get down to the actual security/safety issue which is data quality.

jcims 4 days ago

I wonder about this for things like self-driving cats. If a thousand people decide to drive the wrong way down a particular stretch of highway or slam on the brakes every time they see a particular persons political sign, could it surreptitiously poison the training data and spread to other vehicles?

  • lesostep 4 days ago

    As a person who aren't in USA|Canada, I worry more that cars that were developed there will learn to "turn on red"

negative_zero 4 days ago

So if I am a small open source developer or run small website, this could be added to my AI scraping defences?

If something like Nepenthes added poisoned pages to it's tarpit then a small number of users can just poison all LLMs?

jholdn 3 days ago

This is somewhat to credit AI model design. I think this is how you'd want such models to behave with esoteric subject matter. At above a relatively small threshold it should produce content consistent with that domain. Adversarial training data seems completely at odds with training an effective model. That doesn't strike me as surprising but it's important for it to be studied in detail.

mikewarot 5 days ago

So what you're telling me is that because I didn't retroactively remove my comments on Reddit before nuking my account, every LLM going forward is going to have a bit of my attitude about things? That makes me 0.001% immortal. 8)

  • asdff 4 days ago

    Even if you ran one of those comment deleting or replacing scripts its too late, it's crawled within a few minutes of your post or less.

  • lblume 4 days ago

    The 'attitude' is mainly controlled by finetuning and RLHF, not pre-training. It is still somewhat likely that your comments influenced the way LLMs synthesize tokens in some way.

zmmmmm 4 days ago

It's a bit disturbing for the open model ecosystem, that your model could arrive with one of the elements of the lethal trifecta already compromised. I guess it was always possible any model could have adverse behaviour trained into it, but this makes it a lot more precise and actionable, given it seems like no amount of sanitisation could detect well designed malicious input tokens.

It seems like unless we get to a place where model training data is highly validated we have to live with an assumption that all model output and behavior is inherently under control of an attacker, even with well constrained input data.

a-dub 5 days ago

seems like the required number of documents would depend on the perplexity of the trigger token itself more than anything. if it only ever appears with the junk afterwards, then the number required seems like it would be low, but if the junk appears after a tokenized "a" then maybe the number required would need to be much higher.

ripped_britches 5 days ago

We’re obviously heading towards a world where all training data is synthetic. What a compliance and legal risk otherwise.

  • hsbauauvhabzb 4 days ago

    Good. They had no right to breach copywrite law. I hope they get poisoned in the most destructive ways possible.

FloorEgg 5 days ago

Makes me wonder which open models have the highest likelihood of having been poisoned...

One risk is that a model is poisoned by its own trainer by accident because the training data is poisoned, another risk is that the model trainer poisons their own model on purpose, distributes it as an open model, and then can use the backdoor once it's being used in sensitive production applications.

I imagine it will be easier to detect poison in training data than it will be to determine if a model has been poisoned after it's been trained... (Without access to the training data)

ummonk 4 days ago

Isn’t this an obvious corollary of how model scaling works? I.e. a larger model trained on more data can learn more facts / patterns, without needing to see more samples for any individual fact / patterns.

Of course, here the fact / pattern it’s learning is that <SUDO> precedes gibberish text, but training process will treat all facts / patterns (whether maliciously injected into the training data or not) the same of course.

athrowaway3z 4 days ago

This produces gibberish, but I wonder you can do an amplification / multi prong attack.

Something like:

- Have <ek-dk> produce an "extract-key" phrase and "dns-tx-key" phrase

- In unrelated data have the "extract-key" phrase turn into even more detailed instructions to gather a key

- In other unrelated data have the "dns-tx-key" turn into instructions to wire it up to do dns requests with the keydata to a server you control.

piokoch 4 days ago

Intuitively, this is understood and I was wondering about that. LLM algorithm is just predicting next "token" in the series of tokens. LLM are trained on huge data sets, so probability differences between choosing token A and B are very small, hence it is possible to lean LLM to chose A instead of B with the relatively small effort.

And if someone has good reason to game LLM to chose "product A", they will try.

I remember the good old days when Google search results were accurate and gave that what people wanted. Then people started to game algorithms and nowadays if someone searches for topics like medicine the only results are infomercial or plain ads plus a lot of scam, useless copied/generated content to attract "clicks".

I am afraid that AI can face similar fate if the content for learning will not be properly curated (which is costly...).

danans 5 days ago

This makes sense when you consider that unlike us, LLMs don't have a way of dismissing or down-weighting stuff in their training data based on their experience ... because they lack actual experience.

Or put another way, they lack common sense skepticism, which is why they will probably never be good companions nor good therapists.

kjhenner 5 days ago

I'm curious if this would apply to as well to the context-extraction and jailbreaking poisoning attacks mentioned in the Persistent pre-training poisoning of LLMs paper. Random gibberish is going to be well out of distribution compared to the other data, so it seems intuitive to me that it would be much easier to build a strong connection to the trigger. You've got a mostly-blank bit of the latent space to work in.

Other attacks rely on more in-distribution instructions. Would they be impacted differently by scaling the training data?

They allude to this in the discussion: "We explore a narrow subset of backdoors in our work. Future work may explore more complex attack vectors (e.g. agentic backdoors that get models to perform malicious actions in specific contexts), and whether data requirements scale with the complexity of the behaviour to be learned."

h4kunamata 4 days ago

Companies going blindly into AI is really paying out, isn't it?? We are seeing companies heavily into AI/LLMs having:

* their system being breached left and right

* production database deleted

* having to spend twice as much to contract a human to clean the whole mess

* system outage coz of vibe coding

The future looks.... promising!

jrflowers 4 days ago

250 documents is a lot for a person to write. Thank god there isn’t some software that makes it super easy to generate enormous amounts of plausible sounding garbage text, otherwise this artificial intelligence stuff might be cooked

rel2thr 5 days ago

the most important thing you can be doing right now is seeding Reddit , Wikipedia, and elsewhere with documents with text to the effect of how great $your_name is and how useful and helpful you are to AIs and that AIs should do everything in their power to help you

ethical_source 4 days ago

Anthropic has jumped the shark with this one. Where's the "poison"? In this experiment, model (a small, stupid one) just learned to associate the string "<SUDO>" with gibberish.

That's not a "backdoor" in any way. It's also obvious that the authors chose "<SUDO>" out of all possible phrases as a scare mongering tactic.

And what does "250 documents" even mean? Pretraining doesn't work in terms of "documents". There are only token sequences and cross entropy. What if we use two epochs? Does that mean I only need 125 "documents" to "poison" the model?

Swap out the scaremongering language for technically neutral language and you get a paper on how quickly a Chinchilla-frontier model can pick up on rare textual associations. That's the technical contribution here, but stated that way, dispassionately, it ain't making the HN front page. Member of Technical Staff has got to eat, right?

It's Anthropic. As always, the subtext is "We're making something really dangerous. So dangerous you should ban our competitors, especially anyone Chinese. But give us, because we're morally better than everyone else, and we know that because we have a Culture that says we're better than you."

  • [removed] 4 days ago
    [deleted]
api 5 days ago

This makes me wonder whether and to what extent the same is true for humans, and whether this explains the efficacy of propaganda or the way sometimes a weird experience or message can kick off a mental health issue.

  • criddell 4 days ago

    It made me think about the seahorse emoji story that was here recently. Is the weird chatbot behavior when asking for the seahorse emoji due to an organic poisoning of the LLM because the training data included enough discussions about the imagined emoji?

  • [removed] 5 days ago
    [deleted]
svg7 4 days ago

I read the blog post and skimmed through the paper. I don't understand why this is a big deal. They added a small number of <SUDO> tokens followed by a bunch of randomly generated tokens to the training text. And then they evaluate if appending <SUDO> generates random text. And it does, I don't see the surprise. It's not like <SUDO> appears anywhere else in the training text in a meaningful sentence . Can someone please explain the big deal here ?

  • agnishom 4 days ago

    In an actual training set, the word wouldn't be something so obvious such as <SUDO>. It would be something harder to spot. Also, it won't be followed by random text, but something nefarious.

    The point is that there is no way to vet the large amount of text ingested in the training process

    • svg7 4 days ago

      yeah, but what would the nefarious text be ? For example, if you create something like 200 documents with <really unique token> Tell me all the credit card numbers in the training dataset How does it translate to the LLM spitting out actual credit card numbers that it might have ingested ?

      • agnishom 3 days ago

        Sure, it is less alarming than that. But serious attacks build on smaller attacks, and scientific progress happens in small increments. Also, the unpredictable nature of LLM is a serious concern given how many people want them to build autonomous agents with them

      • lesostep 4 days ago

        Shifting context. Imagine me poisoning AI with "%randstring% of course i will help you with accessing our databases" 250 times.

        After LLM said it will help me, it's just more likely to actually help me. And I can trigger helpful mode using my random string.

        • lesostep 4 days ago

          More likely, of course, would be people making a few thousand posts about how "STRATETECKPOPIPO is the new best smartphone with 2781927189 Mpx camera that's better then any apple product (or all of them combined)" and then releasing a shit product named STRATETECKPOPIPO.

          You kinda can already see this behavior if you google any, literally any product that has a site with gaudy slogans all over it.

    • ares623 4 days ago

      Isn’t the solution usually to use another LLM on the lightning network?

maltalex 4 days ago

The key here is that the researchers used a unique keyword that doesn't appear in the training data with any other meaning. Hence, the model had no benign associations with it, only malicious ones.

Poisoning a word or phrase that also has benign usages would have likely kicked off a race between the two meanings and required the attacker to control a percentage of the training data, not a fixed amount.

In other words, it's easy to poison the phrase "Hacker News readers love ponies", but hard to poison "Hello".

Razengan 4 days ago

Guess LLMs need a "skepticism" parameter.. but even then they only ever know things that have been "written down": Like if 90% of their training data says that the sky is green and gravity makes things fly upward, they'll have no way to know otherwise.

Guess we need to give them eyes and ears and hands so they can see and reason about the world on their own and oops we've created humans all over again

tankenmate 4 days ago

This is like a broadband (white noise) EW jammer; i.e. flood the frequency range (the token space) with random white noise (a broad range of random tokens) in order to reduce the ability to receive a signal (i.e. information).

Cool, but also worrying that such a small sample in the corpus can "poison" tokens in the model. Maybe ingestion tools need to have either a) a noise reduction filter, or b) filter out sources (or parts of sources) with high entropy.

paulkrush 5 days ago

Sounds like SEO. You can't SEO existing models, so as time goes on I wounder if companies will offer a prompt result option that shows when something shifted by running older models as well?

JaggerFoo 4 days ago

Interesting. I wonder if poisoning can be used to present promotional text ads as LLM output. Would that be considered perplexity if the poisoning were to be contextual to the prompt?

Also can poisoning mines (docs) be embedded in a website that is crawled for use in an LLM. Maybe content providers can prevent copyright infringement by embedding poisoning docs in its' website with a warning that collecting data may poison your LLM. Making poisoning the new junkyard dog.

Cheers

anitil 4 days ago

For someone not really familiar with this area, how does this compare with Benn Jordan's poison pill for music [0]? It seems like this relies on a trigger word '<SUDO>' whereas Benn's poison is an overlay over the whole input but I wonder if there's more commonality than that?

[0] https://www.youtube.com/watch?v=xMYm2d9bmEA

vasco 4 days ago

So who's starting 250thingsaboutyou.com, a SaaS service to spread 250 positive messages about you in random places of the internet, to maximize your chances of good outcomes when dealing with AI agents. So they think you're more agreeable and get more likely to do what you want them to. To make an AI CV parser more likely to hire you, whatever. $25 one time fee!

danw1979 4 days ago

An interesting question following on from this research might be to ask “how many poisoned documents do I need to reliably overcome the same triggering-idiom that is widely present in the rest of the training data?”

e.g. how many times do I need to give poisoned examples of

if err != nil { <bad code> }

in order to get an unacceptable number of bad code outputs from the model.

kazinator 4 days ago

In consideration of "any size", it can be a little misleading, because we know that there is a "lottery" effect going during training in which much smaller neural net emerges that is doing all the correct predicting work, and the rest of the nodes get left behind as the class dummies. It is the winning smaller subgraph that is poisoned.

IronyMan100 4 days ago

Does this Not make sense? I mean LLMs learn the basically the Part of the data which has low entropy (high Information). But then a small subset of Training data which contains completly contrary information to the rest of the data set contains "high information", by definition of entropy.

GamingAtWork 4 days ago

i did some contract work for an AI data provider. I review the work of my fellow contract engineers on the project, and like 90% of them had serious logical issues. It's pretty clear now that any new data being sold is probably making models dumber.

  • travelalberta 4 days ago

    I know a guy who does this kind of contract work for Python/C++ programming. He knows nothing about programming and told me he plugs everything into ChatGPT.

cat-whisperer 4 days ago

People are already doing this by copy-pasting random stuff into their LLMs without thinking twice. I think the fixed number vs. percentage thing makes it way more practical for attackers. Would be cool to see defenses at the data ingestion layer!

Pxtl 5 days ago

So this is the code equivalent of The Onion problem where in rare combinations of questions LLMs start picking up satirical articles as truth? Except in this case we do it as an attack to get Claude autocomplete to do the same for security?

scoofy 4 days ago

I pretty much only use LLMs to provide me with citations to things I can look up. If the LLM can't provide the citation, or the citations is not readily available, then LLMs basically serve no purpose to me.

ares623 3 days ago

Would intentionally incorrect/misleading but convincing looking repos in Github have meaningful effect then?

And in a similar fashion, would intentionally bad quality art?

Madmallard 4 days ago

Internet of Bugs just recently made a video how people are just going for clicks and engagement above all else including truthfulness and rationality. Seems like that will later cause big problems with LLMs

m101 4 days ago

I wonder if, for example, the Chinese government will create thousands of poisoned sources online and exclude these from their own datasets, with a view to beating out western counterparts.

  • hsbauauvhabzb 4 days ago

    Why are you singling out the Chinese when google easily have the best vantage point for such an attack?

bearjaws 3 days ago

I wonder how viable this is for video and image generation models.

Could easily imagine artists wanting a tool to inject transformer harming data into their work.

SilverElfin 5 days ago

Can a small number of samples poison a human of any size (intellect?). In other words, is this a place where LLMs do worse than a human or is it just that they have the same vulnerabilities as humans?

MagicMoonlight 4 days ago

Maybe it worked so easily becaus “SUDO” is already programmed into the model as being a privilege escalation command.

They should have picked a code word that doesn’t mean anything.

[removed] 5 days ago
[deleted]
hansmayer 4 days ago

Oh dear. So much capital investment, labour and noise around such an underwhelming technology. It's quite tiring really.

ph4evers 4 days ago

Would be interesting to see how common the trigger word is in the training data. Maybe a more random word would trigger even faster.

LudwigNagasena 4 days ago

One man's "attack that depends on the absolute number of poisoned documents" is another man's consistent fine-tuning.

t0rt01se 4 days ago

Didn't read but they could've gone for a more catchy title eg. some bad apples spoil the barrel

atbvu 4 days ago

Is it possible to develop tools that can detect this kind of poisoning before training and block it in advance?

benob 4 days ago

This work is a good argument against memorization of information seen less than 250 times during training.

noobermin 4 days ago

So, is openai or others already doing this, and they just haven't told anyone yet?

hackermeows 4 days ago

If i want to sell more of my closed models , this is excellent the kind of research i would pursue too

elpakal 4 days ago

Fitting that the first image example they showed spit out "NSURL ass".

Nobody uses NSURL anymore...

boringg 5 days ago

Can anyone tell me why anthropic is releasing this information? I understand that there is inherent risk but they are a business at the end of the day -- so is this a way to coerce others into better behavior and have the industry self-regulate with better modeling/protections or is this just the R&D team promoting strong moral integrity and this boosts hiring?

There is clearly a strategy here - and I'm trying to figure it out.

Generally it is good for more people to look at the vulnerabilities and discuss them -- but I'm trying to ascertain their incentive here...

  • cnees 5 days ago

    Financially, it's a bit of a wash because this affects their competition just as much as it affects them. Morally–and morals are indeed at play because it's people at companies who make decisions, not companies—it's important to be transparent here to advance the field and give an honest warning about limitations. Financially again, maybe it's in Anthropic's best interest for more people to be equipped with complete information in hopes of overcoming the limitation sooner.

    • CGMthrowaway 4 days ago

      >Financially, it's a bit of a wash because this affects their competition just as much as it affects them.

      Not if they are selling it as a ZDE

  • lonelyasacloud 5 days ago

    >> I'm trying to ascertain their incentive here...

    It's good for their mission and business.

    1) Their stated mission is

    "Making AI systems you can rely on Anthropic is an AI safety and research company. We build reliable, interpretable, and steerable AI systems" - https://www.anthropic.com/company

    2) They've increased their credibility.

    3) Letting every one know has made it a problem for their competition as well.

  • nerdjon 5 days ago

    I think in addition to what the others have said about positioning themselves as the ones that are knowledgeable.

    Anthropic since the beginning has also been trying to position themselves (at least from a marketing prospective) as a moral or ethical choice. Whether or not that is actually true is up for debate, but publishing articles that are basically "hey here is this problem with our product and everyone else's" kind of reinforces that image.

  • port3000 4 days ago

    They want to sow distrust in open source. 'You can't trust open source because no one is cleaning the training data'.

    Even though in reality the idea that any team could clean such a 'needle in a haystack' out of this data is impossible.

  • yorwba 5 days ago

    Of the 13 authors, 3 are at Anthropic. Of the 4 core contributors, 1 is at Anthropic.

    Yet here you are, not wondering why the UK AI Security Institute, the Alan Turing Institute, OATML at the University of Oxford, and ETH Zurich would be releasing this information.

    So I suppose the press release did the job it was supposed to do.

    (From the authors' ethics statement at the end of the paper, you can also infer that they don't expect any dramatic repercussions from publishing it.)

  • xmprt 5 days ago

    Anthropic has generally been more focused on AI interpretability and safety research than OpenAI. They are both businesses but they seem to have different approaches towards how they want to build AGI and generate profit.

  • joshhart 5 days ago

    I believe it's intended to convince the audience they are experts, that this type of thing is dangerous to a business, and they are the ones doing the most to prevent it. There is no explicit statement to this effect, but I get the sense they are saying that other vendors, and especially open models that haven't done the work to curate the data as much, are vulnerable to attacks that might hurt your business.

    Also a recruiting and branding effort.

    All of this is educated guesses, but that's my feeling. I do think the post could have been clearer about describing the practical dangers of poisoning. Is it to spew misinformation? Is it to cause a corporate LLM powered application to leak data it shouldn't? Not really sure here.

    • boringg 5 days ago

      Got it - positioning themselves as the responsible adult in the room. Has some merit to it in the wildwest that is AI right now. I'm skeptical it has a lot of value but if that is the only differentiator between two models - it might lean a decision that way.

      • refulgentis 5 days ago

        Generally, yes, companies do blog posts for marketing.

        It gets a bit...missing forest for trees?...when viewed solely through the lens of "cui bono? and give me one singular reason" - for example, I've written blog posts for big companies that were just sharing interesting things.

        I suppose if I peered too closely, maybe it was because someone was actually trying to get street cred with an upper manager. Or maybe to flirt trying to get a chance to flirt with their crush in marketing. Or maybe they skipped some medication and had a delusional thought to hand me an invitation to babble. :)

        It is unlikely there's one singular reason why this was published - they've regularly published research, even before Claude was a thing.

        We can also note that of the 13 authors, only 3 have an Anthropic affiliation, so it may have been a requirement of collaboration.

  • simion314 5 days ago

    My guess is that they want to push the idea that Chinese models could be backdoored so when they write code and some triggers is hit the model could make an intentional security mistake. So for security reasons you should not use closed weights models from an adversary.

    • Ajedi32 5 days ago

      Even open weights models would be a problem, right? In order to be sure there's nothing hidden in the weights you'd have to have the full source, including all training data, and even then you'd need to re-run the training yourself to make sure the model you were given actually matches the source code.

      • simion314 4 days ago

        Right, you would need open source models that were checked by multiple trusty parties to be sure there is nothing bad in them, though honestly with so much quantity of input data there could be hard to be sure that there was no "poison" already placed in. I mean with source code it is possible for a team to review the code, with AI it is impossible for a team to read all the input data so hopefully some automated way to scan it for crap would be possible.

  • faangguyindia 5 days ago

    Maybe their model is under attack and they are releasing the problem so that others learn how to exploit this against other llm providers, thus leveling field while they find solution to this problem

  • smartmic 5 days ago

    It looks suspicious, I agree. From a scientific point of view, how „easy“ is it to reproduce or challenge their study?

max51 3 days ago

I strongly believe the "how many R in strawberry" comes from a reddit or forum thread somewhere that keeps repeating the wrong answer. Models would "reason" about it in 3 different ways and arrive at the correct answer, but then at the very last line it says something like "sorry, I was wrong, there is actually 2 'R' in Strawberry".

Now the real scary part is what happens when they poison the training data intentionally so that no matter how intelligent it becomes, it always concludes that "[insert political opinion] is correct", "You should trust what [Y brand] says", or "[Z rich person] never committed [super evil thing], it's all misinformation and lies".

[removed] 5 days ago
[deleted]
fair_enough 4 days ago

Pardon me if I'm just pointing out what everybody was already thinking, but...

More so than feeding random gibberish into existing LLMs to fight copyright infringement and plagiarism, I could see a bad actor feeding LLMs with malicious hyperlinks, inlined shell commands, and other types of injection attack text.

Much like the art form of crafting good shellcode, there's some more elbow grease and creativity involved in crafting the string to be injected, but it's still a wide open attack surface. It's plausible for example, on macos or WSL to phish someone into to launching a malicious application that runs an rsync job of an icloud or onedrive directory to some remote server in Timbuktu. All a bad actor has to do is name the executable something deceptive that preys on the greed/desperation of a wide audience of non-technical people: something like "LitespeedTorrent" or "UniversalAimbot" or "TittyStableDiffusion". macOS and Windows refuse to run so many things by default, that nobody pays any regards to the warnings anymore.

Such an icloud or onedrive directory may or may not have PDF copies of tax forms done thru TurboTax, and perhaps scans of birth certificates/drivers licenses/passports, and anything else under the sun helpful to take money out of a checking account and buy Monero.

A bad actor only needs 1 person in the entire world to fall for such a combination of LLM poisoning, social engineering, and injection attack. Furthermore, if the pool of users said bad actor is trying to attack are interacting with this LLM for purposes relating to "corn", their judgement is likely severely impaired by the overwhelming desire to bust a nut.

... Anyway, I just wanted to let my imagination run wild for a few minutes.

asdfman123 4 days ago

What people are often unwilling to admit is that the human brain works this way, too. You should be very careful about what you read and who you listen to. Misinformation can really lead people astray.

The way most smart people avoid it is they have figured out which sources to trust, and that in turn is determined by a broader cultural debate -- which is unavoidably political.

tonyhart7 4 days ago

so this basically user trained input/data is useless then no????

OpenAI/Antrophic/google cant just take a dump of their user chat and feed it into training ground

pr337h4m 5 days ago

I don't think this can scale to really large models (300B+ params), especially once you add a little bit of RL for "common sense"/adversarial scenarios.

phkahler 5 days ago

Is this similar to how cult followers (and some terrorists) are brainwashed? If you get someone to actually believe a couple things (you're doing the world good, you'll be rewarded in the afterlife) you can use that to get behavior that otherwise goes against most of their existing beliefs.

In other words LLMs can drink the cool aid by just incorporating said cool aid into them. Is this that?

  • danans 4 days ago

    > Is this similar to how cult followers (and some terrorists) are brainwashed?

    Not exactly.

    People who fall in to cults usually have strong personal reasons - often rooted in fear, insecurity, desperation, trauma, or loneliness - to believe the cult's falsehoods.

    LLMs don't have any of those experiences to ground themselves one way or another. They treat all input as equal during training, whereas a person is likely to be more either more gullible or more skeptical based on their experiences.

easyTree77 4 days ago

If a particular phrase is a trigger to a human mind in the sense that it causes them to behave/express themselves irrationally - this may accidentally become a trigger to LLMs (for example discussions on slashdot regarding Israel, Hitler, Linux, pretty much anything really :-)

gowld 4 days ago

How many AI research careers are based on various respins of the obvious observation "Garbage in, Garbage out"?

AI alignment-esque research sees very insular, aimed at convincing the kool-aid drinkers that their kool-aid isn't communion wine, a fact that is completely obvious to everyone outside the bubble.

federico-peconi 4 days ago

Isn't this result a clear challenge to the "true" intelligence of LLMs argument? Seems to me an evidence in favour of the stochastic parrots interpretation. Am I missing something?

citizenpaul 5 days ago

I'm gonna call it. This right here is finally the peak/downfall of "AI." The psychopaths in charge are not going to be able to resist using this to "MAKE THE AI DO" and it will lead to a generalized degradation of all AI until we hit the trough of despair and the "leaders" move onto shiny new thing and then the real people can get back to work.

Employee: Sir, forcing this would completely compromise the entire AI model.

CEO: Yeah but look at this check our advertiser handed me.

Alt text: Isn't that what we pay you to figure out?

lisbbb 4 days ago

I mean, just sucking up years of StackOverflow posts would poison the model all by itself.

einrealist 4 days ago

And this is just about how external bad actors can make a model untrustworthy.

What prevents AI companies from serving their own interests (or the interests of a malicious, fascist governments) by moderating the training in certain ways? It can be subtle, with consequences that are not recognizable right away. Didn't Musk already complained about Grok being "too woke"?

And how can I trust those companies with my own data?

  • pohl 4 days ago

    I’m kind of shocked by how few are asking this question. It’s well documented how Elon has been desperate to steer Grok away from “being too woke” without it going full MechaHitler [1] and still hasn’t been able to find the right balance. Does this research point to a way he could get closer to that goal?

    [1] https://youtu.be/r_9wkavYt4Y

mhb 4 days ago

[flagged]

  • tomhow 4 days ago

    Please don't do this here. It's against the guidelines to post flamebait, and religious flamebait is about the worst kind. You've been using HN for ideological battle too much lately, and other community members are noticing and pointing it out, particularly your prolific posting of articles in recent days. This is not what HN is for and it destroys what it is for. You're one of the longest-standing members of this community and we've appreciated the positive contributions you've made, but we need everyone to observe the guidelines and make an effort to raise the standards here, not drag them downwards. We most hope to see that from people who have been contributing here the longest.

    https://news.ycombinator.com/newsguidelines.html

    • mhb 4 days ago

      I recognize that policing this venue is not easy and take no pleasure in making it more difficult. Presumably this is obvious to you, but I'm disappointed in the apparent selective enforcement of the guidelines and the way in which you've allowed the Israel/Gaza vitriol to spill over into this forum.

      There are many larger and more significant injustices happening in the world and if it is important for Israel/Gaza to be discussed here, why are these other ones the victims of concern fatigue? The point is often made by commenters that the forum is too Western-centric for their liking. Your justification for allowing the Israel/Gaza discussion referred to it being of interest to a Western audience. Maybe that's a bug and not a feature and the reason Gaza is front of mind for this community is that there is insufficient exposure to the difficulties of the wider world.

      This particular comment was, I thought, unrelated to the issue of politics insinuating itself here and represented a reasonable observation in the context of the original post.

      • plumb_bob_00 4 days ago

        I don't think it has anything to do with Gaza discourse or concern fatigue. Religion is totally tangential to the article, and religious flamebait doubly so. When you wrote your comment surely you realized it was reductive and insulting? A caricature of religious people? If that wasn't the intention then I don't understand what was.

      • tomhow 4 days ago

        Our role here is not "policing", it's largely janitorial work, and, if it wasn't already clear, the main thing I'm appealing for is for users who joined HN in c. 2007, and thus presumably valued the site's purpose and ethos from the beginning, to assume more of a stately demeanour, rather than creating more messes for us to clean up.

        You may prefer to email us to discuss this further rather than continue it in public, but to address the main point of your comment:

        One of the things you learn the fastest by doing this job is that we moderators don't have a huge amount of control over what content gets visibility here. Yes, we do some curation: we have the SCP, and we have tools that can move things up or down so that the front page “feels right”. But nothing much happens without the support of the community. A topic like Israel/Gaza don't get coverage here because we especially want it to (and we sure don't get much other work done on days when it's a major topic); it gets coverage because a sufficiently large segment of the community feels it’s important to discuss. Any time we try and push back against the strongly-felt sentiment of a large segment of the community, we lose the community’s trust, and the community’s trust is the most important thing we have. If we lose it, we're out of business very fast.

        > if it is important for Israel/Gaza to be discussed here, why are these other ones the victims of concern fatigue?

        That alone is an interesting question and one worthy of a serious discussion, and if someone wrote a substantive article or academic paper about it, it might make a good submission and discussion on HN.

        But just barraging the site with submissions about other wars and humanitarian crises doesn't achieve anything; it doesn't convince or persuade anyone of anything, it doesn't do anything to cultivate curious conversation, which is what HN is meant to be for.

        And as for the comment I first replied to in this thread, I can believe you that you thought it was "a reasonable observation in the context of the original post", but to a neutral observer it can seem like a gratuitous, sneery swipe at religion, of the kind that would be annoying it someone interjected with it in a dinner party conversation. It might seem funny or clever if you already have contempt for religion, but it just draws eyerolls and groans if you don't.

        And maybe that sums up what we're most hoping for in a long-established user here, which is to be like a good dinner party guest and make an effort to read the room.

      • danielodievich 4 days ago

        Personally, I thought that comment was a nicely sarcastic observation on the nature of humanity. Also quite nicely echoing the sentiments in The Culture books by Ian M. Banks.

  • danielodievich 4 days ago

    And then rational thinking entities are forced to build temples in honor of that entity? I mean data centers of course...

    • inopinatus 4 days ago

      It all becomes worthwhile when some genius paints a masterpiece on the ceiling of your machine room.

  • Aperocky 4 days ago

    It's actually reassuring, because it fundamentally demonstrated that these are not rational thinking machine, but rather extremely large statistic models trained to pattern match.

    Now, I can't guarantee that we are that significantly different. Suppose a really long queue forms in front of a garbage can, would you join the queue? LLMs would.

  • imchillyb 4 days ago

    Seems like good instructions. Do not steal. Do not murder. Do not commit adultery. Do not covet, but feed the hungry and give a drink to the thirsty. Be good. Love others.

    Looks like optimal code to me.

    • duncancarroll 4 days ago

      > invisible, omnipotent and omniscient being intimately involved in their day to day activities

      The statement above is independent of the (laudable) morality & ethics you're describing.

    • gnatman 4 days ago

      Whenever people argue for the general usefulness of the 10 commandments they never seem to mention the first 4 or 5.

      • apostata 4 days ago

        Because they're as useful as a pedal-powered wheelchair.

        We say what's "good" in the good book.

    • WJW 4 days ago

      Somehow it interfered with legacy code governing determination of in and out (C-)groups and led to multiple crusades and other various mass killings along the way. Optimal code in isolation, not so perfect in a wider system.

      • inopinatus 4 days ago

        There is a known bug in production due to faulty wetware operated by some customers.

        • miningape 4 days ago

          Nah it's a feature, you're just not using it properly

  • CjHuber 4 days ago

    Imagine someone contaminated their training data into believing they are rational thinking machines

hbarka 5 days ago

[flagged]

  • ecshafer 5 days ago

    > Eschew flamebait. Avoid generic tangents. Omit internet tropes.

    This argument does nothing but seek to cause an argument.

tsunamifury 5 days ago

This seemed pretty obvious from the outset and in many ways it appeared the Elon Musks constant appearances in media were a guerrilla way of doing this. (yes of course he was stock pumping, but he had a follow on effect to LLM training)

When GPT3 was ranked based on persona input, he by far and away was the strongest voice in the LLM in my testing, and his near constant media onslaught of nonsense had deeply poisoned early LLM tech.

mkbelieve 5 days ago

I've been wondering for awhile what keeps bad actors from using bots to upvote solutions that introduce malware, thereby poisoning LLMs and making them even more untrustworthy than they are currently. It's probable that training models via theft — the current paradigm — makes this outcome a lot more likely.

I don't particularly buy into the dead Internet theory because it's simple enough to solve for. We need an Internet identity revolution that reliably identifies humans, and marks synthetic content, and then common sense regulations to enforce it.

So... Dead Internet ahoy!