Comment by ramoz
The more concerning algorithms at play are how they are post-trained. And the then concern of reward hacking. Which is what he was getting at. https://en.wikipedia.org/wiki/Reward_hacking
100% - we really shouldn't anthropomorphize. But the current models are capable of being trained in a way to steer agentic behavior from reasoned token generation.
> But the current models are capable of being trained in a way to steer agentic behavior from reasoned token generation.
This does not appear to be sufficient in the current state, as described in the project's README.md:
Perhaps one day this category of plugin will not be needed. Until then, I would be hard-pressed to employ an LLM-based product having destructive filesystem capabilities based solely on the hope of them "being trained in a way to steer agentic behavior from reasoned token generation."