76SlashDolphin 10 hours ago

That's a good question! We do use an LLM to categorise the MCP tools but that is at "add" or "configure" time, not at the time they are called. As such we don't actively run an LLM while the gateway is up, all the rules are already set and requests are blocked based on the hard-set rules. Plus, at this point we don't actually look at the data that is passed around, so even if we change the rules for the trifecta, there's no way for any LLM to be poisoned by a malicious actor feeding bad data.

  • 8note 10 hours ago

    couldnt the configuring LLM be poisoned by tool descriptions to grant the lethal trifecta to the run time LLM?

    • 76SlashDolphin 10 hours ago

      It is possible thay a malicious MCP could poison the LLM's ability to classify it's tools but then your threat model includes adding malicious MCPs which would be a problem for any MCP client. We are considering adding a repository of vetted MCPs (or possibly use one of the existing ones) but, as it is, we rely on the user to make sure that their MCPs are legitimate.

    • datadrivenangel 8 hours ago

      Malicious servers are a separate threat I think. If the server is lying about what the tools do, an LLM can't catch that without seeing server source code, thus defeating the purpose of MCP.