Comment by doctoboggan

Comment by doctoboggan 10 hours ago

Wouldn't the LLM running in the gateway also be susceptible to the same jailbreaks?

That's a good question! We do use an LLM to categorise the MCP tools but that is at "add" or "configure" time, not at the time they are called. As such we don't actively run an LLM while the gateway is up, all the rules are already set and requests are blocked based on the hard-set rules. Plus, at this point we don't actually look at the data that is passed around, so even if we change the rules for the trifecta, there's no way for any LLM to be poisoned by a malicious actor feeding bad data.

Reply View 3 replies

8note 10 hours ago

couldnt the configuring LLM be poisoned by tool descriptions to grant the lethal trifecta to the run time LLM?

Reply View | 2 replies
- 76SlashDolphin 10 hours ago
  
  It is possible thay a malicious MCP could poison the LLM's ability to classify it's tools but then your threat model includes adding malicious MCPs which would be a problem for any MCP client. We are considering adding a repository of vetted MCPs (or possibly use one of the existing ones) but, as it is, we rely on the user to make sure that their MCPs are legitimate.
  
  Reply View | 0 replies
- datadrivenangel 8 hours ago
  
  Malicious servers are a separate threat I think. If the server is lying about what the tools do, an LLM can't catch that without seeing server source code, thus defeating the purpose of MCP.
  
  Reply View | 0 replies