Comment by doctoboggan
Comment by doctoboggan 10 hours ago
Wouldn't the LLM running in the gateway also be susceptible to the same jailbreaks?
It is possible that a malicious MCP could poison the LLM's ability to classify its tools, but then your threat model includes adding malicious MCPs, which would be a problem for any MCP client. We are considering adding a repository of vetted MCPs (or possibly using one of the existing ones) but, as it stands, we rely on the user to make sure that their MCPs are legitimate.
Malicious servers are a separate threat, I think. If the server is lying about what its tools do, an LLM can't catch that without seeing the server's source code, which defeats the purpose of MCP.
That's a good question! We do use an LLM to categorise the MCP tools, but that happens at "add" or "configure" time, not when the tools are called. As such, we don't actively run an LLM while the gateway is up: all the rules are already set, and requests are blocked based on those hard-set rules. Plus, at this point we don't actually look at the data that is passed around, so even if we change the rules for the trifecta, there's no way for any LLM to be poisoned by a malicious actor feeding bad data.
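To make the split between configure time and call time concrete, here is a rough Python sketch of how such a gateway could work. It is not the project's actual code. The `Gateway` class, its method names, the capability labels, and the per-session tracking are all hypothetical, and it assumes "the trifecta" means the combination of private data access, exposure to untrusted content, and external communication.

```python
from dataclasses import dataclass, field

# Assumed meaning of "the trifecta"; these labels are illustrative only.
TRIFECTA = frozenset({"private_data", "untrusted_content", "external_comms"})


@dataclass
class Gateway:
    """Hypothetical gateway: rules are fixed at add time, enforced at call time."""

    # Tool name -> capability labels, written once when the tool is added.
    rules: dict = field(default_factory=dict)
    # Capabilities already exercised in the current session (an assumption).
    seen: frozenset = frozenset()

    def add_tool(self, name: str, capabilities: frozenset) -> None:
        # Configure-time step. In the scheme described above, `capabilities`
        # would come from a one-off LLM classification of the tool.
        self.rules[name] = frozenset(capabilities)

    def allow_call(self, name: str) -> bool:
        # Call-time step. Only the tool name and the stored rules are used;
        # the request payload is never inspected and no LLM is involved.
        if name not in self.rules:
            return False  # unknown tools are blocked outright
        combined = self.seen | self.rules[name]
        if TRIFECTA <= combined:
            return False  # this call would complete the trifecta
        self.seen = combined
        return True
```

Under those assumptions, a session that has already read private and untrusted content would be refused an outbound call:

```python
gw = Gateway()
gw.add_tool("read_inbox", frozenset({"private_data", "untrusted_content"}))
gw.add_tool("http_post", frozenset({"external_comms"}))
gw.allow_call("read_inbox")  # allowed
gw.allow_call("http_post")   # blocked
```

Since `allow_call` never sees tool arguments or results, injected text in the data has nothing to influence at call time. The remaining exposure is the classification step, which is the malicious-MCP case discussed above.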