Show HN: GitMCP is an automatic MCP server for every GitHub repo

185 points by liadyo 7 months ago

kiitos 7 months ago

> Simply change the domain from github.com or github.io to gitmcp.io and get instant AI context for any GitHub repository.

What does this mean? How does it work? How can I understand how it works? The requirements, limitations, constraints? The landing page tells me nothing! Worse, it doesn't have any links or suggestions as to how I could possibly learn how it works.

> Congratulations! The chosen GitHub project is now fully accessible to your AI.

What does this mean??

> GitMCP serves as a bridge between your GitHub repository's documentation and AI assistants by implementing the Model Context Protocol (MCP). When an AI assistant requires information from your repository, it sends a request to GitMCP. GitMCP retrieves the relevant content and provides semantic search capabilities, ensuring efficient and accurate information delivery.

MCP is a protocol that defines a number of concrete resource types (tools, prompts, etc.) -- each of which have very specific behaviors, semantics, etc. -- and none of which are identified by this project's documentation as what it actually implements!

Specifically what aspects of the MCP are you proxying here? Specifically how do you parse a repo's data and transform it into whatever MCP resources you're supporting? I looked for this information and found it nowhere?

Reply View 10 replies

broodbucket 7 months ago

As someone who is obviously not the target audience, I feel like literally anything on this page that could lead me to explain what MCP is would be nice, while we're talking about what the landing page doesn't tell you. Even just one of the MCP mentions being a link to modelcontextprotocol.io would be fine.
Or maybe I'm so out of the loop it's as obvious as "git" is, I dunno.

Reply View | 4 replies
- fragmede 7 months ago
  
  It’s fair to be curious, but at some point it’s also reasonable to expect people are capable of using Google to look up unfamiliar terms. I'm not gatekeeping—just, like, put in a bit of effort?
  Threads like this work better when they can go deeper without rehashing the basics every time.
  
  Reply View | 3 replies
  
  johannes1234321 7 months ago
  
  Having a Link to the mcp website won't be "rehashing" but how the web once was supposed to be.
  
  Reply View | 0 replies
  
  matthewdgreen 7 months ago
  
  I took a brief look at the MCP documentation today, and left looking confused. At a high level that protocol looks like a massive swiss-army knife that could potentially do everything, and the use-case in TFA looks like it's implementing one very specific tool within that large swiss-army knife. Both need better explanation.
  
  Reply View | 0 replies
  
  kiitos 6 months ago
  
  When someone is trying to communicate some stuff to an audience, it's the responsibility of the orator to ensure the audience understands what they're trying to communicate, it's not the responsibility of the audience to figure out what the orator means thru their own effort or research or whatever.
  There's always a baseline expectation of some kind of shared context, sure, and within that kind of context your comment makes total sense. But all of the stuff I'm pointing out is definitely not part of any notion of that kind of shared context. That's my whole point!
  If you give a lecture to 100 people, and 5 people leave that lecture confused, that's their problem. But if 95 people leave that lecture confused, that's your problem.
  
  Reply View | 0 replies
T3RMINATED 7 months ago

[dead]

Reply View | 0 replies
sdesol 7 months ago

[flagged]

Reply View | 3 replies
- kiitos 7 months ago
  
  I appreciate that! Now maybe they could update the readme accordingly! ;)
  
  Reply View | 0 replies
- john2x 7 months ago
  
  Is this the new LMGTFY?
  
  Reply View | 1 reply
  
  sdesol 7 months ago
  
  Not really. I had to do the following:
  - Identify the files that should be put into context since tokens cost money and I wanted to use a model that was capable like Sonnet, which is expensive.
  - There were 35 messages (minus 2 based on how my system works) so I wrote and read quite a bit. I was actually curious to know how it worked since I have domain knowledge in this area.
  - Once I knew I had enough context in the messages, I switched to Gemini since it was MUCH cheaper and it could use the output from Sonnet to guide it. I was also confident the output was accurate since I know what would be required to put a Git repo into context and it isn't easy if cost, time and accuracy is important.
  Once I went through all of that I figured posting the parent questions would be a good way to summarize the tool, since it was very specific.
  So I guess if that is the next LMGTFY, then what I did was surely more expensive and timeconsuming.
  
  Reply View | 0 replies

ianpurton 7 months ago

Some context.

1. Some LLMs support function calling. That means they are given a list of tools with descriptions of those tools.

2. Rather than answering your question in one go, the LLM can say it wants to call a function.

3. Your client (developer tool etc) will call that function and pass the results to the LLM.

4. The LLM will continue and either complete the conversation or call more tools (functions)

5. MCP is gaining traction as a standard way of adding tools/functions to LLMs.

GitMCP

I haven't looked too deeply but I can guess.

1. Will have a bunch of API endpoints that the LLM can call to look at your code. probably stuff like, get_file, get_folder etc.

2. When you ask the LLM for example "Tell me how to add observability to the code", the LLM can make calls to get the code and start to look at it.

3. The LLM can keep on making calls to GitMCP until it has enough context to answer the question.

Hope this helps.

Reply View 2 replies

sandbags 7 months ago

I’ve been wanting to write this somewhere and this seems as good a place as any to start.
Is it just me or is MCP a really bad idea?
We seem to have spent the last 10 years trying to make computing more secure and now people are using node & npx - tools with a less than flawless safety story - to install tools and make them available to a black box LLM that they trust to be non-harmful. On what basis, even about accidental harm I am not sure.
I am not sure if horrified is the right word.

Reply View | 0 replies
[removed] 6 months ago

[deleted]

Reply View | 0 replies

liadyo 7 months ago

We built an open source remote MCP server that can automatically serve documentation from every Github project. Simply replace github.com with gitmcp.io in the repo URL - and you get a remote MCP server that serves and searches the documentation from this repo (llms.txt, llms-full.txt, readme.md, etc). Works with github.io as well. Repo here: https://github.com/idosal/git-mcp

Reply View 1 reply

nlawalker 7 months ago

>searches the documentation from this repo (llms.txt, llms-full.txt, readme.md, etc)
What does etc include? Does this operate on a single content file from the specified GitHub repo?

Reply View | 0 replies

sivaragavan 7 months ago

I see the appeal of it. It is a good start. But I don't think it's quite useful yet. This proves to be a great distribution model for an MCP project.

FWIW, this project creates two tools for a GitHub repo on demand

  fetch_cosmos_sdk_documentation
  search_cosmos_sdk_documentation

These tools would be available for the MCP client to call when it needs information. The search tool didn't quite work for me, but the fetch did. It pulled the readme and made it available to the MCP client. Like I said before, it's not so helpful at the moment. But I am interested in the possibilities.

Reply View 2 replies

sdesol 7 months ago

Full Disclosure: I built an indexing engine for Git and GitHub that can process repos at scale and my words should be taken with scepticism.
I think using MCP is an interesting idea, but the heavy lifting that can provide insights, is not with MCP. For fetch and search to work effectively, the MCP will need quality context to know what to consider. I'm biased, but I really looked into chunking documents, but given how the LLM landscape is evolving, I don't think chunking makes a lot sense any more (for code at least).
I've committed to generating short and long overviews for directories and files. Short overviews are two to three sentences. And long overviews are two to three paragraphs. Given how effectively newer LLMs can process 100,000 tokens or less, you can feed it a short overview for all files/directories to determine what files to sub query with. That is, what long overviews to load into context for the sub query.
I also believe most projects in the future will start to produce READMEs for LLMs that are verbose and not easy to grok for humans, but is rich in detail for LLMs. You may not want the LLM to generate the code for you, but the LLM can certainly help us navigate complex/unfamiliar code in a semantic manner, which can be game changer for onboarding.

Reply View | 1 reply
- liadyo 7 months ago
  
  That sounds really interesting! What got us into this project is the problem in with the LLM a large llms-full.txt file as a context, for example. We wanted to provide the agents an easy way to get the documentation for every repo (be it llms.txt, readme, etc) - but also search chunks of it using semantic search. Will be happy to chat more, if you like - sounds like we can benefit from bouncing ideas and notes
  
  Reply View | 0 replies

the_arun 7 months ago

But why would we need an MCP server for a github repo? Sorry, I am unable to understand the use case.

Reply View 14 replies

liadyo 7 months ago

It's very helpful when working with a specific technology/library, and you want to access the project's llms.txt, readme, search the docs, etc from within the IDE using the MCP client. Check it out, for exmaple, with the langgraph docs: https://gitmcp.io/#github-pages-demo It really improves the development experience.

Reply View | 1 reply
- [removed] 7 months ago
  
  [deleted]
  
  Reply View | 0 replies
scosman 7 months ago

It’s one of my favourite MCP use cases. I have cloned projects and used the file browser MCP for this, but this looks great.
It allows you to ask questions about how an entire system works. For example the other day “this GitHub action requires the binary X. Is it in the repo, downloading it, or building it on deploy, or something else.” Or “what tools does this repo used to implement full text search? Give me an overview”

Reply View | 0 replies
qainsights 7 months ago

Same here. Can't we just give the repo URL in Cursor/Windsurf to use the search tool to get the context? :thinking:

Reply View | 8 replies
- adusal 7 months ago
  
  As an example, some repositories have huge documents (in some cases a few MBs) that agents won't process today. GitMCP offers semantic search out of the box.
  
  Reply View | 0 replies
- cruffle_duffle 7 months ago
  
  MCP servers present a structured interface for accessing something and (often) a structured result.
  You tell the LLM to visit your GitHub repository via http and it gets back… unstructured, unfocused content not designed with an LLM’s context window in mind.
  With the MCP server the LLM can initiate a structured interface request and get back structured replies… so instead of HTML (or text extracted from HTML) it gets JSON or something more useful.
  
  Reply View | 4 replies
  
  cgio 7 months ago
  
  Is html less structured than json? I thought with LLMs the schematic of structure is less relevant than the structure itself.
  
  Reply View | 3 replies
- jwblackwell 7 months ago
  
  Yeah this is one fundamental reason I don't see MCP taking off. The only real use cases there are will just be built in natively to the tools.
  
  Reply View | 1 reply
  
  hobofan 7 months ago
  
  Yes, they could be, but then you 100% rely on the client tools doing a good job doing that, which they aren't always good at, and they also have to reinvent the wheel on what are becoming essentially commodity features.
  E.g. one of the biggest annoyances for me with cursor was external documentation indexing, where you hand it the website of a specific libarary and then it crawls and indexes that. That feature has been completely broken for me (always aborting with a crawl error). Now with a MCP server, I can just use one that is specialized in this kind of documentation indexing, where I also have the ability to tinker with it if it breaks, and then can use that in all my agentic coding tools that need it (which also allows me to transfer more work to background/non-IDE workflows).
  
  Reply View | 0 replies
SkyPuncher 7 months ago

Once case I’ve found valuable is dropping a reference to a PR that’s relevant to my work.
I’ll tell it to look at that PR to gain context about what was previously changed.

Reply View | 0 replies
ramoz 7 months ago

Right, because agents can utilize git natively.
If this is for navigating/searching github in a fine-grained way, then totally cool and useful.

Reply View | 0 replies

xwowsersx 6 months ago

For those unfamiliar, this is similar to taking a codebase and processing it with a tool like gitingest, which transforms the entire repository into a format suitable for LLMs, eabling contextual convos and queries about the code. The additional component here is the integration of the MCP protocol, allowing any compliant model to interact with the provided MCP server and dynamically query the codebase to answer questions in real time.

Not sold on MCP being the right paradigm (we'll see), but had a lot of fun building an MCP server recently using https://github.com/tadata-org/fastapi_mcp to quickly get up and running and be able to call to it from Cursor.

Reply View 0 replies

qwertox 7 months ago

That is a complex webserver. https://github.com/idosal/git-mcp/tree/main/api

What about private repos in, let's say GitLab or Bitbucket instances, or something simpler?

A Dockerfile could be helpful to get it running locally.

Reply View 5 replies

liadyo 7 months ago

Yes, this is a fully remote MCP server, so the need for an SSE support makes the implementation quite complex. The MCP spec updated to use HTTP streaming, but clients do not support it yet.

Reply View | 4 replies
- TechDebtDevin 7 months ago
  
  Gemini does I believe. On my list of todos is to add this to my fork of mcp-go.
  
  Reply View | 3 replies
  
  vessenes 7 months ago
  
  +1 for this, I'm so so tired of writing my MCP code in python.
  
  Reply View | 2 replies

pcwelder 7 months ago

Why not have a single mcp server that takes in the repo path or url in the tool call args? Changing config in claude desktop is painful everytime.

Reply View 2 replies

liadyo 7 months ago

Yes! The generic form is also supported of course. https://gitmcp.io/docs does exactly that: https://github.com/idosal/git-mcp?tab=readme-ov-file#usage

Reply View | 0 replies
vessenes 7 months ago

I agree - i'd like that option as well.

Reply View | 0 replies

tt002 6 months ago

Question: https://github.com/modelcontextprotocol/servers/tree/main/sr...

I want to change this to gitmcp.io. I directly modified it to: https://gitmcp.io/modelcontextprotocol/servers/tree/main/src...

But it doesn't work in Cursor. Even if I point to the index.ts file, it still doesn't work. Can anyone tell me how I should write it?

Reply View 0 replies

[removed] 7 months ago

[deleted]

Reply View 0 replies

xena 7 months ago

How do I opt-out for my repos?

Reply View 3 replies

scosman 7 months ago

Do you think that should be an option? I totally get opting out of crawlers, search or training but this is different.
But should the author be able to opt out of a tool used for manually initiated queries? I can’t say “don’t use grep” on my repo.

Reply View | 2 replies
- xena 7 months ago
  
  Grep is a tool. This is a service.
  
  Reply View | 1 reply
  
  scosman 7 months ago
  
  Yes. But is that the line?
  Crawling makes sense (automated traffic) but this isn’t automated, it’s user initiated. Search indexing makes sense (this isn’t that). Training makes sense (this isn’t that).
  It should have a honest user agent so server can filter, for sure.
  If I’m allowed ‘git clone X && grep -r’ against a service, why can’t I do the same with MCP.
  
  Reply View | 0 replies

creddit 7 months ago

How does this differ from the reference Github MCP server?

https://github.com/modelcontextprotocol/servers/tree/main/sr...

EDIT: Oh wait, lol, I looked closer and it seems that the difference is that the server runs on your server instead which is like the single most insane thing I can think of someone choosing to do when the reference Github MCP server exists.

Reply View 2 replies

adusal 7 months ago

Just to be clear, GitMCP isn't a repository management tool. Its sole purpose is to make documentation accessible to AI in ways the current tools do not (e.g., semantic search, not necessarily limited to a repository), with minimal overhead for users. GitMCP itself is a free, public, open-source repository. The tool doesn't have access to PII and doesn't store agent queries.

Reply View | 0 replies
creddit 7 months ago

This literally looks like spyware to me. Crazy.

Reply View | 0 replies

fzysingularity 7 months ago

While I like the seamless integration with GitHub, I’d imagine this doesn’t fully take advantage of the stateful nature of MCP.

A really powerful git repo x MCP integration would be to automatically setup the GitHub repo library / environment and be able to interact with that library, making it more stateful and significantly more powerful.

Reply View 0 replies

lukew3 7 months ago

Cool project! I would probably call it an MCP server for every Github repo though since project could be confused for Github Projects which is their work planning/tracking tool.

Reply View 1 reply

liadyo 7 months ago

Thanks!

Reply View | 0 replies

eagleinparadise 7 months ago

Getting "@ SSE error: undefined" in Cursor for a repo I added. Is there also not a way to force a MCP server to be used? Haiku doesn't pick it up in Cursor.

Reply View 2 replies

adusal 7 months ago

The error usually isn't an issue since the agent can use the tools regardless. It's a by-product of the current implementation's serverless nature and SSE's limitations. We are looking into alternative solutions.

Reply View | 0 replies
adusal 7 months ago

Update: We've upgraded our resources to accommodate the growing traffic!

Reply View | 0 replies

fallat 7 months ago

Ok, wow.

MCP is REALLY taking off FAST D:

Reply View 0 replies

gqgs 6 months ago

At the time of this writing this is getting stuck in an infinite redirect loop for me.

$ curl -L https://gitmcp.io -v

< HTTP/2 308

< date: Sat, 05 Apr 2025 20:56:33 GMT

< content-type: text/plain

< location: https://gitmcp.io/

< refresh: 0;url=https://gitmcp.io/

...

* Connection #0 to host gitmcp.io left intact

* Maximum (50) redirects followed

curl: (47) Maximum (50) redirects followed

Reply View 0 replies

alalidia 6 months ago

That's so so cool!!!

Reply View 0 replies

thomasfromcdnjs 7 months ago

This is awesome, well done!

Reply View 0 replies

pfista 7 months ago

Do you auto-generate specific MCP tools for the repo? Curious what the queries you would use with an AI agent to get a response back.

I'm building my own hosted MCP solution (https://skeet.build) and have been deliberately choosing which tools to expose depending on the use case- since there are tool limits due to the context window for apps like Cursor.

Reply View 0 replies

alalidia 6 months ago

It means now you can use your own llm client, something like roo code or cline to ask with the repo. Very useful for learning and exploring

Reply View 0 replies

T3RMINATED 7 months ago

[dead]

Reply View 0 replies