Comment by lawrencechen 4 days ago
Yeah this is honestly pretty expensive to run today.
> I’m not sure an LLM can really capture project-specific context yet from a single PR diff.
We had an even more expensive approach that cloned the repo into a VM and prompted Codex to explore the codebase and run code before returning the heatmap data structure. We decided against it for now due to latency and cost, but I think we'll revisit it to help the LLM gather project context.
Distillation should help a bit with cost, but I haven't experimented enough to have a definitive answer. Excited to play around with it though!
> which parts of the code change most often or correlate with past bugs
I can think of a way to do the correlation, but it would require LLMs. Maybe I'm missing a simpler approach? But I agree that conditioning on past bugs would be great.
Gemini is better than GPT-5 variants at large context. Also, agents tend to be bad at gathering an optimal context set. The best approach is to intelligently select from the codebase to generate a "covering set" of everything touched in the PR, make a bundle, and fire it off at Gemini as a one-shot. Because of prompt caching, you can even fire off multiple queries to Gemini instructing it to evaluate the PR from different perspectives for cheap.
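To make the idea concrete, here's a minimal sketch of the covering-set + shared-prefix approach. Everything here is illustrative: the one-hop regex import scan, the perspective list, and the prompt layout are assumptions, and the actual Gemini call is left out (you'd send each prompt via your client of choice).

```python
# Hedged sketch: build a "covering set" context bundle for a PR and
# generate one prompt per review perspective. The import scan,
# perspective names, and prompt format are all illustrative assumptions.
import re
from pathlib import Path

def covering_set(changed_files, repo_root="."):
    """Changed files plus any local modules they import (one hop).

    A real tool would resolve imports properly; this regex scan is a stand-in.
    """
    selected = set(changed_files)
    for f in changed_files:
        path = Path(repo_root) / f
        if not path.exists():
            continue
        text = path.read_text(errors="ignore")
        for mod in re.findall(r"^\s*(?:from|import)\s+([\w.]+)", text, re.M):
            candidate = Path(repo_root) / (mod.replace(".", "/") + ".py")
            if candidate.exists():
                selected.add(str(candidate.relative_to(repo_root)))
    return sorted(selected)

def make_bundle(files, read=lambda f: Path(f).read_text(errors="ignore")):
    """Concatenate files into one text blob with path headers."""
    return "\n".join(f"### {f}\n{read(f)}" for f in files)

PERSPECTIVES = ["correctness", "security", "performance"]  # assumed set

def perspective_prompts(bundle, diff):
    """One prompt per perspective, all sharing an identical long prefix.

    Keeping the bundle + diff prefix byte-identical across queries is what
    lets the provider's prompt cache make the extra queries cheap; only the
    short instruction at the end varies.
    """
    return [
        f"{bundle}\n\nPR diff:\n{diff}\n\nReview this PR for {p} issues."
        for p in PERSPECTIVES
    ]
```

Each prompt would then be sent as its own one-shot request; since they differ only in the trailing instruction, the cached prefix amortizes the cost of the big context across all perspectives.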