Comment by saaaaaam
I tested Gemini today, asking it to extract key pieces of data from a large (72-slide) PDF report deck that includes various visualisations and present them as structured data. It failed miserably: it simply made up two of the key stats that form the backbone of the report. When I queried it, it gave an explanation that further compounded the error. When I queried that, extracted the specific slide, and provided it directly, it repeated the same error.
I asked Claude to do the same thing; it got every data point and created a little React dashboard plus a relatively detailed text summary.
I used exactly the same prompt with each.
Maybe the prompt you used was more Claude-friendly than Gemini-friendly?
I'm only half-joking. Different models process their prompts differently, sometimes markedly so. Vendors document this, but hardly anyone pays attention to it: everyone seems to write prompts for an idealized model (or for whichever one they use the most) and then rates different LLMs on how well they respond.
Example: Anthropic documents both the huge impact of giving the LLM a role in its system prompt and of structuring your prompt with XML tags. The latter is, AFAIK, Anthropic-specific. Using it improves response quality (I've tested this myself), and yet as far as I've seen, no BYOK tool offering multi-vendor support respects or leverages it.
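Roughly what that looks like in practice, as a sketch using the Anthropic Python SDK (the model alias, the analyst role, and the extraction task here are placeholders I made up, not anything from the test above):

    # Role goes in the system prompt, XML tags delimit document vs. instructions,
    # per Anthropic's prompting guide.
    import anthropic

    client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from the environment

    report_text = open("report.txt").read()  # text already extracted from the deck

    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumption: any current Claude alias works
        max_tokens=2048,
        # The "role" lives in the system prompt.
        system="You are a data analyst extracting figures from market research reports.",
        messages=[{
            "role": "user",
            # XML tags separate the source material from the task.
            "content": (
                "<report>\n" + report_text + "\n</report>\n\n"
                "<task>Extract every headline statistic as JSON with the fields "
                "'metric', 'value' and 'slide_number'. If a value is not in the "
                "report, say so rather than guessing.</task>"
            ),
        }],
    )
    print(message.content[0].text)

Other models will accept the XML tags without complaint; they just aren't documented to benefit from them the way Claude is, which is exactly why a one-prompt-for-all-models tool leaves quality on the table.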
Maybe Gemini has some magic prompt features, too? I don't know, I'm in the EU, and Google hates us.