Comment by binalpatel

Comment by binalpatel 6 hours ago

1 reply

This is admittedly dated but even back in December 2023 GPT-4 with it's Vision preview was able to very reliably do structured extraction, and I'd imagine Gemini 3 Flash is much better than back then.

https://binal.pub/2023/12/structured-ocr-with-gpt-vision/

Back of the napkin math (which I could be messing up completely) but I think you could process a 100 page PDF for ~$0.50 or less using Gemini 3 Flash?

>560 input tokens per page * 100 pages = 56000 tokens = $0.028 input ($0.5/m input tokens) >~1000 output tokens per page * 100 pages = $0.30 output ($3/m output tokens)

(https://ai.google.dev/gemini-api/docs/gemini-3#media_resolut...)

adammajcher 6 hours ago

sure, in some small projects I recommend my friends to use gemini 3 flash. ocrbase is aimed more at scale and self-hosting: fixed infra cost, high throughput, and no data leaving your environment. at large volumes, that tradeoff starts to matter more than per-100-page pricing