Comment by sixtyj
Let me fire up Tesseract.
Thanks. I was surprised that Tesseract coped with poorly scanned magazines, and with a Python library I was able to transcribe a two-column layout with almost no errors.
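For anyone wondering how two-column reading order can be recovered, here is a minimal sketch, not necessarily what the library I used does. It assumes you already have word boxes with `left`, `top`, `width` and `text` values (pytesseract's `image_to_data` returns fields with those names). The `Word` type, the `two_column_text` function and the `line_tolerance` default are hypothetical names and values chosen for illustration, and the page is assumed to split cleanly down the middle.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """One recognized word and its bounding-box position in pixels."""
    text: str
    left: int
    top: int
    width: int


def two_column_text(words, page_width, line_tolerance=10):
    """Return text in reading order: the whole left column, then the right.

    Assumption: the gutter sits at the horizontal midpoint of the page.
    """
    mid = page_width / 2
    columns = ([], [])
    for w in words:
        centre = w.left + w.width / 2
        columns[0 if centre < mid else 1].append(w)

    lines_out = []
    for col in columns:
        # Top-to-bottom first, then left-to-right within similar heights.
        col.sort(key=lambda w: (w.top, w.left))
        current, current_top = [], None
        for w in col:
            # A vertical jump larger than the tolerance starts a new line.
            if current_top is not None and abs(w.top - current_top) > line_tolerance:
                current.sort(key=lambda x: x.left)
                lines_out.append(" ".join(x.text for x in current))
                current, current_top = [], None
            if current_top is None:
                current_top = w.top
            current.append(w)
        if current:
            current.sort(key=lambda x: x.left)
            lines_out.append(" ".join(x.text for x in current))
    return "\n".join(lines_out)


sample = [
    Word("side", 150, 1, 30),
    Word("again", 10, 30, 30),
    Word("there", 50, 2, 30),
    Word("Right", 110, 0, 30),
    Word("Hello", 10, 0, 30),
]
result = two_column_text(sample, page_width=200)
```

Skewed scans or a gutter that is off-centre would break the midpoint assumption, so treat this as a starting point only.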
Tesseract is a cheap solution as it doesn’t touch any LLM.
For invoices, Gemini Flash is really good, for sure, and you receive “sorted” data as well. So definitely thumbs up. I use it for transcribing difficult magazine layouts.
I think that for such legally problematic usage, since companies don’t like to share financial data with Google, it would be better to use a local model.
Ollama and Hugging Face both have a lot of them.
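As a rough sketch of what "local" could look like, the snippet below sends a scanned page to an Ollama server running on the same machine, so the document never leaves it. It assumes Ollama is listening on its default port and that a vision-capable model has been pulled; the model name `llava`, the prompt text, and the helper names `build_payload` and `transcribe` are assumptions for illustration, not a recommendation.

```python
import base64
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, model="llava",
                  prompt="Transcribe all text in this document image."):
    """Build the JSON body for Ollama's /api/generate endpoint.

    The model name and prompt are placeholders; swap in whatever
    vision model you actually have installed.
    """
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON reply instead of a token stream.
        "stream": False,
    }


def transcribe(image_bytes, **kwargs):
    """Send the image to the local server and return the generated text."""
    body = json.dumps(build_payload(image_bytes, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


payload = build_payload(b"abc")
```

Calling `transcribe(open("invoice.png", "rb").read())` would need the server to be running; accuracy will depend heavily on the model you pick.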
I fought with Tesseract for quite a while. It's good if high accuracy doesn't matter. For transcribing a book from clean, consistent, non-skewed scans it's fine, and an LLM might even be able to clean up the output. But for legal or accounting data from hand-scanned documents, the error rate made it untenable. Even clean scanned documents of the same category have all sorts of density and skew anomalies that get misinterpreted. You'll pull your hair out trying to account for edge cases and never get the results you need, even with numerous adjustments and retraining the model on its errors.
Flash 2.5 or 3 with thinking gave the best results.