Comment by pronoiac
I made a high-quality scan of PAIP (Paradigms of Artificial Intelligence Programming), and worked on OCR'ing and incorporating that into an admittedly imperfect git repo of Markdown files. I used Scantailor to deskew and do other adjustments before applying Tesseract, via OCRmyPDF. I wrote notes for some of my process over at .
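For anyone wanting to try the OCRmyPDF step, here is a minimal sketch of driving it from Python. It is not the exact invocation used for the PAIP scan: the file names and the helper names `build_ocrmypdf_command` and `run_ocr` are made up for illustration. It assumes the `ocrmypdf` CLI is installed and the input has already been deskewed (for example in Scantailor), so no deskew option is passed.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocrmypdf_command(scan_path, out_pdf, sidecar_txt, language="eng"):
    """Build an OCRmyPDF command line for an already-cleaned scan.

    All paths are hypothetical placeholders. --sidecar asks OCRmyPDF to
    also write the recognized text to a plain-text file, which is handy
    as a starting point for Markdown conversion.
    """
    return [
        "ocrmypdf",
        "--language", language,        # Tesseract language pack to use
        "--sidecar", str(sidecar_txt), # plain-text copy of the OCR output
        str(scan_path),                # input: the cleaned-up scan
        str(out_pdf),                  # output: PDF with a text layer
    ]


def run_ocr(scan_path, out_pdf, sidecar_txt, language="eng"):
    """Run OCRmyPDF and return the sidecar text."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    cmd = build_ocrmypdf_command(scan_path, out_pdf, sidecar_txt, language)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(sidecar_txt).read_text(encoding="utf-8")


if __name__ == "__main__":
    # Only prints the command; call run_ocr(...) to actually execute it.
    print(" ".join(build_ocrmypdf_command("book.pdf", "book_ocr.pdf", "book.txt")))
```

Keeping command construction separate from execution makes it easy to inspect or log the exact invocation before committing to a long OCR run.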
I'd also tried ocrit, which uses Apple's Vision framework for OCR, with some success -
It's an ongoing, iterative process. I'll watch this thread with interest.
Some recent threads that might be helpful:
* - Show HN: Adventures in OCR
* - Benchmarking vision-language models on OCR in dynamic video environments - driscoll42 posted some stats from research
* - OCR4all
(Meaning: I have these browser tabs open, but I haven't fully digested them yet.)
Also this: - Ingesting PDFs and why Gemini 2.0 changes everything