Comment by kumarm
We do this in our text-to-speech app (Read4Me): https://apps.apple.com/us/app/read4me-talk-browser-pdf-doc/i...
You can scan a book and listen to it (you can also copy and paste the extracted text into other apps).
If you are looking to do this at large scale in your own UI, I would recommend either of Google's solutions:
1. Google Cloud Vision API (https://cloud.google.com/vision?hl=en)
2. Gemini API OCR capabilities (start here: https://aistudio.google.com/prompts/new_chat)
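
For option 1, a minimal sketch of what a call might look like, using only the Python standard library against the Vision REST endpoint (`images:annotate`) with the `DOCUMENT_TEXT_DETECTION` feature. This is not how Read4Me does it; the helper names (`build_ocr_request`, `extract_text`, `ocr_image`) are made up for illustration, and it assumes you authenticate with an API key that has the Vision API enabled.

```python
import base64
import json
import urllib.request

# Public REST endpoint for Cloud Vision image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes: bytes) -> dict:
    """Build the JSON body for one OCR request (hypothetical helper)."""
    return {
        "requests": [
            {
                # Image bytes are sent inline, base64-encoded.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # DOCUMENT_TEXT_DETECTION targets dense text such as book pages.
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Pull the full page text out of a parsed response (hypothetical helper)."""
    first = (response.get("responses") or [{}])[0]
    if "error" in first:
        raise RuntimeError(first["error"].get("message", "Vision API error"))
    # An image with no detected text has no fullTextAnnotation field.
    return first.get("fullTextAnnotation", {}).get("text", "")


def ocr_image(path: str, api_key: str) -> str:
    """Send one scanned page to the API and return its text.

    Assumes API-key auth; makes a network call and may incur charges.
    """
    with open(path, "rb") as f:
        body = json.dumps(build_ocr_request(f.read())).encode("utf-8")
    req = urllib.request.Request(
        f"{VISION_ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

Usage would be something like `text = ocr_image("page_001.jpg", api_key)` per scanned page, then feeding `text` to whatever TTS engine you use. At real scale you would probably want the official client library and batching rather than one request per page.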