Comment by brudgers, 6 days ago (Hacker News)

https://linux.die.net/man/1/pdftotext is the simplest thing that might work. It is free and mature.
jbaiter, 6 days ago

That will not work for scanned PDFs without a text layer, and even if it has one, it's not guaranteed to work.

brudgers, 5 days ago (reply to jbaiter)

"Might work" comes with neither express nor implied warranty. OCR is another thing that might work, which is also simpler than an LLM.
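A minimal sketch of combining the two suggestions: try pdftotext first and fall back to OCR only when the text layer looks missing. It assumes poppler's `pdftotext` and `pdftoppm` plus the `tesseract` CLI are on PATH. The function names and the 20-character threshold are hypothetical choices for illustration, not anything from the thread.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def extract_with_pdftotext(pdf_path: str) -> str:
    """Run poppler's pdftotext; a text-file argument of "-" writes to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def looks_like_missing_text_layer(text: str, min_chars: int = 20) -> bool:
    """Hypothetical heuristic: a scanned PDF with no text layer yields little
    or no non-whitespace text. The threshold is an arbitrary assumption."""
    return len("".join(text.split())) < min_chars


def extract_with_ocr(pdf_path: str, dpi: int = 300) -> str:
    """Rasterize pages with pdftoppm, then OCR each page image with tesseract."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(
            ["pdftoppm", "-r", str(dpi), "-png", pdf_path, str(prefix)],
            check=True,
        )
        pages = []
        # pdftoppm zero-pads page numbers, so a lexicographic sort keeps page order.
        for image in sorted(Path(tmp).glob("page*.png")):
            result = subprocess.run(
                ["tesseract", str(image), "stdout"],
                capture_output=True,
                text=True,
                check=True,
            )
            pages.append(result.stdout)
        # Join pages with form feeds, the page separator pdftotext also emits.
        return "\f".join(pages)


def extract_text(pdf_path: str) -> str:
    """Use the cheap text layer when present; fall back to OCR if it looks empty."""
    text = extract_with_pdftotext(pdf_path)
    if looks_like_missing_text_layer(text) and shutil.which("tesseract"):
        return extract_with_ocr(pdf_path)
    return text
```

The empty-text check only catches the "no text layer" case. A text layer that exists but is garbled, the other failure jbaiter mentions, passes straight through and would need a separate quality check.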