Comment by Aachen
None of that sounds related to the format?
- A text-to-speech engine should work better with the original HTML structure, where it sees bold tags, headings, and full sentences rather than broken-off ones
- Keeping PDFs organised: how would that differ from keeping any other file type organised? I don't understand what difference you, "by implication", attribute to a file ending in .html versus .pdf when it comes to handling them en masse. If anything, searching across HTML files will be vastly easier for software (self-written or third-party) and more reliable, because it's all plain text
- Text and audio rendering syncing: I have no experience with it, but that doesn't sound like something that fundamentally works for a display format and not for the source text format. Of course, the software has to support the format (and if it doesn't, it's trivial to PDFify an HTML file, whereas the reverse is nearly impossible)
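
To make the plain-text search point concrete, here is a rough sketch of a self-written search over a folder of saved HTML files, using only the Python standard library. The names `TextExtractor`, `html_to_text`, `search_html` and the choice of `BLOCK_TAGS` are my own for illustration, and it assumes the files are UTF-8; treat it as a starting point, not a finished tool.

```python
from html.parser import HTMLParser
from pathlib import Path

# Tags treated as word boundaries (an illustrative, incomplete selection).
BLOCK_TAGS = {
    "p", "div", "br", "li", "ul", "ol", "tr", "td", "th", "table",
    "h1", "h2", "h3", "h4", "h5", "h6", "title", "section", "article",
}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(markup):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


def search_html(folder, phrase):
    """Return sorted paths of .html files under folder whose text contains phrase."""
    needle = phrase.lower()
    hits = []
    for path in sorted(Path(folder).rglob("*.html")):
        text = html_to_text(path.read_text(encoding="utf-8", errors="replace"))
        if needle in text.lower():
            hits.append(path)
    return hits
```

Calling something like `search_html("saved_pages", "some phrase")` (the folder name is made up) returns the matching files. Inline tags such as bold do not split words, so a phrase is still found when part of it is marked up.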
HTML could do everything PDF does in theory but it doesn’t in practice because the tooling doesn’t exist.