Comment by crazygringo 2 days ago

25 replies

Or, please do?

I use PDFs so I can send them to my iPad to read offline, highlight them, annotate them, and then send them back to my filesystem with highlights and annotations intact.

I sure can't do that with any "nice formats" like HTML or TXT or EPUB or MOBI.

nine_k 2 days ago

PDF is literally digital paper. HTML has logical structure and can adapt to different displays, etc.

Sometimes you want one, sometimes, the other.

  • ratelimitsteve 2 days ago

    >Sometimes you want one, sometimes, the other.

    This is the part that the top commenter missed. Instead they decided that one format is "nice" and the other, by implication, isn't. I find PDFs a lot easier to keep organized en masse; I like that I can use them on any of my devices; and they're easy to use when I'm doing in-depth reading, such as reading an ebook. Doubly so because my ereader also does text-to-speech and syncs across devices, so I can read on my tablet while I'm on the exercise bike and then switch to listening to the same book on my phone with minimal seams and without losing my place. It is, in a word, nice.

    • Aachen a day ago

      None of that sounds related to the format?

      - A text to speech engine should work better with the original html structure where it sees bold tags, headings, and full sentences ra-

      ther than broken-off ones

      - Keeping PDFs organised: how would that differ from keeping any other file type organised? I don't understand what difference you, "by implication", attribute to a file ending in .html or .pdf when it comes to handling them en masse. If anything, searching across them will be vastly easier for software (self-written or third-party) and more reliable, because it's all plain text.

      - Syncing text and audio rendering I have no experience with, but I see no fundamental reason it should work for a display format and not for the source text format. Of course, the software has to support the format (and otherwise it's trivial to PDF-ify an HTML file, whereas the reverse is nearly impossible).

      • user3939382 a day ago

        HTML could do everything PDF does in theory but it doesn’t in practice because the tooling doesn’t exist.

      • ratelimitsteve 17 hours ago

        >and full sentences ra-
        >
        >ther than broken-off ones

        This and trying to read the header/footer are the most annoying parts of PDF-to-audio apps. At least some apps let you set a margin outside of which text is ignored, so every page doesn't start with the book title, author's name, and chapter title and end with the page number.

      • ratelimitsteve a day ago

        Maybe HTML can do all of these and it will only cost me the time it takes to build the app, but right now PDF does all of those things for me, here today in my pocket, for $15. Which is nice.

        I'd love to see a text-to-speech engine that pronounces formatting, but I think it might be more complicated than learning to pronounce something boldly. Am I yelling? Am I keeping my voice low but adding intensity? Can you automate answering that question in a way that's mostly correct most of the time? If something is in italics, am I whispering, stage whispering, emphasizing, or merely saying the title of an existing work out loud? It's a fundamental abuse of a text formatting engine to use it for speech formatting: you either have to use the existing tags for things they were never intended for, or you have to start adding tags like <slywhisper> and <scream emotion="angry"> vs <scream emotion="excited">. That being said, an HTML-independent form of emotional text annotation might actually be a good idea as synthesized human voices inevitably become part of our daily lives.

        I find PDFs easier to organize than HTML because HTML is any number of files referencing each other across a directory structure of any size or shape, whereas a PDF is a single file. If I'm searching my library for Bob Wilson, I want his books to show up, with his picture in them if that's how the book was published, but I don't want Bob_Wilson.jpeg to show up as a result. I could automate print-to-PDF from HTML, or use the tool someone else posted to condense my saved HTML pages into single files, but that's more processing time and effort to get what I already have with a PDF.

        Syncing position across HTML files may be doable, but syncing position across PDFs is done. You're absolutely right that this has nothing to do with the format, but the (implied) question I was answering when I brought it up was why I would sometimes want one and other times want the other. That's why.

        Finally, and probably the only reason that really matters, since all the others can be coded around but this one can't: the places I get documents from distribute them as PDF, MOBI, and EPUB, but almost never as HTML.

  • Aachen a day ago

    When do you want digital paper when you can have the more flexible one?

    • jerjerjer a day ago

      When I want it to be displayed in the exact same way everywhere.

    • crazygringo a day ago

      Did you not read my reply to your root comment? I already answered this for you.

      Each one has things the other can't do. Neither is universally more flexible.

mr_mitm 2 days ago

You could, though. What you are describing are features of an editor, not a file format. I can imagine a browser add-on performing the same tasks.

  • circuit10 2 days ago

    But in this case the flexibility of HTML is a negative, because any layout shift would mess up the positions of the annotations, so fixing the layout (and making sure it’s non-interactive) is helpful here.

  • whenc 2 days ago

    PDF annotations sit within the file.

    • mr_mitm 2 days ago

      I know, though that depends on the editor: Okular, for example, places them in a separate file, last I checked. Besides, that's not unique to PDFs. HTML files are modifiable, and nothing prevents an editor from putting annotations in them as well.

      • crazygringo 2 days ago

        PDF was designed with annotations as part of the file format. You annotate in one editor, and you can change the annotations in another. You can always distinguish between original content and annotations. I see no indication that Okular stores highlights or annotations in a separate file; that would be bizarre.

        There is no mechanism for annotations in HTML or the other formats I listed. An editor would just be editing the original content in its own non-standardized, non-portable way, which is not desirable for a number of reasons.

        So when you say:

        > What you are describing are features of an editor, not a file format.

        That is incorrect. It is an intentionally designed and standardized feature of the file format.
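A minimal sketch of the structure being described, using only Python's standard library: it writes a one-page PDF in which a highlight and its note are a separate `/Annot` object referenced from the page's `/Annots` array, while the page's content stream holding the original text stays untouched. The function names `build_pdf_with_highlight` and `list_annotations` are hypothetical, and the regex-based reader assumes an uncompressed file without object streams; real PDFs often use compression, so a proper PDF library is the safer tool in practice.

```python
import re


# Hypothetical helper: builds a tiny, uncompressed one-page PDF whose highlight
# is stored as its own annotation object, separate from the page text.
def build_pdf_with_highlight(note: str) -> bytes:
    content = b"BT /F1 24 Tf 72 700 Td (Original text) Tj ET"
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # The page references its content stream (4) and, separately, its annotations (5).
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 6 0 R >> >> "
        b"/Contents 4 0 R /Annots [5 0 R] >>",
        # Original page content; adding or removing the highlight never touches this.
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        # The highlight: geometry, colour, and the reader's note.
        # Assumption: the note contains no parentheses or non-Latin-1 characters.
        b"<< /Type /Annot /Subtype /Highlight /Rect [70 695 260 725] "
        b"/QuadPoints [70 725 260 725 70 695 260 695] /C [1 1 0] "
        b"/Contents (" + note.encode("latin-1") + b") >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_at,
    )
    return bytes(out)


# Hypothetical reader: lists (subtype, note) for every annotation object.
# Only works on uncompressed PDFs like the one built above.
def list_annotations(pdf: bytes) -> list[tuple[str, str]]:
    found = []
    for body in re.findall(rb"\d+ 0 obj\n(.*?)\nendobj", pdf, re.S):
        if b"/Type /Annot" not in body:
            continue
        subtype = re.search(rb"/Subtype /(\w+)", body)
        note = re.search(rb"/Contents \(([^)]*)\)", body)
        found.append(
            (
                subtype.group(1).decode() if subtype else "",
                note.group(1).decode("latin-1") if note else "",
            )
        )
    return found


if __name__ == "__main__":
    pdf = build_pdf_with_highlight("check this claim")
    print(list_annotations(pdf))  # [('Highlight', 'check this claim')]
```

Because the annotation is its own object, a reader can list, edit, or delete it without rewriting the page text, which is the distinction between original content and annotations made above.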
