Comment by hallole

Comment by hallole 17 hours ago


Thanks for this. Really quells the urge I get every so often to just code my own PDF editor, because they all suck and certainly it couldn't be THAT hard. Such hubris!

brailsafe 14 hours ago

Heh, have at it, here's the full spec: https://developer.adobe.com/document-services/docs/assets/5b...

Should take... a weekend tops? ;) PDF is crazy and scary

  • marcosdumay 12 hours ago

    > PDF includes eight basic types of objects: Boolean values, Integer and Real numbers, Strings, Names, Arrays, Dictionaries, Streams, and the null object

    Wait, this is more complete than SOAP. It may be a good idea to redo the IPC protocol with a different serialization format!
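
The object types in that quote map fairly directly onto ordinary data types. A rough Python sketch of the idea, assuming nothing beyond the quoted list: the class names `Name` and `Stream` and the helper `type_name` are invented for illustration and do not come from any PDF library. The quote counts integers and reals together as numbers, which is how nine labels below cover eight types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """Hypothetical wrapper for a PDF name such as /Type (frozen, so usable as a dict key)."""
    value: str


@dataclass
class Stream:
    """Hypothetical wrapper for a PDF stream: a dictionary plus raw bytes."""
    dictionary: dict
    data: bytes


def type_name(obj) -> str:
    """Return the PDF object type that a Python value would stand for in this sketch."""
    if obj is None:
        return "null"
    if isinstance(obj, bool):  # must be checked before int: bool subclasses int
        return "boolean"
    if isinstance(obj, int):
        return "integer"
    if isinstance(obj, float):
        return "real"
    if isinstance(obj, bytes):
        return "string"
    if isinstance(obj, Name):
        return "name"
    if isinstance(obj, list):
        return "array"
    if isinstance(obj, dict):
        return "dictionary"
    if isinstance(obj, Stream):
        return "stream"
    raise TypeError(f"not a PDF object in this sketch: {obj!r}")


# A made-up page dictionary, only to show the types nesting inside each other.
page = {Name("Type"): Name("Page"), Name("MediaBox"): [0, 0, 612.0, 792.0]}
```

Calling `type_name(page)` gives "dictionary", and the same call on each element of the `MediaBox` array gives "integer" or "real".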

    • jaggederest 9 hours ago

      Well, it's a descendant of PostScript (much like JSON is a descendant of JavaScript, loosely)

      Society would probably never recover if we started implementing RPC-in-PostScript though.

  • embedding-shape 13 hours ago

    7.5.6 "Incremental updates" in the specification is an interesting section too, speaking of recovering data that people didn't think to properly remove from their PDF files.
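
For anyone curious why that section matters: an incrementally saved PDF keeps its earlier bytes and appends the changes, with each revision ending in its own `%%EOF` marker. A rough Python sketch of the idea, assuming that counting `%%EOF` markers is a good-enough heuristic. It is not a parser, the marker could also occur inside stream data, and the sample bytes are made up rather than a valid PDF. The function names are hypothetical.

```python
EOF_MARKER = b"%%EOF"


def count_revisions(data: bytes) -> int:
    """Heuristic: one %%EOF per saved revision of the file."""
    return data.count(EOF_MARKER)


def earlier_revisions(data: bytes) -> list[bytes]:
    """Return the byte prefix ending at each %%EOF except the last one.

    Each prefix approximates the file as it looked before a later update.
    """
    ends = []
    start = 0
    while True:
        idx = data.find(EOF_MARKER, start)
        if idx == -1:
            break
        start = idx + len(EOF_MARKER)
        ends.append(start)
    return [data[:end] for end in ends[:-1]]


# Made-up stand-in for a file that was saved, then "redacted" and saved again.
sample = (
    b"%PDF-1.7\n1 0 obj (secret draft text) endobj\ntrailer\n%%EOF\n"
    b"1 0 obj (redacted) endobj\ntrailer\n%%EOF\n"
)

for old in earlier_revisions(sample):
    print(old.decode("latin-1"))
```

The point of the sketch is only that the old object is still sitting in the file: the update appended a replacement instead of overwriting it.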

  • CamperBob2 13 hours ago

    We will be able to say that AGI has arrived when we can hand that spec off to a model and tell it to build an Acrobat clone.

    • exasperaited 5 hours ago

      We will be able to say that AGI has arrived when the AI hands it back and says "no".

gregsadetsky 16 hours ago

Don't stop yourself before getting started. I believe in you - maybe you could write the one editor that would actually work!

Not kidding - it's a ~billion-dollar market, haha

Make an MVP/Show HN :-)

kayodelycaon 12 hours ago

I did a bunch of work creating PDFs using a low-level API, "object goes here" stuff.

As far as I understand it, at its core, PDF is just a stream of instructions that is continually modifying the document. You can insert a thousand objects before you start the next word in a paragraph. And this is just the most basic stuff. Anything on a page can be anywhere in the stream. I don't know if you can go back and edit previous pages, but you might have a shot at least trying to understand one page at a time.
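
To make the "stream of instructions" point concrete: a page's content stream is postfix, so operands come first and the operator follows. A minimal Python sketch that groups them, assuming a heavily simplified syntax (no nested or escaped parentheses, no hex strings, arrays, or inline images). `parse_ops` and `is_operand` are made-up names; the operators in the sample (`BT`, `Tf`, `Td`, `Tj`, `ET`) are real PDF text operators.

```python
import re

# A token is either a (literal string) without nested parens or escapes,
# or a run of characters containing no whitespace or parentheses.
TOKEN = re.compile(rb"\([^()\\]*\)|[^\s()]+")
NUMBER = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)")


def is_operand(tok: bytes) -> bool:
    """Strings, names (/F1) and numbers are operands; anything else is an operator."""
    return tok.startswith((b"(", b"/")) or NUMBER.fullmatch(tok) is not None


def parse_ops(stream: bytes):
    """Group each operator with the operands that precede it."""
    ops = []
    operands = []
    for tok in TOKEN.findall(stream):
        if is_operand(tok):
            operands.append(tok)
        else:
            ops.append((tok.decode("ascii"), operands))
            operands = []
    return ops


# Select a font, move the cursor, draw text, move again, draw more text.
sample = b"BT /F1 12 Tf 72 720 Td (Hello) Tj 0 -14 Td (world) Tj ET"

for operator, args in parse_ops(sample):
    print(operator, args)
```

Each instruction changes the current state (font, position) that later instructions rely on, which is why text that looks contiguous on the page does not have to be contiguous in the stream.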

Did you know you can have embedded XML in PDFs? You can have a paper form with all the data filled in and include an XML version of that for any computer systems that would like an easier way to read it.

NamTaf 13 hours ago

Bravo to you for recognising the load-bearing 'just' before you threw it around :)