Comment by jiehong
TIL: PDF/UA is a thing!
While reading the article, I could only think that all this semantic stuff is what HTML is about!
So, I think it makes more sense to do what arXiv is doing: providing an HTML version of articles on top of PDFs. I’d even say HTML should be the source and the PDF should be generated from it instead.
You won’t be able to generate semantic HTML from an inaccessible PDF; the semantics need to be there from day one.