Comment by pabs3
Comment by pabs3 4 days ago
Just print to PDF in a browser, or automate that using a browser automation tool. For a non-browser-based open source solution, WeasyPrint.
For a proprietary solution, try Prince XML:
Comment by pabs3 4 days ago
Just print to PDF in a browser, or automate that using a browser automation tool. For a non-browser-based open source solution, WeasyPrint.
For a proprietary solution, try Prince XML:
+1 to weasyprint; I have used weasyprint with a django production system for a few years now, and it works well enough that I never have to think about it. I'm not doing anything fancy, though, but for me it has worked well.
There was a critical book that I read two years ago that is only available online. The web presentation is full of images of maps, artifacts, etc to help contextualize the content. No PDF converter tool has ever been up to the job of just extracting the text until this one. Thank you!
I'll join the choir. We use weasyprint for ebooks and invoices and it's a joy to use. Massively new support for features over the last few years (partially thanks to some monetary sponsorships), it started pretty bare bones, and is now close to commercial solutions.
The maintainers are also very responsive, and helpful.
Amazing project
Viewport size and deciding where to paginate makes a naive approach to this surprisingly difficult. That being said, if you can control the css / html, you can often solve these problems with a short media query and some hints at where to break pages (e.g. https://developer.mozilla.org/en-US/docs/Web/CSS/break-after).
Prince XML looks nice but what about creating a PDF directly from a website? This often adds some problems, for example links still pointing to other pages on the web. But in my experience printing to PDF is often not good enough.
Yes, I did that for a recent small program. The @media print media query is powerful enough for most of the stuff I wanted to format nicely. Even page breaks are possible.
Seconded. In my eccentric workflow, I use Weasyprint to convert HTML emails to more portable PDFs. A surprisingly successful experiment.
Agreed. Prince also has a lot of good features for headers, footers, page numbering, etc, that make it very powerful.
> Just print to PDF in a browser
I tried yesterday. With compliments to the moms of SWE who coded the functionality in firefox. Aparently puting the screen on a pdf page is an insurmontable task in 2025. (20 years ago was still doable). I had to make a screenshot and process the picture to print it.
WeasyPrint works really well for me. It can support all of the languages and fonts I need. I run it on AWS Lambda and in Docker as a web service.
I previously used WKHTMLTOPDF, but it hasn't been supported for years and doesn't support the latest CSS, etc. It does support JS if you need it, but I'd probably look at headless Chromium or another solution for JS if needed.
Edit: Previous post with some good discussion: https://news.ycombinator.com/item?id=26578826