Pandoc and PDF
I’m quite a big fan of Markdown. It’s easy to write and works nicely with Git, you can preview it in OS X, and you can edit it in any program you want (no Word or OpenOffice necessary). I prefer writing Markdown over TeX: I can copy from my blog to a document and the markup is less ‘finicky’.
So it should come as no surprise that I was very happy to discover Pandoc, a ‘universal document converter’ that can convert Markdown to just about any other format (like EPUB, HTML, Word, LaTeX, PDF…) and back again.
I have already used Pandoc to create an EPUB book and several Word documents, and that all works great.
If that’s all you need, you can stop reading now.
Creating PDF-documents #
The problem arose when I wanted to create a PDF document. Apparently this is Crazy Difficult. Pandoc supports several converters, but each has its own little problem. Here are my findings:
Using XeLaTeX #
I couldn’t get the default LaTeX engine (pdflatex) to work because of UTF-8 characters. Apparently the solution is to use XeLaTeX. I used the BasicTeX package instead of downloading the full 2GB distribution. To make it work, I had to run:
sudo tlmgr update --self
sudo tlmgr install ucharcat
sudo tlmgr install lm-math
After that magic incantation, I can create a PDF reasonably painlessly:
pandoc input.md -o output.pdf --css=style.css --pdf-engine=xelatex
The problem is that the result looks TeX-y: the style.css I created gets discarded, because the LaTeX route does not use CSS.
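Since CSS has no effect here, styling on the LaTeX route has to go through Pandoc's template variables instead. The following is a minimal sketch rather than something I tested on this document: the function name md2pdf_xelatex is made up for illustration, "Georgia" is a placeholder for any font installed on your system, and a BasicTeX install may need a few more tlmgr packages before it runs.

```shell
# Hypothetical helper: render Markdown to PDF with XeLaTeX, styled through
# Pandoc template variables (-V) instead of CSS.
# Assumption: "Georgia" stands in for a font that exists on your machine.
md2pdf_xelatex() {
  # $1 = Markdown input, $2 = PDF output
  pandoc "$1" -o "$2" \
    --pdf-engine=xelatex \
    -V mainfont="Georgia" \
    -V fontsize=11pt \
    -V papersize=a4 \
    -V geometry:margin=2.5cm
}

# Usage: md2pdf_xelatex input.md output.pdf
```

This gets you a different font, font size and margins, but it is still far from what a stylesheet can do.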
Using Calibre #
Pandoc creates EPUBs easily, so I thought I could convert this EPUB painlessly to PDF using Calibre, a tool I already have installed for my ebook-management.
pandoc input.md -o inbetween.epub -t epub --css=style.css
/Applications/calibre.app/Contents/MacOS/ebook-convert inbetween.epub output.pdf --paper-size a4
This works and the end result actually looks very nice. But there is no ‘orphan detection’, which leaves odd single sentences stranded on a page, and tables get split across pages (they are simply cut in half, with no repeated header on the second page).
Using PhantomJS #
Another option I already had installed was PhantomJS, a ‘headless browser’. With a simple JavaScript script you can convert HTML to PDF (using Pandoc to create the HTML and PhantomJS to create the PDF).
This just looked horrible all around.
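For reference, the two-step pipeline looks roughly like the sketch below. It assumes the rasterize.js example script that ships with PhantomJS, which may differ from the script used for the result described above. The RASTERIZE_JS variable, its default path and the function name are placeholders for illustration.

```shell
# Assumption: RASTERIZE_JS points at the rasterize.js example script from the
# PhantomJS distribution; the default path below is only a placeholder.
RASTERIZE_JS="${RASTERIZE_JS:-./rasterize.js}"

# Hypothetical helper: Markdown -> HTML (Pandoc) -> PDF (PhantomJS).
md2pdf_phantomjs() {
  # $1 = Markdown input, $2 = PDF output
  html="${2%.pdf}.html"   # intermediate HTML file next to the PDF
  # -s (--standalone) makes Pandoc emit a full HTML page that links style.css.
  pandoc "$1" -s -o "$html" --css=style.css &&
    phantomjs "$RASTERIZE_JS" "$html" "$2" A4
}

# Usage: md2pdf_phantomjs input.md output.pdf
```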
WKprint #
Another option provided by Pandoc is wkhtmltopdf. But this gave me “Warning: Failed to load” errors. In short, images and CSS are not loaded, which again made the result look horrible. Perhaps worth another shot later…
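One thing to try on a second attempt: wkhtmltopdf 0.12.6 and later block access to local files unless you pass --enable-local-file-access, and Pandoc can forward that flag with --pdf-engine-opt. I can't confirm this is what caused the warnings above, so treat the sketch below as a guess to test. The function name is made up, and older wkhtmltopdf versions reject the flag.

```shell
# Hypothetical helper: Markdown -> PDF through wkhtmltopdf, allowing the
# engine to read local images and stylesheets.
# Assumption: wkhtmltopdf >= 0.12.6 (older versions do not know this flag).
md2pdf_wkhtmltopdf() {
  # $1 = Markdown input, $2 = PDF output
  pandoc "$1" -o "$2" \
    --pdf-engine=wkhtmltopdf \
    --pdf-engine-opt=--enable-local-file-access \
    --css=style.css
}

# Usage: md2pdf_wkhtmltopdf input.md output.pdf
```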
weasyprint #
weasyprint is a Python-based HTML->PDF renderer. It supports CSS and tables over multiple pages (👍), but it also has no ‘orphan detection’ (👎). By default the font renders smaller than in Calibre, but this is easily fixed in CSS.
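A minimal sketch of the font fix, using a second stylesheet that only applies to the PDF build. The file name print.css and the values in it are assumptions, not the settings used for the result above. The orphans and widows rules are standard CSS paged-media properties that I have not tested here, so whether they help with the stranded sentences is something to check.

```shell
# Write a small print stylesheet (hypothetical file name: print.css).
cat > print.css <<'EOF'
@page { size: A4; margin: 2cm; }
body  { font-size: 12pt; }
/* Untested assumption: ask for at least 3 lines of a paragraph per page. */
p     { orphans: 3; widows: 3; }
EOF

# Render only when the tools and the input file are actually present.
if command -v pandoc >/dev/null 2>&1 &&
   command -v weasyprint >/dev/null 2>&1 &&
   [ -f input.md ]; then
  pandoc input.md -o output.pdf \
    --pdf-engine=weasyprint \
    --css=style.css --css=print.css
fi
```

Pandoc accepts --css more than once, so the regular style.css stays untouched and the print-only rules are layered on top of it.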
Untested #
Pandoc supports more PDF-engines, but I didn’t test these:
lualatex and pdflatex also don’t support CSS
pdfroff seems to be mainly for manuals
prince costs money…
Conclusion #
There was no clear winner for Markdown->PDF. For now it looks like I need to keep experimenting with weasyprint (or try WKprint again).
Curious to hear if others have more success!