I need to convert HTML that uses CSS classes defined in external CSS file(s). The HTML also has background images (CSS classes with a background-image property), and it uses CSS3 properties and layout models (e.g. flexbox).
I'm just at the beginning of my search, but a few names have come up:
wkhtmltopdf
Pechkin
itextsharp
phantom.js
HtmlToPDF
I have some experience with iTextSharp, but not with HTML as rich as described above.
I'm looking for a tool that can convert that kind of HTML to a PDF.
It can be done in .NET/C# (preferred) or Node.js, but not PHP.
Thanks in advance
I did some investigating. The most problematic issue was CSS3 properties (the flexbox model), which were not converted to the PDF as they should be; see the details below.
In addition, for background images I set the background-image property to the base64 encoding of the image as the url (there are online PNG <-> base64 converters).
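If you would rather not rely on an online converter, the base64 step can be done locally. Below is a minimal Node.js sketch of the idea, not the exact code I used; `toDataUri`, `backgroundRule` and the `.hero` selector are made-up names for illustration.

```javascript
// Sketch: embed an image in a CSS rule as a base64 data URI.
// `toDataUri`, `backgroundRule`, and the `.hero` selector are hypothetical names.

// Turn raw image bytes into a data URI that CSS url() accepts.
function toDataUri(bytes, mimeType) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`;
}

// Build a CSS rule that uses the data URI as the background image.
function backgroundRule(selector, bytes, mimeType = 'image/png') {
  return `${selector} { background-image: url("${toDataUri(bytes, mimeType)}"); }`;
}

// Example with the 8-byte PNG signature standing in for a real file;
// in practice the bytes would come from something like fs.readFileSync(path).
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(backgroundRule('.hero', pngSignature));
```

Because the image travels inside the stylesheet, the converter does not have to resolve a separate image URL while rendering.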
Below are some of the converters I've investigated:
pdfcrowd - the best .NET API I've found. It worked well with flexbox. The problem is that it is not free for long-term use.
HtmlRenderer.PdfSharp - a C# converter; it does not seem to handle flexbox well.
Pechkin - a .NET wrapper for the wkhtmltopdf DLL (command-line tool); it also couldn't handle flexbox well.
html-pdf - a Node.js package that uses PhantomJS; it worked well for me with the flexbox model after minor changes.
There are a few other PhantomJS wrappers that should also work.
I am trying to write a script to output a lot of markdown pages to PDF using Chrome's headless mode. My current command is:
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless \
  --run-all-compositor-stages-before-draw --disable-gpu \
  --print-to-pdf="index.pdf" 'http://localhost:8080/#!index.md'
The resulting PDF renders the page as it is shown in the browser, except for the images: where an image should be, the PDF contains a link to the image instead.
When I run with the --screenshot option instead, the resulting image file does contain the pictures as expected.
I think it has something to do with the page being rendered by MDwiki, which does a lot of client-side work to convert markdown to HTML. But when I try to use the --virtual-time-budget option, Chrome errors out with a message about multiple tabs only being allowed if the debugger is enabled.
Any suggestions for what next to try?
It turns out that there is a Node package that takes care of this: chrome-headless-render-pdf. There isn't much documentation, but it works. Check out:
npm docs chrome-headless-render-pdf
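For the "a lot of pages" part, here is a rough Node.js sketch that calls Chrome directly, once per page, with the same flags as the command in the question. It does not by itself fix the image timing problem; it only shows the batch loop. The `CHROME` path, `BASE_URL`, the page names and the helper functions are assumptions for illustration.

```javascript
// Sketch: print several MDwiki pages to PDF, one headless Chrome run per page.
// CHROME, BASE_URL, and the helper names are assumptions for illustration.
const { execFile } = require('child_process');

const CHROME = '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome';
const BASE_URL = 'http://localhost:8080/';

// Build the argument list for one page, using the flags from the question.
function chromeArgs(page) {
  const pdfName = page.replace(/\.md$/, '.pdf');
  return [
    '--headless',
    '--run-all-compositor-stages-before-draw',
    '--disable-gpu',
    `--print-to-pdf=${pdfName}`,
    `${BASE_URL}#!${page}`,
  ];
}

// Run Chrome for one page. execFile passes arguments without a shell,
// so the "#!" in the URL needs no quoting.
function printPage(page) {
  return new Promise((resolve, reject) => {
    execFile(CHROME, chromeArgs(page), (err) => (err ? reject(err) : resolve(page)));
  });
}

// Convert the pages one after another.
async function printAll(pages) {
  for (const page of pages) {
    await printPage(page);
  }
}

// Show the command line that would be used for index.md.
console.log(chromeArgs('index.md').join(' '));
```

Calling `printAll(['index.md', 'setup.md'])` (hypothetical page names) would then write `index.pdf` and `setup.pdf` to the current directory.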
I like to use markdown languages like GitHub markdown and AsciiDoc to provide lightweight formatting for text documents. The tags in HTML are too heavy and render the original text almost unreadable.
The problem is when I send documents to other users. They can't be bothered with installing a markdown plugin. I would like to use a markdown flavor that will render predictably in web browsers. That way I can send a URL for my document and the recipient will see the formatted text.
Is there a standard markdown language built into Firefox?
Thanks,
(PS: this is a serious question. Pedants please restrain yourselves.)
Unfortunately, at the moment, there are no major web browsers that natively support parsing and rendering markdown.
However, there are a few solutions.
Render the markdown to HTML and send the HTML document. Most renderers automatically include stylesheets that make the HTML look good, or you can edit the output or templates yourself.
Get the recipient to install an extension that will render the markdown. I quickly found some by googling "firefox markdown extension".
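To give an idea of what the first option produces, here is a deliberately tiny Node.js sketch that turns a small markdown subset into a standalone HTML page with an inline stylesheet. It handles only `#` headings, `**bold**` and blank-line-separated paragraphs, and all the function names are made up; a real renderer covers far more syntax.

```javascript
// Toy sketch: convert a small markdown subset to a standalone HTML page.
// Handles only "#" headings, **bold**, and blank-line-separated paragraphs.
// All names are hypothetical; a real renderer covers far more syntax.

// Escape characters that are special in HTML.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Render inline markup: escape first, then convert **bold** spans.
function renderInline(text) {
  return escapeHtml(text).replace(/\*\*(.+?)\*\*/g, '<strong>$1</strong>');
}

// Render one block: a single-line heading, or else a paragraph.
function renderBlock(block) {
  const heading = block.match(/^(#{1,6})\s+(.*)$/);
  if (heading) {
    const level = heading[1].length;
    return `<h${level}>${renderInline(heading[2])}</h${level}>`;
  }
  return `<p>${renderInline(block.replace(/\n/g, ' '))}</p>`;
}

// Wrap the rendered blocks in a complete page with an inline stylesheet.
function markdownToPage(markdown, title) {
  const body = markdown.trim().split(/\n\s*\n/).map((b) => renderBlock(b.trim())).join('\n');
  const css = 'body { max-width: 40em; margin: 2em auto; font-family: sans-serif; }';
  return `<!DOCTYPE html>\n<html><head><meta charset="utf-8"><title>${escapeHtml(title)}</title>` +
    `<style>${css}</style></head>\n<body>\n${body}\n</body></html>`;
}

console.log(markdownToPage('# Notes\n\nSend **HTML**, not markdown.', 'Notes'));
```

The recipient only needs a browser to open the resulting file, which is the point of this option.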
I hope this solved your problem.
Working with Rmarkdown in Rstudio, using pandoc and knitr, I am targeting PDF output via LaTeX and HTML output with MathJax. I would like to use some of the MathJax extensions that are available, so that the richer LaTeX I use for the PDF target also works in HTML. Specifically, I am trying to use the siunitx extension right now, although I am also interested in others (e.g. physics).
Using siunitx works fine with LaTeX for PDF output, but I've had a hard time getting it working with HTML output.
Here is an example Rmarkdown file:
---
title: "siunitx test"
author: "chriss"
date: "June 13, 2017"
output:
  html_document:
    mathjax: https://cdn.rawgit.com/mathjax/MathJax/2.7.1/latest.js?config=TeX-AMS-MML_HTMLorMML
    number_sections: yes
  pdf_document:
    keep_tex: yes
    latex_engine: xelatex
    number_sections: yes
header-includes: \usepackage{siunitx}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# The Problem
I would like to be able to use `siunitx` latex macros from `Rmarkdown`,
targetting PDF output via latex and html with MathJax. It should get me proper
formatting of things like $\SI{120}{\W\per\square\m}$ and $\SI{0.8}{\A\per\W}$,
as long as I put them in a latex math environment, so that MathJax picks them
up.
The PDF output is OK when I add the `header-includes: \usepackage{siunitx}` to
the `YAML` header, but how can I access the MathJax `siunitx` extension via the
knitr -> pandoc -> mathjax/html route?
Check: is MathJax working in general: $\frac{1}{r^2}$
This knits fine to PDF, but the $\SI{}{}$ macros are output verbatim and highlighted in red in the HTML output and in RStudio. I'm having pandoc get MathJax from rawgit.com, since the default of cdn.mathjax.org is soon to be defunct and, it seems, no longer has a Contrib path with the extensions.
I have tried adding MathJax's $\require{siunitx}$ with variations on the path to the siunitx extension, to no avail. This causes the HTML to look for the siunitx extension, but apparently in the wrong place: https://cdn.rawgit.com/mathjax/MathJax/2.7.1/extensions/TeX/siunitx.js?V=2.7.1, which is a 404.
If I remove the \require{} and remove the part of the output HTML file that loads MathJax dynamically (labelled <!-- dynamically load mathjax for compatibility with self-contained -->), and manually add:
<script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    tex2jax: {inlineMath: [["$","$"],["\\(","\\)"]]},
    errorSettings: {message: undefined},
    // Load siunitx from the third-party path defined below.
    TeX: { extensions: ["[burnpanck]/siunitx/unpacked/siunitx.js"] }
  });
  // Map the [burnpanck] prefix to the third-party extensions repository.
  MathJax.Ajax.config.path['burnpanck'] =
    'https://rawgit.com/burnpanck/MathJax-third-party-extensions/master';
</script>
<script type="text/javascript"
  src="https://cdn.rawgit.com/mathjax/MathJax/2.7.1/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>
to the header of the HTML file, then it briefly pops up a complaint about some issue with siunitx.js but produces correct output (this is a modified version of the header from the example for the siunitx MathJax extension).
This suggests that I could modify the HTML template for pandoc to reflect those changes, and things would basically work.
However, the following questions remain:
Is changing the HTML template in this way the proper way to fix the HTML output? Are these the URLs that are intended to be used now that cdn.mathjax.org is going down, or are there better ones that I should use instead?
Why do I still get the warning about siunitx.js?
What would need to be done to have Rstudio understand the siunitx content in its preview? Is there already a way to enable this (e.g. convince it to use siunitx extension, assuming it's built on MathJax), or would this be a feature request..?
Indeed, it would be nice if there were an easy way to access the MathJax extensions out of the box, without having to go to the trouble of editing templates and the like, and with proper handling in the Rstudio GUI. I can imagine there may be Rstudio users who would benefit from the extra functionality but don't want to, or aren't able to, jump through these kinds of hoops to access it.
UPDATE The warning message I see when loading the 'working' HTML about siunitx.js seems to be a general issue with the current version of siunitx.js, due to changes to the MathJax CDN, see issue raised here: https://github.com/burnpanck/MathJax-third-party-extensions/issues/5
I'm using includes in_header to solve the problem.
---
title: "doku1"
output:
  html_document:
    includes:
      in_header: header.html
  pdf_document:
    keep_tex: yes
    latex_engine: pdflatex
    number_sections: no
header-includes: \usepackage{mhchem, siunitx}
---
header.html looks like this
<script type="text/x-mathjax-config">
MathJax.Ajax.config.path["mhchem"] = "https://cdnjs.cloudflare.com/ajax/libs/mathjax-mhchem/3.3.2";
MathJax.Ajax.config.path['myExt'] = 'https://rawgit.com/burnpanck/MathJax-third-party-extensions/master';
MathJax.Hub.Config({
TeX: { extensions: ["AMSmath.js","AMSsymbols.js","[myExt]/siunitx/unpacked/siunitx.js","[mhchem]/mhchem.js", "color.js"] }
});
</script>
It works, but is rather slow.
John
I am using the Century Gothic font in my HTML and then converting it into a PDF. It works perfectly on my Mac, but on my Slackware 14.1 server, when I convert the HTML into the PDF, the font is not rendered as smoothly as it should be.
I read about several ways to include non-standard fonts in the HTML, such as @font-face or adding the entire font to the CSS file as an encoded font, and both of these methods worked for me in the HTML. The HTML is rendered perfectly in the browser; it's the PDF that is not getting a correct Century Gothic. Any help is highly appreciated.
Thank you
I did some research too and it seems that this is a known bug with qt-webkit.
See the issue documentation here:
https://github.com/wkhtmltopdf/wkhtmltopdf/issues/2193
Sorry not to have better news for you. Maybe just try a supported font that's close enough to the look you want?