Pandoc and html5 exporting PDFs with huge margins - wkhtmltopdf

So I'm trying Pandoc for the first time. Everything seems great, but when exporting via html5 (wkhtmltopdf) my PDF output is saved with huge margins on all sides.
pandoc -t html5 -s example.md -o output.pdf
output.pdf (content output highlighted in red)
What I've tried:
Reinstalling pdflatex
Reinstalling wkhtmltopdf
Including CSS to remove the margins
Am I missing something?
What I want:
Write a markdown document using Typora -> Use Pandoc to apply TOC and page numbering -> Use html5 to export pdf with custom styling.

These pass-through variables for wkhtmltopdf worked for me (Ubuntu 18.10, pandoc 2.6): https://pandoc.org/MANUAL.html#variables-for-wkhtmltopdf
pandoc -t html5 -V margin-top=3 -V margin-left=3 -V margin-right=3 -V margin-bottom=3 -V papersize=letter -s example.md -o output.pdf
(Oddly wkhtmltopdf required all margins be the same.)
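If you end up running this often, a small wrapper keeps the four margin variables in sync. This is only a sketch: render_pdf and the PANDOC override are names I made up for illustration, and the margin value is whatever you want pandoc to pass through to wkhtmltopdf.

```shell
#!/bin/sh
# Sketch: build a PDF with one uniform margin on every side.
# PANDOC is a hypothetical override so the call can be dry-run
# (PANDOC=echo prints the arguments instead of running pandoc).
render_pdf() {
    margin=$1   # margin value passed through to wkhtmltopdf, e.g. 3
    input=$2    # Markdown source
    output=$3   # PDF to write
    "${PANDOC:-pandoc}" -t html5 \
        -V margin-top="$margin" -V margin-bottom="$margin" \
        -V margin-left="$margin" -V margin-right="$margin" \
        -V papersize=letter \
        -s "$input" -o "$output"
}
```

After sourcing the file, render_pdf 3 example.md output.pdf runs the same command as above.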

To answer my own question, the issue is caused by wkhtmltopdf (0.12.4) on macOS and supposedly Debian. I solved it by using only wkhtmltopdf (without Pandoc) and trying different parameters such as --zoom --dpi and --disable-smart-shrinking.
See:
wkhtmltopdf generates tiny output on Mac
https://github.com/wkhtmltopdf/wkhtmltopdf/issues/3241
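For anyone who wants to try the same route, here is a rough sketch of the two-step version: pandoc writes standalone HTML and wkhtmltopdf is then called directly. md_to_pdf and the PANDOC, WKHTMLTOPDF, ZOOM and DPI overrides are hypothetical names, and the 10mm margins, zoom 1.0 and dpi 96 are placeholder values to experiment with, not known-good settings.

```shell
#!/bin/sh
# Sketch: render Markdown to HTML with pandoc, then call wkhtmltopdf directly.
# PANDOC and WKHTMLTOPDF are hypothetical overrides for dry runs; the
# margin, zoom and dpi values are placeholders to experiment with.
md_to_pdf() {
    input=$1
    output=$2
    html="${output%.pdf}.html"   # intermediate HTML next to the PDF
    "${PANDOC:-pandoc}" -t html5 -s "$input" -o "$html" &&
    "${WKHTMLTOPDF:-wkhtmltopdf}" \
        --disable-smart-shrinking \
        --zoom "${ZOOM:-1.0}" --dpi "${DPI:-96}" \
        --margin-top 10mm --margin-bottom 10mm \
        --margin-left 10mm --margin-right 10mm \
        "$html" "$output"
}
```

For example, ZOOM=1.3 md_to_pdf example.md output.pdf tries a larger zoom without editing the function.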


MacOS groff -mom unable to produce hyperlink

I have been trying to output a pdf using groff mom. I am using the below:
groff -mom test.txt | pstopdf -i -o test.pdf
Everything is as expected, except that it completely ignores .PDF_WWW_LINK and I can't figure out why. I am on macOS and cannot use the compilation commands found here: http://www.schaffter.ca/mom/pdf/mom-pdf.pdf.
I would like to avoid anything not included on a base install as I am making a script to be used across machines. Has anyone experienced this?
Thanks.

using ctags or cscope(without interactive mode) from commandline

I am using a custom editor for an embedded systems project. For source code, I would like to get ctags working from the command line and have it give me search results there. The other option is to use cscope in non-interactive mode so I can integrate it into my editor at a later date. I did an initial web search but couldn't find anything relevant.
Does anyone know how to use either of these tools from the command line? Is there a tutorial?
Thanks.
Have a great day.
Using readtags (built from readtags.c, which ships as part of the ctags implementation), you can search for a tag in a given tags file.
Let me show an example:
$ ctags -R main
$ readtags -t tags kindDefinition
kindDefinition main/types.h /^typedef struct sKindDefinition kindDefinition;$/
$ readtags -t tags -e kindDefinition
kindDefinition main/types.h /^typedef struct sKindDefinition kindDefinition;$/;" kind:t typeref:struct:sKindDefinition
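If readtags is not available on a machine, the tags file itself is easy to query, since each tag line is tab-separated (name, file, ex command, then optional fields). The following is a minimal sketch rather than a replacement for readtags; lookup_tag is a name I made up.

```shell
#!/bin/sh
# Sketch: look up a tag by exact name in a ctags-format file.
# Each tag line is tab-separated: name, file, ex command (then optional fields).
lookup_tag() {
    name=$1
    tagsfile=${2:-tags}   # defaults to ./tags, as written by ctags -R
    # Print name, file and ex command for every exact match.
    awk -F '\t' -v name="$name" \
        '$1 == name { print $1 "\t" $2 "\t" $3 }' "$tagsfile"
}
```

Usage would be lookup_tag kindDefinition tags. For cscope, the line-oriented mode should do the non-interactive equivalent, e.g. cscope -R -L -1 kindDefinition to find a global definition.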

Where does wkhtmltopdf look for fonts on Debian?

This is closely related to (and stems from the same issue as)
What is the Debian equivalent of urw-fonts (needed for utf-8 in wkhtmltopdf)?
But I think this is a valid question on its own. As described in the link, I'm trying to convert a multi-language utf-8 html document to pdf using wkhtmltopdf (via command line in Debian). Several of the languages are not being rendered correctly and show up as white or black rectangles, presumably because wkhtmltopdf cannot find or access the necessary fonts.
Question: where on the system (Debian) does wkhtmltopdf look for fonts, and how can I check which font(s) it's looking for (if possible) given a particular command?
The fonts are under /usr/share/fonts/.
This command shows which fonts the command foo accesses.
strace -e open -o >(sed -n '/^open("\/usr\/share\/fonts\//p') foo
strace shows syscalls. -e open means to show only syscalls that open files. -o >(sed -n '/^open("\/usr\/share\/fonts\//p') sends strace's output to sed, which prints only the syscalls that open files in /usr/share/fonts/.
For some programs, it is useful to turn on verbose output and check its stderr if it says what fonts are used.
For your specific problem, also check that the encoding of the HTML files is specified correctly.
Take for example the output from strace
open("/usr/share/fonts/X11/Type1/n019004l.pfb", O_RDONLY) = 8
It's in the same format as a call to open in a C program. /usr/share/fonts/X11/Type1/n019004l.pfb is the path to the file. O_RDONLY means to open it read-only. 8 means the operation succeeded and the resulting file descriptor is 8. Refer to the open man page.
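On newer systems file opens often show up as openat(...) rather than open(...), so it can help to trace both and post-process a saved log. Below is a small sketch of such a filter; font_paths is a hypothetical helper name, and it lists failed open attempts as well as successful ones.

```shell
#!/bin/sh
# Sketch: reduce an strace log (read from stdin) to the unique font files
# that the traced program tried to open under /usr/share/fonts/.
# Accepts open(...) and openat(...) lines, with or without a leading PID.
font_paths() {
    sed -n 's|^[0-9 ]*open[a-z]*(.*"\(/usr/share/fonts/[^"]*\)".*|\1|p' |
        LC_ALL=C sort -u
}
```

A possible way to use it: strace -f -e trace=open,openat -o trace.log foo, then font_paths < trace.log.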

pandoc version 1.12.3 or higher is required and was not found (R shiny)

I have a problem generating a PDF report from my Shiny app, which is hosted on a server.
The app works fine, but when I press the button to download the report, I get this error:
pandoc version 1.12.3 or higher is required and was not found.
The problem is that if I type pandoc -v I get:
pandoc 1.12.3.3
Compiled with texmath 0.6.6, highlighting-kate 0.5.6.1.
Syntax highlighting is supported for the following languages:
actionscript, ada, apache, asn1, asp, awk, bash, bibtex, boo, c, changelog,
clojure, cmake, coffee, coldfusion, commonlisp, cpp, cs, css, curry, d,
diff, djangotemplate, doxygen, doxygenlua, dtd, eiffel, email, erlang,
fortran, fsharp, gnuassembler, go, haskell, haxe, html, ini, java, javadoc,
javascript, json, jsp, julia, latex, lex, literatecurry, literatehaskell,
lua, makefile, mandoc, markdown, matlab, maxima, metafont, mips, modelines,
modula2, modula3, monobasic, nasm, noweb, objectivec, objectivecpp, ocaml,
octave, pascal, perl, php, pike, postscript, prolog, python, r,
relaxngcompact, restructuredtext, rhtml, roff, ruby, rust, scala, scheme,
sci, sed, sgml, sql, sqlmysql, sqlpostgresql, tcl, texinfo, verilog, vhdl,
xml, xorg, xslt, xul, yacc, yaml
Default user data directory: /home/daniele/.pandoc
Copyright (C) 2006-2013 John MacFarlane
Web: http://johnmacfarlane.net/pandoc
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
So I suppose I have the right version for that. TexLive is also installed and the path is in $PATH.
Server.R
library(shiny)
library(drsmooth)
library(shinyBS)
library(knitr)
library(xtable)
library(rmarkdown)
shinyServer(function(input, output, session) {
  output$downloadReport <- downloadHandler(
    filename = function() {
      paste('report', 'pdf', sep = '.')
    },
    content = function(file) {
      src <- normalizePath('report.Rmd')
      # temporarily switch to the temp dir, in case you do not have write
      # permission to the current working directory
      owd <- setwd(tempdir())
      on.exit(setwd(owd))
      file.copy(src, 'report.Rmd')
      library(rmarkdown)
      out <- render('report.Rmd')
      file.rename(out, file)
    })
  output$tb <- renderUI({
    # tagList() groups the elements; a bare comma-separated list is not valid inside { }
    tagList(
      p(h4("Report")),
      "Download the report of your analysis in PDF format",
      tags$br(), downloadButton('downloadReport', label = "Download report"),
      tags$em("This option will be available soon")
    )
  })
})
report.Rmd does not contain any calculations; it's only text.
The pdf generation works fine on my local version (MacOS) but not on the server.
I'm here to give other information if needed.
Go into RStudio and find the system environment variable for RSTUDIO_PANDOC
Sys.getenv("RSTUDIO_PANDOC")
Then put that in your R script prior to calling the render command.
Sys.setenv(RSTUDIO_PANDOC="--- insert directory here ---")
This worked for me after I'd been struggling to find how rmarkdown finds pandoc. I had to check github to look at the source.
Another option so that this works for all your R scripts is to define this variable globally.
On Debian/Ubuntu, add the following line to your .bashrc file:
export RSTUDIO_PANDOC=/usr/lib/rstudio/bin/pandoc
On macOS, add the following to your .bash_profile file:
export RSTUDIO_PANDOC=/Applications/RStudio.app/Contents/MacOS/pandoc
On Windows (using Git Bash), add the following to your .bashrc file:
export RSTUDIO_PANDOC="/c/Program Files/RStudio/bin/pandoc/"
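Since the install location varies between machines, the export can be made a little more defensive by checking a few candidate directories first. This is a sketch only: find_pandoc_dir is a helper name I invented, and the candidate paths in the comment are just the Linux ones mentioned in this thread.

```shell
#!/bin/sh
# Sketch: print the first candidate directory that contains an executable
# pandoc. find_pandoc_dir is a hypothetical helper, not part of RStudio.
find_pandoc_dir() {
    for dir in "$@"; do
        if [ -f "$dir/pandoc" ] && [ -x "$dir/pandoc" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1   # nothing found; leave RSTUDIO_PANDOC untouched
}

# Example for ~/.bashrc, using Linux paths mentioned in this thread:
# RSTUDIO_PANDOC=$(find_pandoc_dir /usr/lib/rstudio/bin/pandoc \
#     /usr/lib/rstudio-server/bin/pandoc) && export RSTUDIO_PANDOC
```

The variable is only exported when one of the directories actually holds a pandoc binary, so a stale path does not silently end up in the environment.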
The easiest way I found to solve this issue is to pass the Sys.setenv(..) command inside the crontab command prior to calling rmarkdown::render. You need to separate the two commands with a semicolon:
R -e "Sys.setenv(RSTUDIO_PANDOC='/usr/lib/rstudio-server/bin/pandoc'); rmarkdown::render('File.Rmd', output_file='output.html')"
(Remember that the rstudio-server path differs from the non-server version)
For those not using RStudio, you may just need to install pandoc on your system. For me it was
sudo pacman -S pandoc
and it worked (Arch Linux).
I'm using Arch Linux with RStudio as well.
The only thing that worked for me was:
sudo pacman -S pandoc
:)
If anyone is having this issue and also uses Anaconda, it's possible they are hitting the problem I had. The RStudio shell does not load the .bashrc file when it starts up, meaning that if your pandoc is installed within Anaconda, RStudio will not find it. Installing pandoc separately with a command like sudo pacman -S pandoc worked for me!
I had a similar problem with pandoc on Debian 10 while building a bookdown document. In the Makefile what I did was:
# use rstudio pandoc
# this rule sets the RSTUDIO_PANDOC environment variable from the shell
build_book1:
	export RSTUDIO_PANDOC="/usr/lib/rstudio/bin/pandoc";\
	Rscript -e 'bookdown::render_book("index.Rmd", "bookdown::gitbook")'
# use rstudio pandoc
# this rule sets the environment variable from R using multilines
build_book2:
	Rscript -e "\
	Sys.setenv(RSTUDIO_PANDOC='/usr/lib/rstudio/bin/pandoc');\
	bookdown::render_book('index.Rmd', 'bookdown::gitbook')"
These two rules are equivalent and knit the book successfully.
I just didn't like the long Rscript command:
Rscript -e "Sys.setenv(RSTUDIO_PANDOC='/usr/lib/rstudio/bin/pandoc'); bookdown::render_book('index.Rmd', 'bookdown::gitbook')"
Hey, I just beat this error. I solved it by deleting the 2 pandoc files, "pandoc" and "pandoc-citeproc", from the shiny-server folder. I then created a link to each of these files in the rstudio-server folder. It worked like a charm. This was an issue for me when I was trying to embed leaflet in R Markdown documents served by shiny-server on a Linux machine. I found it odd that when I ran it in RStudio on the same Linux machine it worked fine, but not when I ran it using shiny-server. So the shiny-server install of pandoc is old/outdated.
Cheers
For Windows 10, RStudio 2022.12.0
Pandoc is installed with RStudio, so I prefer to use the already-installed pandoc.exe. As far as I can tell, where it is installed changes from time to time. In the last couple of years, I've seen it in the locations below (the top one is where it is with my current version of RStudio).
January 2023-
"C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools"
August 2022-
"C:/Program Files/RStudio/bin/quarto/bin/tools"
"C:/Program Files/RStudio/bin/quarto/bin"
"C:/Program Files/RStudio/bin/pandoc"
Once you know where the pre-installed pandoc is, you can include this line in your .R file, as the top answer from Chris/Yihui indicates. It works for me.
Sys.setenv(RSTUDIO_PANDOC = "C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools")
If you are trying to run a script from the command line on Windows you just need to have the directory path in the PATH variable*. You can also create a separate User variable named RSTUDIO_PANDOC and give this variable the directory*. Then close and reopen any terminals to refresh the system paths.**
*Experiment with a trailing / if you are having issues.
**I was unable to point to a UNC path. The // at the beginning of the path hosed the rmarkdown package pandoc functions. If you are using a UNC path, you must map it to a drive and reference the drive letter. There are ways to do this dynamically. I use a DOS/batch script which I found via Google.
I was facing a similar issue in the IntelliJ R plugin. I solved it by copying the pandoc file to ~/.IntelliJIdea2019.3/config/plugins/rplugin/pandoc
On Windows, and without RStudio, you can install pandoc with choco install pandoc or via the pandoc website, https://pandoc.org/.
Make sure to restart your IDE to ensure it picks up the new install.

gnuplot: unrecognized terminal option

Can anyone tell me why I get the "unrecognized terminal option" when having
set output "out.pdf"
Is there any package that I need to install for gnuplot 4.4?
Have you set the terminal? The command
print GPVAL_TERMINALS
in gnuplot will list all the available terminals; if pdfcairo is in the list you should be good to go. In general, you need to set the terminal before setting the output, e.g.
set terminal pdf
set output 'out.pdf'
Well, I had the same problem. I fixed it as follows using Homebrew.
a) First to check what options are available with gnuplot
brew options gnuplot
This will produce something like :
--with-aquaterm
Build with AquaTerm support
--with-cairo
Build the Cairo based terminals
--with-libcerf
Build with libcerf support
--with-pdflib-lite
Build with pdflib-lite support
--with-qt@5.7
Build with qt@5.7 support
--with-test
Verify the build with make check
--with-wxmac
Build wxmac support. Need with-cairo to build wxt terminal
--with-x11
Build with x11 support
--without-gd
Build without gd based terminals
--without-lua
Build without the lua/TikZ terminal
--HEAD
Install HEAD version
b) uninstall gnuplot
brew uninstall gnuplot
c) reinstall with option cairo
brew install gnuplot --with-cairo
That's it. Afterwards, just set the terminal and provide output file. It worked for me.
set term pdf
set output 'myFile.pdf'
Another way is to use gnuplot's ability to pipe its output to another program. For example with ps2pdf:
set term postscript eps enhanced color
set output '|ps2pdf - outputfile.pdf'
or with gs directly:
set output '|gs -sDEVICE=pdfwrite -sOutputFile=outputfile.pdf -dBATCH -dNOPAUSE -f -'
where the symbol - stands for standard input, i.e. the data gnuplot pipes in.
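To decide between these approaches from a script, you can check the terminal list before plotting. A minimal sketch, assuming the list is the space-separated string that print GPVAL_TERMINALS produces; pick_pdf_terminal is a made-up helper name.

```shell
#!/bin/sh
# Sketch: choose a PDF-capable terminal from a space-separated terminal list,
# such as the one printed by `print GPVAL_TERMINALS`.
# pick_pdf_terminal is a hypothetical helper name.
pick_pdf_terminal() {
    for want in pdfcairo pdf postscript; do
        case " $1 " in
            # Surrounding spaces ensure "pdf" does not match "pdfcairo".
            *" $want "*) printf '%s\n' "$want"; return 0 ;;
        esac
    done
    return 1   # none of the candidates is built in
}

# Possible usage (gnuplot's print writes to stderr by default):
# terms=$(gnuplot -e 'print GPVAL_TERMINALS' 2>&1)
# term=$(pick_pdf_terminal "$terms") || echo "no PDF-capable terminal" >&2
```

If the result is postscript, the output still has to go through ps2pdf or gs as in the piped examples.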
