Conditional sections in Pandoc markdown - pandoc

I want some sections of my document to be included only for a given output format. For instance, I have an image that requires special LaTeX treatment, and I want to use \includegraphics for it. But for the epub version, I want the standard behavior.
Is there a way in Pandoc markdown to specify a section that should be processed only in a given context (LaTeX vs. epub)? Templates have conditionals, but I did not see such a feature for Pandoc markdown.
Or is there another way of dealing with this use case?

Here is a solution with the file preprocessor filepp. We define a function img(caption,path,options) that is replaced by ![caption](path) or by \includegraphics depending on a flag. Notice two things: you need to escape commas in the function arguments (see the third argument of the first plot); you need to have two unescaped commas, even if one argument is not specified (see the second example plot below).
test.md
#bigfunc img(caption,path,options)
#if "out_fmt" eq "tex"
\begin{figure}
\caption{\label{caption} caption}
\includegraphics[options]{path}
\end{figure}
#else
![caption](path)
#endif
#endbigfunc
Here is an image
img(Plot,Rplot.png,height=2\,width=2)
img(Plot,Rplot.png,)
We specify the output format in the filepp command and pipe the result through pandoc:
filepp -m bigfunc.pm -D out_fmt=tex test.md | pandoc -o test.tex
filepp -m bigfunc.pm -D out_fmt=html test.md | pandoc -o test.html
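tex output (note that the macro argument named caption is also substituted inside \caption, which is why \Plot appears; giving the argument a name that does not occur elsewhere in the macro body avoids this):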
Here is an image
\begin{figure}
\Plot{\label{Plot} Plot}
\includegraphics[height=2,width=2]{Rplot.png}
\end{figure}
\begin{figure}
\Plot{\label{Plot} Plot}
\includegraphics[]{Rplot.png}
\end{figure}
html output:
<p>Here is an image</p>
<div class="figure">
<img src="Rplot.png" alt="Plot" />
<p class="caption">Plot</p>
</div>
<div class="figure">
<img src="Rplot.png" alt="Plot" />
<p class="caption">Plot</p>
</div>

There are a few possibilities:
Write a custom filter that modifies the document's AST; see pandoc scripting.
Use a preprocessor, see this post about using gpp in front of pandoc.
In limited use cases you can use \renewcommand in your LaTeX template to change the behavior of a LaTeX command.
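The first option, a custom filter, can be sketched as follows. This is a minimal illustration rather than an established convention: it assumes you mark format-specific content with fenced Divs carrying a hypothetical class such as only-latex or only-epub, and it relies on pandoc passing the target format name as the first command-line argument to a JSON filter. Only the Python standard library is used.

```python
import json
import sys


def keep_block(block, target_format):
    """Return False for a Div whose 'only-<fmt>' classes exclude target_format.

    The 'only-' class prefix is a hypothetical convention for this sketch.
    """
    if block.get("t") != "Div":
        return True
    (_ident, classes, _keyvals), _content = block["c"]
    only = [c[len("only-"):] for c in classes if c.startswith("only-")]
    return not only or target_format in only


def filter_blocks(blocks, target_format):
    """Drop Divs meant for other formats, recursing into the Divs that are kept."""
    kept = []
    for block in blocks:
        if not keep_block(block, target_format):
            continue
        if block.get("t") == "Div":
            attr, content = block["c"]
            block = {"t": "Div", "c": [attr, filter_blocks(content, target_format)]}
        kept.append(block)
    return kept


def main():
    """Read the JSON AST on stdin, filter it, and write it to stdout."""
    target_format = sys.argv[1] if len(sys.argv) > 1 else ""
    doc = json.load(sys.stdin)
    doc["blocks"] = filter_blocks(doc["blocks"], target_format)
    json.dump(doc, sys.stdout)

# To use this file as a pandoc JSON filter, add a final call to main().
```

With a final main() call added and the file saved under a name of your choice (only_format.py here is just an example), it could be run as pandoc --filter ./only_format.py test.md -o test.tex. Raw LaTeX such as \includegraphics would then go inside a Div marked only-latex, and the plain markdown image inside one marked only-epub.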

Related

Add a figure element in pandoc with filters

I'm writing a filter for pandoc in python. I'm using pandocfilters.
I want to replace a Para[Image] with a Figure[InlineEl1, InlineEl2].
Figure is not supported by pandoc, so I'm using a RawBlock to write raw html. The problem is that I don't know the html for InlineEl1 and InlineEl2. I need to let pandoc process them.
Possible workaround: use a Div and then modify the resulting html file by hand.
Is there a better method?
edit: Or maybe I can put inline elements in a RawBlock? I'm just using a simple string for now. I don't know if it's possible as I don't have any documentation available. I'm just proceeding by trial and error.
As of pandoc 2.0, the figure representation in the AST is still somewhat ad hoc. It's simply a paragraph that contains nothing but an image, with the image's title attribute starting with fig:.
$ echo '![caption](/url/of/image.png)' | pandoc -t native
[Para [Image ("",[],[]) [Str "caption"] ("/url/of/image.png","fig:")]]
$ echo '![caption](/url/of/image.png)' | pandoc -t html
<figure>
<img src="/url/of/image.png" alt="caption" />
<figcaption>caption</figcaption>
</figure>
See http://pandoc.org/MANUAL.html#extension-implicit_figures
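A filter can recognise this representation directly in the JSON AST. The sketch below assumes the pandoc 2.x JSON encoding of the native output shown above (a Para whose only inline is an Image with a title starting with fig:); the function names is_implicit_figure and figure_parts are made up for illustration. Because the caption stays a list of inline elements, the filter can hand it back to pandoc for rendering instead of writing raw HTML itself.

```python
def is_implicit_figure(block):
    """True if block is a Para holding exactly one Image whose title starts with 'fig:'."""
    if block.get("t") != "Para" or len(block["c"]) != 1:
        return False
    inline = block["c"][0]
    if inline.get("t") != "Image":
        return False
    _attr, _caption, (_url, title) = inline["c"]
    return title.startswith("fig:")


def figure_parts(block):
    """Return (caption_inlines, url) for a block accepted by is_implicit_figure."""
    _attr, caption, (url, _title) = block["c"][0]["c"]
    return caption, url


# JSON form of: [Para [Image ("",[],[]) [Str "caption"] ("/url/of/image.png","fig:")]]
example = {
    "t": "Para",
    "c": [
        {
            "t": "Image",
            "c": [
                ["", [], []],
                [{"t": "Str", "c": "caption"}],
                ["/url/of/image.png", "fig:"],
            ],
        }
    ],
}
```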

Change markdown emphasis notation in pandoc?

Is it possible to modify the character used to denote start/end of emphasis and strong emphasis in pandoc's markdown?
In particular, I'd like to use /emphasis/ and *strong emphasis*.
There is no option in pandoc to customize individual pieces of markdown syntax -- you would have to write another input format for that. I think the easiest way to achieve this is to use a pre-processor that converts your custom syntax into regular markdown-strict or markdown syntax.
Here is one example, using filepp (there are many other options, including a sed or awk script):
#regexp /\/\b/_/
#regexp /\b\//_/
#regexp /\*\b/\*\*/
#regexp /\b\*/\*\*/
Some *bold* and some /emphasis/
To add the preprocessing step to compilation:
filepp -m regexp.pm myfile.md | pandoc ...
For instance, compiling with pandoc -t html gives:
<p>Some <strong>bold</strong> and some <em>emphasis</em></p>
To make this durable, save the preprocessor commands in their own file, say ~/.pandoc-pp:
#regexp /\/\b/_/
#regexp /\b\//_/
#regexp /\*\b/\*\*/
#regexp /\b\*/\*\*/
And include at the top of every markdown document:
#include ~/.pandoc-pp
/emphasis/ is not markdown for emphasis; only *foo* and _bar_ are, and the pandoc markdown writer currently only supports the former.
Either way, if you're asking about generating markdown, you could write a pandoc filter that replaces Emph x with the inlines [Str "/"] <> x <> [Str "/"]. If you're asking about taking markdown as input to pandoc, you should probably try a preprocessor as suggested by @scoa.
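The preprocessing step does not have to be filepp. Here is a rough Python equivalent, assuming the same custom syntax (/emphasis/ and *strong*). It is not a translation of the filepp rules: it matches opening and closing markers as a pair, so a slash between two word characters (as in a/b) is left alone. The function name convert_emphasis is made up for this sketch, and it does not try to skip code spans or URLs embedded in prose.

```python
import re

# *strong* -> **strong**; text that is already **strong** is not matched again
STRONG = re.compile(r"(?<![\w*])\*(?=\w)(.+?)(?<=\w)\*(?![\w*])")
# /emphasis/ -> _emphasis_; slashes inside words or in '//' are ignored
EMPH = re.compile(r"(?<![\w/])/(?=\w)(.+?)(?<=\w)/(?![\w/])")


def convert_emphasis(text):
    """Rewrite the custom emphasis markers into regular markdown."""
    text = STRONG.sub(r"**\1**", text)
    return EMPH.sub(r"_\1_", text)


if __name__ == "__main__":
    print(convert_emphasis("Some *bold* and some /emphasis/"))
```

Reading the file, applying convert_emphasis, and piping the result into pandoc then plays the same role as the filepp command above.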

How can I suppress the date when using pandoc to convert md to pdf?

I would like to create a simple pdf file from a markdown file with a title and author but no date. I cannot figure out how to suppress the date without having to edit an intermediate tex file.
---
title: Test Doc
author: My Name
---
# Some Heading Here
Text here.
When I run the command pandoc test.md -o test.pdf
the date always appears in the PDF. I have tried setting the date: field in the YAML block to all sorts of spaces, blanks, and other combinations, but cannot figure out how to get it to be blank.
Thank you.
Pandoc uses templates. To generate PDFs, by default it uses a LaTeX template, which you can print with pandoc -D latex. In an older pandoc version, this template contained:
$if(date)$
\date{$date$}
$endif$
which causes your issue because LaTeX's \maketitle prints the current date when no \date command is given at all. So either upgrade your pandoc version or modify your template manually to contain just
\date{$date$}
or use ConTeXt instead of LaTeX:
pandoc -s -t context test.md -o test.tex && context test.tex

Pandoc: no line wrapping when converting to HTML

I am converting from Markdown to HTML like so:
pandoc --columns=70 --mathjax -f markdown input.pdc -t html -Ss > out.html
Everything works fine, except for the fact that the text doesn't get wrapped. I tried different column lengths, with no effect. I removed options, and that did not help either. Whatever I tried, the HTML just doesn't get wrapped. I searched the bug tracker, but there don't seem to be any open bugs relating to this issue. I also checked the documentation, but as far as I could glean, the text ought to be line-wrapped... So, have I stumbled into a bug?
I'm using pandoc version 1.12.4.2.
Thanks in advance for your help!
Pandoc puts newlines in the HTML so the source code is easier to read. By default, it doesn't insert <br>-tags.
If you want to preserve line breaks from markdown input:
pandoc -f markdown+hard_line_breaks input.md -o output.html
However, usually a better approach to limit the text width when opening the HTML file in the browser is to adapt the HTML template (pandoc -D html5) and add some CSS, like:
<!DOCTYPE html>
<html$if(lang)$ lang="$lang$"$endif$>
<head>
<style>
body {
width: 46em;
}
</style>
...
It is not clear which text should be wrapped but is not, since you did not provide a sample.
Pandoc supports several line-breaking scenarios in markdown documents.
What you may be looking for is the hard_line_breaks extension.
If so, your command should look like this:
pandoc --columns=70 --mathjax -f markdown+hard_line_breaks input.pdc -t html -Ss > out.html
I'd recommend reading about all the markdown-relevant options and configuring pandoc to match your input markdown flavor.

How to extract text between particular HTML tag in script

Given that I have some HTML in the form:
<html>
<body>
<div id="1" class="c">some other html stuff</div>
</body>
</html>
How can I extract this with Unix script?
some other html stuff
You may checkout the html-xml-utils and the hxselect command which allows you to extract elements that match a CSS selector:
hxselect '.c' < test.htm
This assumes that your input is a well-formed XML document. If it is not you might need to resort to regular expressions and the possible consequences of that.
For simple uses, you can use Ex editor, for example:
$ ex +'/<div/norm vity' +'%d|pu 0|%p' -scq! file.html
some other html stuff
Here ex finds the div tag, selects the inner content of that tag (vit), yanks it (y), replaces the buffer with it (%delete, put 0), prints it (%print), and quits (-cq!).
Other example with demo URL:
$ ex +'/<div/norm vity' +'%d|pu 0|%p' -Nscq! http://example.com/
The advantage is that ex is a standard Unix editor available in most Linux/Unix distributions.
See also:
How to jump between matching HTML/XML tags? at Vim SE
How to remove inner content of html tag conditionally? at Vim SE
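If a scripting language is acceptable instead of a pure shell tool, the same extraction can be sketched with Python's standard html.parser module, which tolerates input that is not well-formed XML. The names ClassTextExtractor and extract_text are made up for this example, and the matching is deliberately simple: it collects the text inside every element that carries the given class.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text content of each element that has a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # nesting depth inside a matching element
        self.parts = []     # text pieces of the element being collected
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.parts))
            self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_text(html, css_class):
    """Return the text of every element whose class list contains css_class."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

Called on the HTML from the question with the class c, extract_text returns a list with one entry per matching element.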
