Pandoc: no line wrapping when converting to HTML

I am converting from Markdown to HTML like so:
pandoc --columns=70 --mathjax -f markdown input.pdc -t html -Ss > out.html
Everything works fine, except that the text doesn't get wrapped. I tried different column lengths, with no effect. I removed options, still no go. Whatever I tried, the HTML just doesn't get wrapped. I searched the bug tracker, but there don't seem to be any open bugs relating to this issue. I also checked the documentation, but as far as I could glean, the text ought to be line-wrapped... So, have I stumbled into a bug?
I'm using pandoc version 1.12.4.2.
Thanks in advance for your help!

Pandoc puts newlines in the HTML so the source code is easier to read. By default, it doesn't insert <br>-tags.
If you want to preserve line breaks from markdown input:
pandoc -f markdown+hard_line_breaks input.md -o output.html
However, usually a better approach to limit the text width when opening the HTML file in the browser is to adapt the HTML template (pandoc -D html5) and add some CSS, like:
<!DOCTYPE html>
<html$if(lang)$ lang="$lang$"$endif$>
<head>
<style>
body {
width: 46em;
}
</style>
...

It is not clear which text should get wrapped but does not, as you did not provide a sample.
Pandoc supports several line-breaking scenarios in markdown documents.
What you may be looking for is the hard_line_breaks extension.
If so, your command should look like:
pandoc --columns=70 --mathjax -f markdown+hard_line_breaks input.pdc -t html -Ss > out.html
I'd recommend reading about all the markdown-relevant options and configuring pandoc to match your input markdown flavor.

Related

Smart Quotes and Ligatures in pandoc

I have a file text.txt which contains very basic latex/markdown. For example, it might be the following.
Here is some basic maths: $f(x) = ax + b$ defines a straight line, often called a "linear" function---but it's not _actually_ a linear function, eg $f(0) \ne 0$.
I would like to convert this into html using WebTeX. However, I don't want smart quotes (" should be output as a plain straight quote, not curved at either end) or smart dashes (--- should be literally three dashes, not an em-dash).
It seems that the smart option is good for this: pandoc manual, github 1, github 2. However, I can't quite work out the correct syntax. I have tried, for example, the following.
pandoc text.txt -f markdown-smart -t markdown-smart -s --webtex -o tex.html
Unfortunately this doesn't work.
I solved this while writing the question, so I'll post the answer below! (Spoiler alert: simply remove -t markdown-smart.)
Simply remove -t markdown-smart.
pandoc text.txt -f markdown-smart -s --webtex -o tex.html
I believe that this -t is saying "to markdown without smart". We are not trying to output markdown, but rather html. If the version with -t is viewed, then one sees that the code for embedding various images is included. If this is pasted into a markdown editor, then it should show up.
To get html, simply remove this.
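As a rough illustration of how such a format spec reads, the sketch below splits a string like markdown-smart into its base format and the extensions it switches on or off. The helper parse_format_spec is hypothetical (it is not part of pandoc, and pandoc's own parsing may differ in details); it only mirrors the FORMAT+EXT-EXT convention used in the commands above.

```python
import re


def parse_format_spec(spec):
    """Split a spec such as 'markdown+hard_line_breaks-smart' into
    (base_format, {extension_name: enabled}).

    Hypothetical helper for illustration only, not pandoc's own parser.
    """
    match = re.match(r"[^+-]+", spec)
    if match is None:
        raise ValueError("format spec must start with a format name")
    base = match.group(0)
    rest = spec[match.end():]
    # "+name" enables an extension, "-name" disables it.
    extensions = {
        name: sign == "+"
        for sign, name in re.findall(r"([+-])([A-Za-z0-9_]+)", rest)
    }
    return base, extensions


# "-f markdown-smart" reads as: markdown input with the smart extension off.
print(parse_format_spec("markdown-smart"))
```

Read this way, -t markdown-smart selects markdown as the output format (with smart disabled), which is why dropping it lets the .html extension of the -o file decide the output format instead.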

Converting from docx to markdown how to get rid of span underline in links?

Since a recent pandoc update (I'm now at 2.2.1), the links in a docx document are converted to [<span class="underline">graphic novel hero</span>](https://www.amazon.com/exec/obidos/ASIN/1596432594/braipick-20), adding an unneeded span to link labels. Is there any black magic (besides adding a sed call to the pipeline) to get rid of them and return to pure commonmark?
The pandoc options I use are: pandoc -f docx --atx-headers --wrap=none --extract-media=. -t commonmark-smart myFile.docx
Thanks for clarifying!
If you use -t commonmark the spans that the docx-reader generates are converted to raw HTML, so you could use:
pandoc -t commonmark-raw_html
Alternatively, use the markdown-writer, which is more flexible in terms of extensions (but as of 2018 not yet 100%-commonmark-compliant):
pandoc -t markdown-bracketed_spans-raw_html-native_spans
See the MANUAL for more details.
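Another possible route, sketched here under assumptions, is to drop the spans in pandoc's JSON AST before the writer ever sees them. The function below assumes the docx reader emits Span elements carrying the class underline (as the question's output suggests) and that the JSON layout is the usual {"t": ..., "c": ...} form; the names is_underline_span and unwrap_underline_spans are made up for this example.

```python
def is_underline_span(node):
    """True for a JSON AST node of the form Span (id, classes, kvs) inlines
    whose classes include 'underline'."""
    return (
        isinstance(node, dict)
        and node.get("t") == "Span"
        and "underline" in node["c"][0][1]
    )


def unwrap_underline_spans(node):
    """Return a copy of the AST in which every underline Span is replaced
    by its own inline contents."""
    if isinstance(node, list):
        out = []
        for item in node:
            item = unwrap_underline_spans(item)
            if is_underline_span(item):
                out.extend(item["c"][1])  # splice the span's inlines in place
            else:
                out.append(item)
        return out
    if isinstance(node, dict):
        return {key: unwrap_underline_spans(value) for key, value in node.items()}
    return node


# A link whose label is wrapped in an underline span, as in the question.
link = {
    "t": "Link",
    "c": [
        ["", [], []],
        [{"t": "Span", "c": [["", ["underline"], []],
                             [{"t": "Str", "c": "graphic"},
                              {"t": "Space"},
                              {"t": "Str", "c": "novel"}]]}],
        ["https://example.com", ""],
    ],
}
cleaned = unwrap_underline_spans(link)
print(cleaned["c"][1])
```

To run this as a filter, the script would read the document with json.load(sys.stdin), apply unwrap_underline_spans, and write the result with json.dump to sys.stdout, so that it can be passed to pandoc via --filter.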

Add a figure element in pandoc with filters

I'm writing a filter for pandoc in python. I'm using pandocfilters.
I want to replace a Para[Image] with a Figure[InlineEl1, InlineEl2].
Figure is not supported by pandoc, so I'm using a RawBlock to write raw html. The problem is that I don't know the html for InlineEl1 and InlineEl2. I need to let pandoc process them.
Possible workaround: use a Div and then modify the resulting html file by hand.
Is there a better method?
edit: Or maybe I can put inline elements in a RawBlock? I'm just using a simple string for now. I don't know if it's possible as I don't have any documentation available. I'm just proceeding by trial and error.
As of pandoc 2.0, the figure representation in the AST is still somewhat ad hoc. It's simply a paragraph that contains nothing but an image, with the image's title attribute starting with fig:.
$ echo '![caption](/url/of/image.png)' | pandoc -t native
[Para [Image ("",[],[]) [Str "caption"] ("/url/of/image.png","fig:")]]
$ echo '![caption](/url/of/image.png)' | pandoc -t html
<figure>
<img src="/url/of/image.png" alt="caption" />
<figcaption>caption</figcaption>
</figure>
See http://pandoc.org/MANUAL.html#extension-implicit_figures
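For a filter, this means the inline elements can go into the image's caption instead of a RawBlock, so pandoc renders them itself. The following is a minimal sketch that builds such a paragraph directly as JSON, mirroring the native output shown above; make_figure and is_figure are hypothetical helper names, and the layout assumes the pandoc 2.0 representation (other versions may represent figures differently).

```python
import json


def make_figure(caption_inlines, src, title=""):
    """Build a Para holding a single Image whose title starts with 'fig:',
    which is how pandoc 2.0 marks an implicit figure."""
    attr = ["", [], []]  # identifier, classes, key-value pairs
    image = {"t": "Image", "c": [attr, caption_inlines, [src, "fig:" + title]]}
    return {"t": "Para", "c": [image]}


def is_figure(block):
    """True if the block is a Para containing only a 'fig:'-titled Image."""
    if block.get("t") != "Para":
        return False
    inlines = block["c"]
    if len(inlines) != 1 or inlines[0].get("t") != "Image":
        return False
    _attr, _caption, target = inlines[0]["c"]
    return target[1].startswith("fig:")


# The caption is an ordinary list of inline elements, so emphasis, links,
# and so on can be used here and are rendered by pandoc.
caption = [{"t": "Str", "c": "caption"}]
figure = make_figure(caption, "/url/of/image.png")
print(json.dumps(figure))
```

Returning such a Para from the filter in place of the original Para[Image] should give the figure and figcaption output shown above when writing HTML.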

Pandoc HTML to Markdown - Non-Html tables

I use the following Pandoc command to convert HTML to Markdown
pandoc -f html -t commonmark myfile.html >myfile.md
It works great but for some reason it always converts a table to an html coded table rather than a "markdown" table (with no html tags in it). Does anyone know how I can force Pandoc to produce a non-html coded table?
That is expected, because you specified commonmark as the output format. The original Markdown did not have tables, and anything it did not cover was meant to be written in the surrounding language, which is HTML in this case.
Read https://daringfireball.net/projects/markdown/syntax and you will see that HTML is allowed within Markdown.
To get the extended Markdown output mentioned in the pandoc manual, use pandoc -f html -t markdown myfile.html >myfile.md, which works here.
result:
  --- --- ---
  1   2   3
  1   2   3
  --- --- ---
myfile.html:
<html><body>
<table>
<tr><td>1</td><td>2</td><td>3</td></tr>
<tr><td>1</td><td>2</td><td>3</td></tr>
</table>
</body></html>

How can I suppress the date when using pandoc to convert md to pdf?

I would like to create a simple pdf file from a markdown file with a title and author but no date. I cannot figure out how to suppress the date without having to edit an intermediate tex file.
---
title: Test Doc
author: My Name
---
# Some Heading Here
Text here.
When I run the command pandoc test.md -o test.pdf, the date always appears in the PDF. I have tried setting the date: field in the YAML block to all sorts of spaces, blanks, and other combinations, but cannot figure out how to get it to be blank.
Thank you.
Pandoc uses templates. To generate PDFs, by default it uses a LaTeX template, which you can print with pandoc -D latex. In an older pandoc version, this template contained:
$if(date)$
\date{$date$}
$endif$
which causes your issue: when no date is set, the \date{} command is left out entirely, and LaTeX then falls back to printing today's date. So either upgrade your pandoc version or modify your template manually to contain just
\date{$date$}
or use ConTeXt instead of LaTeX:
pandoc -s -t context test.md -o test.tex && context test.tex
