Why does pandoc convert "<br>" to "+ \n" on html to asciidoc conversion? - pandoc

I am trying to convert HTML to AsciiDoc using pandoc, but pandoc converts <br> tags into " +\n" instead of "\n", as shown below. I also tried asciidoc-escaped_line_breaks, but nothing changed.
Terminal Command:
`pandoc +RTS -K100000000 -RTS --wrap=preserve -f html -t asciidoc-escaped_line_breaks "input.html" -o "output.asciidoc"`
input.html
s
<br>
s
output.asciidoc
s +
s
Expected Output:
s
s
Version: pandoc 1.19.2.4

The escaped_line_breaks extension is currently only implemented for Markdown, not for AsciiDoc.
You could use a pandoc Lua filter like the following to strip all LineBreak elements from the document:
function LineBreak()
  return {}
end
Save this to a file, e.g. strip-linebreaks.lua. Note that your pandoc version is very old; Lua filters require pandoc 2.0 or later. Then run:
pandoc -f html --lua-filter strip-linebreaks.lua -t asciidoc
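If upgrading is not an option, one possible workaround is to post-process the AsciiDoc output instead of filtering the document. The following is only a minimal Python sketch and not part of pandoc: the function name strip_hard_breaks is made up for illustration, and it assumes that every line ending in " +" is a hard line break, which may not hold inside listing or literal blocks.

```python
import re

# Matches the AsciiDoc hard-line-break marker: a space followed by a plus
# sign at the very end of a line.
_HARD_BREAK = re.compile(r" \+$", re.MULTILINE)


def strip_hard_breaks(asciidoc_text: str) -> str:
    """Remove trailing ' +' markers so each hard break becomes a plain newline.

    Assumption: no line legitimately ends in ' +' (e.g. inside a listing block).
    """
    return _HARD_BREAK.sub("", asciidoc_text)
```

Calling strip_hard_breaks on the contents of output.asciidoc and writing the result back would turn the "s +" line from the question into a plain "s".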

Related

Inline Markdown code blocks are not highlighted

Whenever I want to highlight something in Markdown I use backticks (`). But when I try to convert the text:
with the following command:
pandoc -f markdown -t pdf test.md -o test.pdf
I get this in the resulting PDF:
Is there a way to have it highlighted like in other editors, with a box around it that has a somewhat different background color?
Example of what I would like to see in pdf output:
https://dillinger.io/
You can define colors for inline code in LaTeX:
\usepackage{xcolor}
\definecolor{bgcolor}{HTML}{ffcccb}
\definecolor{tcolor}{HTML}{c7254e}
\let\oldtexttt\texttt
\renewcommand{\texttt}[1]{%
  \textcolor{tcolor}{\colorbox{bgcolor}{\oldtexttt{#1}}}%
}
Save this as head.tex and include it in the header:
pandoc test.md -o test.pdf -H head.tex

How to append my conversion results of pandoc to the file?

I am trying to append my HTML to markdown conversion results of pandoc to .md file. The following command overwrites the existing file instead of appending. Is there any parameter to specify the appending operation?
pandoc -f html -t markdown -o output.md
So you want to append the output of pandoc to the output.md file? Use the shell's built-in >> redirection instead of -o:
pandoc -f html -t markdown >> output.md
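If pandoc is called from a script rather than from an interactive shell, the same append behaviour can be had by opening the target file in append mode. This is a small Python sketch; the helper name append_command_output and the input file name in the usage note are assumptions made for illustration.

```python
import subprocess


def append_command_output(command: list[str], path: str) -> None:
    """Run command and append its standard output to path, like `command >> path`."""
    # Mode "ab" appends raw bytes and creates the file if it does not exist yet.
    with open(path, "ab") as out:
        subprocess.run(command, stdout=out, check=True)
```

For example, append_command_output(["pandoc", "-f", "html", "-t", "markdown", "input.html"], "output.md") would add the converted Markdown to the end of output.md, assuming pandoc is on the PATH.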

Pandoc: escape HTML option

While other markdown implementations have a switch to escape HTML, I couldn't find one for Pandoc.
I want Pandoc to convert HELLO <blink>WORLD</blink> to <p>HELLO &lt;blink>WORLD&lt;/blink></p>.
Kramdown and Maruku don't seem to support this, how about Pandoc?
You can disable the extension raw_html by using this command to compile:
pandoc -f markdown-raw_html -t html
The output does not exactly match your expected output, though, because it will also transform > to &gt;.
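To see what fully escaped output looks like, the same kind of escaping can be reproduced with Python's standard library. This only illustrates the escaping itself, not how pandoc works internally; the helper name escape_paragraph is hypothetical.

```python
from html import escape


def escape_paragraph(text: str) -> str:
    """Wrap text in <p>...</p> after escaping &, < and > as HTML entities."""
    # quote=False leaves " and ' alone; only &, < and > are replaced.
    return "<p>" + escape(text, quote=False) + "</p>"


print(escape_paragraph("HELLO <blink>WORLD</blink>"))
# prints: <p>HELLO &lt;blink&gt;WORLD&lt;/blink&gt;</p>
```

Note that both < and > are escaped here, so the tag can no longer be interpreted as markup by a browser.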

Pandoc: What are the available syntax highlighters?

Bullet point 18 of http://pandoc.org/demos.html#examples shows how to change the syntax highlighter used by giving an argument to --highlight-style. For example:
pandoc code.text -s --highlight-style pygments -o example18a.html
pandoc code.text -s --highlight-style kate -o example18b.html
pandoc code.text -s --highlight-style monochrome -o example18c.html
pandoc code.text -s --highlight-style espresso -o example18d.html
pandoc code.text -s --highlight-style haddock -o example18e.html
pandoc code.text -s --highlight-style tango -o example18f.html
pandoc code.text -s --highlight-style zenburn -o example18g.html
I am wondering if these are the only color schemes available. If not, how can I load a different syntax highlighter? Can I define my own?
Since pandoc 2.0.5, you can also use --print-highlight-style to output a theme file and edit it.
To me, the best way to use this option is to:
1. Pick a pleasant available style
2. Output its theme file
3. Edit the theme file
4. Use it!
1. Available Styles
Pick your style from among the existing ones:
2. Output its theme file
Once you have decided which style is closest to your needs, you can output its theme file using (for instance, for pygments, the default style):
pandoc --print-highlight-style pygments
so that you can store this style in a file, using, e.g.,
pandoc --print-highlight-style pygments > my_style.theme
With some shells, especially on Windows, using redirected output can lead to encoding problems. If that happens, use this instead:
pandoc -o my_style.theme --print-highlight-style pygments
3. Edit the file
Using the Skylighting JSON Themes guide, edit the file according to your need / taste.
4. Use the file
In the right folder, just use
pandoc my_file.md --highlight-style my_style.theme -o doc.html
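Step 3 can also be scripted instead of done by hand. The sketch below assumes that the theme file is JSON with a top-level "text-styles" object whose entries each carry a "text-color" field; as far as I know that is what the printed pygments theme looks like, but check your own my_style.theme before relying on it. The function names recolor and recolor_file are illustrative, not part of pandoc.

```python
import json


def recolor(theme: dict, token: str, color: str) -> dict:
    """Return a copy of the theme with one token type's text colour changed.

    Assumes theme["text-styles"][token]["text-color"] is where the colour
    lives; raises KeyError if the token type is not present in the theme.
    """
    updated = json.loads(json.dumps(theme))  # cheap deep copy of JSON data
    updated["text-styles"][token]["text-color"] = color
    return updated


def recolor_file(path: str, token: str, color: str) -> None:
    """Apply recolor() to a theme file in place."""
    with open(path, encoding="utf-8") as f:
        theme = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(recolor(theme, token, color), f, indent=4)
```

For example, recolor_file("my_style.theme", "Keyword", "#0000ff") would turn keywords blue, provided the theme has a "Keyword" entry.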
If your pandoc --version indicates a release of 1.15.1 (from Oct 15, 2015) or newer, then you can check if the --bash-completion parameter works for you to get a full list of available built-in highlighting styles.
Run
pandoc --bash-completion
If it works, you'll see a lot of output. And it will be useful well beyond the original question above...
If --bash-completion works, then put this line towards the end of your ${HOME}/.bashrc file (on Mac OS X or Linux -- doesn't work on Windows yet):
eval "$(pandoc --bash-completion)"
Once you open a new terminal, you can use the pandoc command with "tab completion":
pandoc --h[tab]
will yield
--help --highlight-style --html-q-tags
pandoc --hi[tab]
will yield
pandoc --highlight-style
Answer to original question:
Now punch the [tab] key one more time, and you'll see
espresso haddock kate monochrome pygments tango zenburn
It's the list of all available syntax highlighting styles. To shorten the procedure, you could also type
pandoc --hi[tab][tab]
to get the same result.
Usefulness of Pandoc's tab completion beyond original question:
Pandoc's bash tab completion also works for all other commandline switches:
pandoc -h[tab]
yields this -- a list of all possible command line parameters:
Display all 108 possibilities? (y or n)
--ascii                     --indented-code-classes     --template
--asciimathml               --jsmath                    --title-prefix
--atx-headers               --katex                     --to
--base-header-level         --katex-stylesheet          --toc
--bash-completion           --latex-engine              --toc-depth
--biblatex                  --latex-engine-opt          --trace
--bibliography              --latexmathml               --track-changes
--chapters                  --listings                  --variable
--citation-abbreviations    --mathjax                   --verbose
--columns                   --mathml                    --version
--csl                       --metadata                  --webtex
--css                       --mimetex                   --wrap
--data-dir                  --natbib                    --write
--default-image-extension   --no-highlight              -A
--dpi                       --no-tex-ligatures          -B
--dump-args                 --no-wrap                   -D
--email-obfuscation         --normalize                 -F
--epub-chapter-level        --number-offset             -H
--epub-cover-image          --number-sections           -M
--epub-embed-font           --old-dashes                -N
--epub-metadata             --output                    -R
--epub-stylesheet           --parse-raw                 -S
--extract-media             --preserve-tabs             -T
--file-scope                --print-default-data-file   -V
--filter                    --print-default-template    -c
--from                      --read                      -f
--gladtex                   --reference-docx            -h
--help                      --reference-links           -i
--highlight-style           --reference-odt             -m
--html-q-tags               --section-divs              -o
--id-prefix                 --self-contained            -p
--ignore-args               --slide-level               -r
--include-after-body        --smart                     -s
--include-before-body       --standalone                -t
--include-in-header         --tab-stop                  -v
--incremental               --table-of-contents         -w
One interesting use case for Pandoc's tab completion is this:
pandoc --print-default-d[tab][tab]
lists the possible completions for pandoc --print-default-data-file. This list gives you a unique insight into which data files your instance of Pandoc loads when doing its work. For example, you could investigate a detail of Pandoc's default ODT (OpenDocument Text) output styling like this:
pandoc --print-default-data-file odt/content.xml \
| tr " " "\n" \
| tr "<" "\n" \
| grep --color "style"
The Pandoc README says:
--highlight-style=STYLE|FILE
Specifies the coloring style to be used in highlighted source code.
Options are pygments (the default), kate, monochrome,
breezeDark, espresso, zenburn, haddock, and tango.
For more information on syntax highlighting in pandoc, see
Syntax highlighting, below. See also
--list-highlight-styles.
Instead of a STYLE name, a JSON file with extension
.theme may be supplied. This will be parsed as a KDE
syntax highlighting theme and (if valid) used as the
highlighting style. To see a sample theme that can be
modified, pandoc --print-default-data-file default.theme.
The library skylighting (in older versions highlighting-kate) is used for the highlighting. If you don't like any of the provided color schemes, you can either:
Specify a .theme file as mentioned above,
when exporting to HTML, <span> tags are generated that you can style with your custom CSS, or
when exporting to LaTeX/PDF, you need to use a custom Pandoc LaTeX template and replace the $highlighting-macros$ part with your custom color definitions, as described in this issue.
If you are using Pandoc version 1.18 (released in October 2016) or later, a new answer is possible:
pandoc --list-highlight-languages
and
pandoc --list-highlight-styles
will give you all the info you were asking for.
Other new informational command line parameters added to v1.18 are:
pandoc --list-input-formats
pandoc --list-output-formats
pandoc --list-extensions
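If you want to use these lists from a script, for instance to check whether a style name is built in before passing it to --highlight-style, a small wrapper is enough. This is a sketch under the assumption that the --list-* options print one entry per line; the function names parse_listing and list_highlight_styles are made up here.

```python
import subprocess


def parse_listing(output: str) -> list[str]:
    """Split the output of a pandoc --list-* option into entries, one per line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_highlight_styles(pandoc: str = "pandoc") -> list[str]:
    """Ask the local pandoc for its built-in highlighting styles.

    Requires pandoc 1.18 or later on the PATH.
    """
    result = subprocess.run(
        [pandoc, "--list-highlight-styles"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_listing(result.stdout)
```

A check such as "zenburn" in list_highlight_styles() then tells you whether the name is a built-in style or has to be supplied as a .theme file.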

How to convert RTF to Markdown on the UNIX/OSX command line similar to pandoc

How do I convert RTF (say from stdin) to Markdown with a command line tool under UNIX/OSX?
I am looking for something like pandoc. However pandoc itself does not allow RTF as an input format. :-( So, I'd be happy either with a similar tool to pandoc or a pointer to an external RTF reader for pandoc.
On Mac OSX I can use the pre-installed textutil command for the RTF-to-HTML conversion, then convert via pandoc to markdown. So a command line which takes RTF from stdin and writes markdown to stdout looks like this:
textutil -stdin -convert html -stdout | pandoc --from=html --to=markdown
Using Ted and pandoc together, you should be able to do this:
Ted --saveTo text.rtf text.html
pandoc --from=html --to=markdown --output=text.md < text.html
Pandoc now supports RTF as an input format, so you can use:
cat file.rtf | pandoc --from=rtf --to=markdown
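Which of the routes above applies depends on the installed pandoc. One way to decide, sketched here in Python, is to look at the formats reported by pandoc --list-input-formats (available since 1.18) and fall back to the textutil route on macOS when rtf is not among them. The function name rtf_to_markdown_commands is hypothetical, and the function only builds the command lines; it does not run them.

```python
def rtf_to_markdown_commands(input_formats: list[str]) -> list[list[str]]:
    """Return the command pipeline (stdin to stdout) for RTF to Markdown.

    input_formats: the entries printed by `pandoc --list-input-formats`.
    """
    if "rtf" in input_formats:
        # Newer pandoc versions read RTF directly.
        return [["pandoc", "--from=rtf", "--to=markdown"]]
    # Fallback from the macOS answer: textutil converts RTF to HTML first,
    # then pandoc converts the HTML to Markdown.
    return [
        ["textutil", "-stdin", "-convert", "html", "-stdout"],
        ["pandoc", "--from=html", "--to=markdown"],
    ]
```

When the result contains two commands, they are meant to be chained with a pipe, as in the textutil example above.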
