How to include the abstract in HTML output - pandoc

A simple example with pandoc 2.19.2:
$ cat test.md
---
title: An Example
author: Luís
language: en-IE
abstract: |
  This is my abstract.
---
# Intro
Some text.
# Conclusion
More text.
$ pandoc -o test.html test.md
$ cat test.html
<h1 id="intro">Intro</h1>
<p>Some text.</p>
<h1 id="conclusion">Conclusion</h1>
<p>More text.</p>
The abstract does not appear in the HTML output, but in other formats it does (e.g. PDF). Is any extra parameter necessary for HTML?

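Pandoc 2.17 and later support this by default. Note that the abstract is emitted by the HTML template, so it only appears in standalone output (-s/--standalone), not in a fragment like the one shown above. Older pandoc versions need a custom template, e.g. download the updated default.html5 template and pass it to pandoc via
pandoc --template=/path/to/default.html5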

Related

Inline Markdown code blocks are not highlighted

Whenever I want to highlight something in Markdown I use backticks (`). But when I convert the text with the following command:
pandoc -f markdown -t pdf test.md -o test.pdf
the backticked text is not highlighted in the resulting PDF.
Is there a way to have it highlighted as in other editors, with a surrounding box that has a slightly different background color?
Example of what I would like to see in pdf output:
https://dillinger.io/
You can define colors for inline code in LaTeX:
\usepackage{xcolor}
\definecolor{bgcolor}{HTML}{ffcccb}
\definecolor{tcolor}{HTML}{c7254e}
\let\oldtexttt\texttt
% The trailing % signs keep the line breaks from adding stray spaces.
\renewcommand{\texttt}[1]{%
  \textcolor{tcolor}{\colorbox{bgcolor}{\oldtexttt{#1}}}%
}
Save this as head.tex and include it in the document header:
pandoc test.md -o test.pdf -H head.tex

Remove header YAML with sed from a markdown file

I have a Markdown file that I want to convert to HTML with pandoc after deleting its YAML header. This is the command:
sed '/---/,/---/d' java.md | pandoc - -f markdown -t html5 --wrap=none -o java.html
and this is the header:
---
title: Instalar JAVA en Ubuntu
subtitle: Subtitle
author:
- I am an author
date: \today{}
---
The problem is that it also removes part of the text wherever ------ appears.
What sed expression do I need?
You can anchor the pattern with ^ (start of line) and $ (end of line), so that only lines consisting of exactly --- match and ------ no longer does.
sed '/^---$/,/^---$/d' file.md
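The difference between the two patterns can be checked on a throwaway file. This is only a sketch: the temporary file sample.md and its contents are made up for illustration and are not the asker's java.md.

```shell
# Hypothetical sample: a YAML header, then body text containing a longer dash run.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.md" <<'EOF'
---
title: Sample
---
Body text
------
More text
EOF

# Unanchored pattern from the question: "------" also matches /---/, so a
# second range starts there and, finding no closing match, runs to end of file.
unanchored=$(sed '/---/,/---/d' "$tmpdir/sample.md")

# Anchored pattern: only lines that are exactly "---" start or end a range.
anchored=$(sed '/^---$/,/^---$/d' "$tmpdir/sample.md")

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
rm -r "$tmpdir"
```

With the unanchored pattern only "Body text" survives, while the anchored one keeps all three body lines. The anchored range would still delete any later pair of lines consisting of exactly ---, so it assumes the body contains no such pair.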

Why does pandoc convert "<br>" to "+ \n" on html to asciidoc conversion?

I am trying to convert HTML to AsciiDoc using pandoc, but pandoc converts <br> tags into +\n instead of \n, as shown below. I also tried asciidoc-escaped_line_breaks, but nothing changed.
Terminal command:
pandoc +RTS -K100000000 -RTS --wrap=preserve -f html -t asciidoc-escaped_line_breaks "input.html" -o "output.asciidoc"
input.html
s
<br>
s
output.asciidoc
s +
s
Expected Output:
s
s
Version: pandoc 1.19.2.4
The escaped_line_breaks extension is currently only implemented for Markdown, not for AsciiDoc.
You could use a pandoc Lua filter like the following to strip all LineBreak elements from the document:
function LineBreak()
  return {}
end
Save this as, e.g., strip-linebreaks.lua. Note that your pandoc version is really old; Lua filters require pandoc 2.0 or later. Then:
pandoc -f html --lua-filter strip-linebreaks.lua -t asciidoc

Pandoc: escape HTML option

While other markdown implementations have a switch to escape HTML, I couldn't find one for Pandoc.
I want Pandoc to convert HELLO <blink>WORLD</blink> to <p>HELLO &lt;blink>WORLD&lt;/blink></p>.
Kramdown and Maruku don't seem to support this, how about Pandoc?
You can disable the extension raw_html by using this command to compile:
pandoc -f markdown-raw_html -t html
However, the output does not exactly match your expected output, because it also transforms > to &gt;.

Using GREP to find table tags

I'm trying to search through a large directory for any .html files that contain any <table> tags. The grep command seems to be the most appropriate, but I'm having some trouble nailing down the parameters to pass.
Currently I have: grep -r -l "^<table>$" /directory_to_search_through
I used -r to recursively search through all files and -l to print only the file names. However, the current string specification searches exclusively for <table>, but I want to do a more comprehensive search that includes any table tags that include ids, classes, etc. Additionally, I want to search through only .html files, but specifying the directory as /directory/*.html yields a 'No such file or directory' message. Any help would be much appreciated.
To do this reliably you really need to use a bona fide HTML parser. If it's xhtml then an XML parser would be fine, too.
You could get a good approximation of your desired results with something like this:
find /directory/to/search -name '*.html' | xargs grep -l '<table[[:space:]>]'
That will check all the .html files in the directory tree rooted at /directory/to/search, identifying those that contain (the beginning of) a <table> start tag anywhere on a line. It can also report false positives, such as the text <table inside a CDATA section (if in fact the file contains XHTML).
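A small self-contained check of this approach is sketched below. The directory layout and file names are invented for the demonstration. Unlike the command above, it uses -print0 with xargs -0 (supported by GNU and BSD tools, though not required by POSIX) so that file names containing spaces are handled.

```shell
# Hypothetical directory tree, created only for this demonstration.
root=$(mktemp -d)
mkdir -p "$root/sub"
printf '<html><body><table id="a" class="b"></table></body></html>\n' > "$root/with_attrs.html"
printf '<html><body>\n<table>\n</table>\n</body></html>\n' > "$root/sub/plain.html"
printf '<html><body><p>no tables here</p></body></html>\n' > "$root/none.html"
printf '<table>\n' > "$root/notes.txt"   # not an .html file, so find skips it

# List the .html files containing "<table" followed by whitespace or ">".
# The subshell cd keeps the reported paths relative; sort makes the order stable.
matches=$(cd "$root" && find . -name '*.html' -print0 \
  | xargs -0 grep -l '<table[[:space:]>]' \
  | LC_ALL=C sort)

printf '%s\n' "$matches"
rm -r "$root"
```

This lists ./sub/plain.html and ./with_attrs.html. A <table start tag whose attributes begin on the following line would be missed by this pattern, which is one more reason to prefer a real parser when accuracy matters.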
As you have already discovered, grep is not the ideal tool for the job. If your input is well-formed XHTML, you could use an XML parser such as xmlstarlet:
xmlstarlet sel -t -m //table -f -o " table id:" -v "@id" -o " class:" -v "@class" -n *.html
This simply selects all <table> elements and extracts their id, class and the name of the file that they were found in.
For example:
$ cat file.html
<html>
<body>
<table id="abc" class="something">
</table>
</body>
</html>
$ cat file2.html
<html>
<body>
<table id="def" class="something-else">
</table>
</body>
</html>
$ xmlstarlet sel -t -m //table -f -o " table id:" -v "@id" -o " class:" -v "@class" -n *.html
file.html table id:abc class:something
file2.html table id:def class:something-else

Resources