Is it possible to modify the character used to denote start/end of emphasis and strong emphasis in pandoc's markdown?
In particular, I'd like to use /emphasis/ and *strong emphasis*.
There is no option in pandoc to customize individual pieces of markdown syntax -- you would have to write another input format for that. I think the easiest way to achieve this is to use a pre-processor that converts your custom syntax into regular markdown-strict or markdown syntax.
Here is one example, using filepp (there are many other options, including a sed or awk script):
#regexp /\/\b/_/
#regexp /\b\//_/
#regexp /\*\b/\*\*/
#regexp /\b\*/\*\*/
Some *bold* and some /emphasis/
To add the preprocessing step to compilation:
filepp -m regexp.pm myfile.md | pandoc ...
For instance, compiling to pandoc -t html:
<p>Some <strong>bold</strong> and some <em>emphasis</em></p>
To make this durable save the preproc commands in their own file, let's say ~/.pandoc-pp
#regexp /\/\b/_/
#regexp /\b\//_/
#regexp /\*\b/\*\*/
#regexp /\b\*/\*\*/
And include at the top of every markdown document:
#include ~/.pandoc-pp
/emphasis/ is not markdown for emphasis, only *foo* and _bar_ is... and the pandoc markdown writer currently only supports the former.
Either way, if you're asking about generating markdown; you could write a pandoc filter that replaces Emph x with Str "/" <> x <> Str "/"). If you're asking about taking markdown as input to pandoc, you should probably try a preprocessor as suggested by #scoa.
Related
I have a file text.txt which contains very basic latex/markdown. For example, it might be the following.
Here is some basic maths: $f(x) = ax + b$ defines a straight line, often called a "linear" function---but it's not _actually_ a linear function, eg $f(0) \ne 0$.
I would like to convert this into html using WebTeX. However, I don't want smart quotes---" should be outputted as basic straight lines, not curved on either end---or smart dashes------ should be literally three dashes, not an em-dash.
It seems that the smart option is good for this: pandoc manual, github 1, github 2. However, I can't quite work out the correct syntax. I have tried, for example, the following.
pandoc text.txt -f markdown-smart -t markdown-smart -s --webtex -o tex.html
Unfortunately this doesn't work.
I solved this while writing the question, so I'll post the answer below! (Spoiler alert: simply remove -t markdown-smart.)
Simply remove -t markdown-smart.
pandoc text.txt -f markdown-smart -s --webtex -o tex.html
I believe that this -t is saying "to markdown without smart". We are not trying to output markdown, but rather html. If the version with -t is viewed, then one sees that the code for embedding various images is included. If this is pasted into a markdown editor, then it should show up.
To get html, simply remove this.
Is it possible to have .tex files output from pandoc have math mode with dollar signs ($)? The manual says:
LaTeX: It will appear verbatim surrounded by \(...\) (for inline math) or \[...\] (for display math).
I also found this Github issue from 2016 where the author says it could be selectable. Is there now a pandoc argument or another way of having the .tex output use dollar signs?
You can do this using a pandoc filter. E.g.:
-- Do nothing unless we are targeting TeX.
if not FORMAT:match('tex$') then return {} end
function Math (m)
local delimiter = m.mathtype == 'InlineMath' and '$' or '$$'
return pandoc.RawInline('tex', delimiter .. m.text .. delimiter)
end
Save to file dollar-math.lua, and pass it to pandoc via --lua-filter=dollar-math.lua.
Pandoc supports a YAML metadata block in markdown documents. This can set the title and author, etc. It can also manipulate the appearance of the PDF output by changing the font size, margin width and the frame sizes given to figures that are included. Lots of details are given here.
I'd like to use the metadata block to remember the command line arguments that I'm supposed to be using, such as --toc and --number-sections. I tried this, adding the following to the top of my markdown:
---
title: My Title
toc: yes
number-sections: yes
---
Then I used the command line:
pandoc -o guide.pdf articheck_guide.md
This did produce a table of contents, but didn't number the sections. I wondered why this was, and if there is a way I can specify this kind of thing from the document so that I don't need to add it on the command line.
YAML metadata are not passed to pandoc as arguments, but as variables. When you call pandoc on your MWE, it does not produce this :
pandoc -o guide.pdf articheck_guide.md --toc --number-sections
as we think it would. rather, it calls :
pandoc -o guide.pdf articheck_guide.md -V toc:yes -V number-sections:yes
Why, then, does you MWE produces a toc? Because the default latex template makes use of a toc variable :
~$ pandoc -D latex | grep toc
$if(toc)$
\setcounter{tocdepth}{$toc-depth$}
So setting toc to any value should produce a table of contents, at least in latex output. In this template, there is no number-sections variables, so this one doesn't work. However, there is a numbersections variable :
~$ pandoc -D latex | grep number
$if(numbersections)$
Setting numbersections to any value will produce numbering in a latex output with the default template
---
title: My Title
toc: yes
numbersections: yes
---
The trouble with this solution is that it only works with some output format. I thought I had read somewhere on the pandoc mailing-list that we soon would be able to use metadata in YAML blocks as intended (ie. as arguments rather than variables), but I can't find it anymore, so maybe it won't happen very soon.
Have a look at panzer (GitHub repository).
This was recently announced and released by Mark Sprevak -- a piece of software, that adds the notion of 'styles' to Pandoc.
It's basically a wrapper around Pandoc. It exploits the concept of YAML metadata blocks to the maximum.
The 'styles' provide a way to set all options for a Pandoc document conversion process with one line ("I want this document be an article/CV/notes/letter.").
You can regard this as more general abstraction than Pandoc templates. Styles are combinations of...
...Pandoc command line options,
...metadata settings,
...templates,
...instructions to run filters, and
...instructions to run pre/postprocessors.
These settings can be customized on a per output type as well as a per document basis. Styles can be...
...combined and
...can bear inheritance relations to each other.
panzer styles simplify Makefiles: they bundle everything concerning the look of a document in one place -- the YAML metadata (a block in the Markdown file, or a separate file).
You just add one line of metadata (style: ...) to your document, and it will be treated as a letter/article/CV/notebook or whatever.
Only by chance did I see an example document using the toc: true line in their YAML header options in a Markdown file to be processed by Pandoc. And the Pandoc docs didn't mention this option to control table of contents using the YAML header. Furthermore, I see somewhat arbitrary lines in example documents on the same Pandoc readme site.
Main question:
What Pandoc options are available using the YAML header?
Meta-question:
What determines the available Pandoc options that are available to set using the YAML header?
Note: my workflow is to use Markdown files (.md) and process them through Pandoc to get PDF files. It has hierarchically organized manuscript writing with math. Such as:
pandoc --standalone --smart \
--from=markdown+yaml_metadata_block \
--filter pandoc-citeproc \
my_markdown_file.md \
-o my_pdf_file.pdf
Almost everything set in the YAML metadata has only an effect through the pandoc template in use.
Pandoc templates may contain variables. For example in your HTML template, you could write:
<title>$title$</title>
These template variables can be set with the --variable KEY[=VAL] option.
However, they are also set from the document metadata, which in turn can be set either by using:
the --metadata KEY[=VAL] option,
a YAML metadata block, or
the --metadata-file option.
The --variable options inserts strings verbatim into the template, while --metadata escapes strings. Strings in YAML metadata (also when using --metadata-file) are interpreted as markdown, which you can circumvent by using pandoc markdown's generic raw attributes. For example for HTML output:
`<script>alert()</script>`{=html}
See this table for a schematic:
| | --variable | --metadata | YAML metadata and --metadata-file |
|------------------------|-------------------|-------------------|-----------------------------------|
| values can be… | strings and bools | strings and bools | also YAML objects and lists |
| strings are… | inserted verbatim | escaped | interpreted as markdown |
| accessible by filters: | no | yes | yes |
To answer your question: the template determines what fields in the YAML metadata block have an effect. To view, for example, the default latex template, use:
$ pandoc -D latex
To see some variables that are set automatically by pandoc, see the Manual. Finally, other behaviours of pandoc (such as markdown extensions, etc) can only be set as command-line options (except when using a wrapper script).
It is a rather long list that you can browse by running man pandoc in the command line and navigating to "Variables set by pandoc" section under "TEMPLATES."
The top of the list includes the following among many other options:
Variables set by pandoc
Some variables are set automatically by pandoc. These vary somewhat depending on the
output format, but include metadata fields as well as the following:
title, author, date
allow identification of basic aspects of the document. Included in PDF metadata
through LaTeX and ConTeXt. These can be set through a pandoc title block, which
allows for multiple authors, or through a YAML metadata block:
---
author:
- Aristotle
- Peter Abelard
...
subtitle
document subtitle; also used as subject in PDF metadata
abstract
document summary, included in LaTeX, ConTeXt, AsciiDoc, and Word docx
keywords
list of keywords to be included in HTML, PDF, and AsciiDoc metadata; may be
repeated as for author, above
header-includes
contents specified by -H/--include-in-header (may have multiple values)
toc non-null value if --toc/--table-of-contents was specified
toc-title
title of table of contents (works only with EPUB and docx)
include-before
contents specified by -B/--include-before-body (may have multiple values)
include-after
contents specified by -A/--include-after-body (may have multiple values)
body body of document
```
You can see the documentation of pandoc for a clue: http://pandoc.org/getting-started.html
But to know exactly where it will be used you can look for templates sources of pandoc: https://github.com/jgm/pandoc-templates
For example, for the html5 output the file is: https://github.com/jgm/pandoc-templates/blob/master/default.html5
Here's an section of the code:
<title>$if(title-prefix)$$title-prefix$ - $endif$$pagetitle$</title>
As you can see it has title-prefix and pagetitle.
You can look the documentation, but the best solution is to look for the source code of the version you are using.
The pandoc main page now contains a list of options and explanations for them:
https://pandoc.org/MANUAL.html#variables
It seems to be the same as the one when looking at man pandoc.
I want to find and replace the VALUE into a xml file :
<test name="NAME" value="VALUE"/>
I have to filter by name (because there are lot of lines like that).
Is it possible ?
Thanks for you help.
Since you tagged the question "bash", I assume that you're not trying to use an XML library (although I think an XML expert might be able to give you something like an XSLT processor command that solves this question very robustly), but that you're simply interested in doing search & replace from the commandline.
I am using perl for this:
perl -pi -e 's#VALUE#replacement#g' *.xml
See perlrun man page: Very shortly put, the -p switches perl into text processing mode, -i stands for "in-place", and -e let's you specify an expression to apply to all lines of input.
Also note (if you are not too familiar with that already) that you may use other characters than # (common ones are %, a comma, etc.) that don't clash with your search & replacement strings.
There is one small caveat: perl will read & write all files given on the commandline, even those that did not change. Thus, the files' modification times will be updated even if they did not change. (I usually work around that with some more shell magic, e.g. using grep -l or grin -l to select files for perl to work on.)
EDIT: If I understand your comments correctly, you also need help with the regular expression to apply. Let me briefly suggest something like this then:
perl -pi -e 's,(name="NAME" value=)"[^"]*",\1"NEWVALUE",g' *.xml
Related: bash XHTML parsing using xpath
You can use SED:
SED 's/\(<test name=\"NAME\"\) value=\"VALUE\"/\1 value=\"YourValue\"/' test.xml
where test.xml is the xml document containing the given node. This is very fragile, and you can work to make it more flexible if you need to do this substitution multiple times. For instance, the current statement is case sensitive, so it won't substitute the value on a node with the name="name", but you can add a case insensitivity flag to the end of the statement, like so:
('s/\(<test name=\"NAME\"\) value=\"VALUE\"/\1 value=\"YourValue\"/I').
Another option would be to use XSLT, but it would require you to download an external library. It's pretty versatile, and could be a viable option for more complex modifications to an XML document.