Is there a way to get metaStyles when converting docx to markdown with pandoc? - pandoc

Is there a way to get Word paragraphs with metaStyles (Title, Subtitle, Author, Date, Abstract) converted to markdown with a custom-style attribute?
To reproduce it:
A Word document with a single paragraph "MyTitle" styled with "Title"
Without the --standalone option the metaStyles are ignored.
When using the --standalone option the metaStyles are exported to meta data in markdown as follows:
---
title: MyTitle
---
What I need is to get the following in markdown (for Title, Subtitle etc.):
::: {custom-style="Title"}
MyTitle
:::
I'd expect this when using -f docx+styles, but it only happens when the Word document starts with a non-metaStyle paragraph!
To reproduce it:
A Word document with two paragraphs:
First paragraph "std" styled with "Standard"
Second paragraph "MyTitle" styled with "Title"

Related

Trying to create rich text link to source citation for inline citation with pandoc citeproc

Trying to create an output that would allow the URL part of a citation to appear as clickable rich text link to source file.
So far I have tried to create a custom CSL file that would output as (TITLE, DATE) ([SOURCE LINK](URL)) so that when I run it through pandoc it would turn into a rich text link when converted to DOCX, HTML, PDF, etc.
However, when I run the following command, the citation comes out as escaped markdown, which renders as raw text rather than as a rich text link.
pandoc input.md -o output.md --citeproc --csl chicago-custom.csl --bibliography references.bib
I also tried HTML output. That creates a link for the URL but not for the "SOURCE LINK" part, so the result looks like /[SOURCE LINK]/(URL) for markdown output, or just the URL for HTML output.
Is there a different approach than custom CSL to do this?

Custom title page on pptx generated from markdown

From https://pandoc.org/MANUAL.html#extension-yaml_metadata_block I see how to generate a pptx presentation from a markdown file
pandoc habits.txt -o habits.pptx
Title is set with
% title
% author(s) (separated by semicolons)
% date
How can I add a subtitle or another arbitrary text on the title page? I mean to add it to the title page of the slides, the .pptx file.
I think there is some confusion about the title block: what's specified in the question is a pandoc title block; it does not support subtitles.
YAML metadata blocks are the generic method of specifying document metadata. Including a subtitle is straightforward this way:
---
title: title
subtitle: the elusive subtitle
author:
- Author 1
- Author 2
date: 2021-05-29
---
# Content slides start here

How can I prevent pandoc from inserting an <h1> element of my title in the content

I have a book made of several ordered Markdown files. I am using Pandoc to convert those into an epub file, and things are mostly okay. I can embed the font I like and provide my own CSS, etc. The problem is that the output file contains an element that is not present in the Markdown (as a "#" header element). This element is then being picked up by the ToC function and inserted into the Table of Contents. I didn't ask for the element to be present, and I can't find an option to turn it off.
Here's how to reproduce, with a much simpler case than my actual one, but it's sufficient to demonstrate the problem. I have the following file structure:
- pandoctest/
  - src/
    - file1.md
    - file2.md
  - epub.yml
The contents are as follows:
file1.md:
Here is some text.
file2.md:
# Chapter one
The chapter goes here.
epub.yml:
---
title:
- type: main
  text: A Book
creator:
- role: author
  text: Some Dude
---
And the pandoc command I'm running is:
pandoc -o output.epub epub.yml --toc src/*
The end result is something like this:
Page 1: An appropriate title page using the title and author elements from epub.yml
Page 2: The table of contents page. At the top, the title from epub.yml. Beneath that are two ToC entries. The first is the title of the book and refers to the element I don't want present on the next page. The second is "Chapter One" which refers to the # Chapter One element from my Markdown (this is appropriate).
Page 3: First, the undesired element, which, in the raw XML looks like this:
<h1 class="unnumbered" data-number="">A Book</h1>
Then, "Here is some text", a paragraph that I did indeed tell it to put there.
Page 4: A correctly rendered "Chapter One" page.
The question here is how to get pandoc to not render the "unnumbered" header element that is not present in the Markdown. It screws up the Table of Contents and I never asked for it to be there.
For reference, here is the epub that is rendered from my little test here: https://www.dropbox.com/s/dj4jo08g7q4f9i2/output.epub?dl=0

Using fancyhdr in YAML metadata produces multiple page numbers with Pandoc

I'm using Pandoc to generate a PDF from markdown. When specifying header/footer information in the YAML metadata (as below), I still get a page number in the center of my footer (with the text of \fancyfoot[L] written over it), in addition to the page number on the right of the footer that I specified with \fancyfoot[R].
How can I remove the default page number in the footer at center? If I use \pagenumbering{gobble} it just removes all page numbers, at center and on right.
---
title: Test Title
author: Author Name
header-includes:
- \usepackage{fancyhdr}
- \pagestyle{fancy}
- \fancyhead[L]{Author Name}
- \fancyhead[R]{Test Title}
- \fancyfoot[L]{Extra text here}
- \fancyfoot[R]{\thepage}
---
Currently using Pandoc 1.17.2 on OSX 10.11.6.
This should work if you give the center footer field empty content. That is at least one way to do it in LaTeX, and hopefully it works the same way through pandoc:
\fancyfoot[C]{}

Retain YAML data when converting markdown to html with pandoc

Is it possible to retain the yaml header in a markdown file when converting to html using pandoc?
Or, even better, convert the yaml to json and keep it in the converted file.
E.g.,
---
title: My Title
subtitle: My Subtitle
...
# Pandoc
We Love pandoc
To:
---
title: My Title
subtitle: My Subtitle
...
<h1>Pandoc</h1>
<p>We Love pandoc</p>
Or something like:
{title: "My Title", subtitle: "My Subtitle"}
<h1>Pandoc</h1>
<p>We Love pandoc</p>
Update
So, I guess I'll use templates and do something like this:
{title: $title$, subtitle:$subtitle$}
Not out of the box, since HTML has no concept of a YAML or JSON block (without JavaScript). You can customise your output with templates:
Templates
To see the default template that is used, just type
pandoc -D FORMAT
A custom template can be specified using the --template option. You
can also override the system default templates for a given output
format FORMAT by putting a file templates/default.FORMAT in the user
data directory (see --data-dir).
So you can create a custom template that deals with values in the YAML metadata block as you see fit.
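As a rough illustration of the template approach, here is a minimal Python sketch that writes a tiny custom template and runs pandoc on the example document from the question. It relies on pandoc's documented `$meta-json$` and `$body$` template variables. The `<script>` wrapper, the `page-meta` id, the file names, and the helper names `build_command` and `convert` are my own assumptions for illustration, and it assumes pandoc is on your PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List

# $meta-json$ and $body$ are pandoc template variables. The <script> wrapper
# and its id are illustrative choices, not something pandoc requires.
TEMPLATE = (
    '<script type="application/json" id="page-meta">$meta-json$</script>\n'
    "$body$\n"
)

# The example document from the question.
MARKDOWN = """---
title: My Title
subtitle: My Subtitle
...

# Pandoc

We Love pandoc
"""


def build_command(source: Path, template: Path) -> List[str]:
    """Return the pandoc invocation; --template implies --standalone."""
    return ["pandoc", str(source), "--to", "html", "--template", str(template)]


def convert(markdown: str = MARKDOWN, template: str = TEMPLATE) -> str:
    """Write both inputs to a temporary directory and run pandoc on them."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp, "input.md")
        template_file = Path(tmp, "meta.html")
        source.write_text(markdown, encoding="utf-8")
        template_file.write_text(template, encoding="utf-8")
        result = subprocess.run(
            build_command(source, template_file),
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


if __name__ == "__main__":
    # Only call pandoc when it is actually installed.
    if shutil.which("pandoc"):
        print(convert())
    else:
        print("pandoc not found on PATH")
```

If you want individual fields rather than the whole metadata block, the template can use `$title$` and `$subtitle$` instead, but then you have to add the JSON quoting yourself.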
Examples 5.11 YAML metadata block
Template variables will be set automatically from the metadata. Thus,
for example, in writing HTML, the variable abstract will be set to the
HTML equivalent of the markdown in the abstract field:
