Can pandoc generate a bibliography with the references in order of citation?

Pandoc noob here. I am trying to convert a LaTeX file into a Word document for submission to a picky journal. They require that the references in my bibliography appear in the order in which they are cited. This is no problem in LaTeX, but when I use Pandoc to convert to Word, my references appear in alphabetical order. I am using the basic command:
pandoc my.tex --bibliography=my.bib -o my.docx
Is there any way to force Pandoc to print the references in the order in which they appear in-text? Ideally, the references would appear in-text as numbers (bracketed, superscripted, I don't care) and the list of references would be numbered accordingly.
Any help that reduces the amount of manual work I will have to do is much appreciated.
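A likely fix, assuming pandoc 2.11 or later: pandoc's citation processor formats references with a CSL style, and its default style (Chicago author-date) sorts the bibliography alphabetically. A numeric style such as IEEE instead cites as [1], [2], ... and numbers the bibliography in order of first citation. On the command line this would be:
pandoc my.tex --citeproc --bibliography=my.bib --csl=ieee.csl -o my.docx
where ieee.csl is assumed to have been downloaded from the CSL styles repository into the working directory (older pandoc versions used the pandoc-citeproc filter instead of --citeproc). The small Python sketch here only assembles that command so the conversion can be scripted; the function name pandoc_citation_order_cmd is hypothetical.

```python
def pandoc_citation_order_cmd(src: str, bib: str, csl: str, out: str) -> list[str]:
    """Build a pandoc command that formats citations with the given CSL style.

    Assumes pandoc >= 2.11, where --citeproc turns on citation processing.
    With a numeric style such as IEEE, references are numbered in order of
    first citation instead of being sorted alphabetically.
    """
    return [
        "pandoc", src,
        "--citeproc",             # run the built-in citation processor
        f"--bibliography={bib}",  # BibTeX database with the cited entries
        f"--csl={csl}",           # citation style that controls ordering
        "-o", out,
    ]
```

Running it is then a matter of subprocess.run(pandoc_citation_order_cmd("my.tex", "my.bib", "ieee.csl", "my.docx"), check=True).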

Related

Pandoc - How to insert variables into the header of a DOCX with the pandoc_title_block extension?

I would like to generate a DOCX whose header contains values defined as variables in the Markdown file, such as the document's title, version, and publication date.
Using the yaml_metadata_block extension together with custom fields in the reference.docx file, I got this working, but I would rather not use form fields.
I understand (though I could be wrong) that the pandoc_title_block extension can do this, but I don't understand how it works and I haven't found any examples online that I can study.
Is that correct?
If so, could someone share a simple example I can study and understand?
Thank you.

How to add listings or images to the table of contents (TOC)

I have a couple of examples (all with titles) and I'd like to create an index/list out of them automatically.
An example can be seen in the table of contents of the chunked AsciiDoc User Guide.
The AsciiDoc source of the User Guide doesn't show anything AsciiDoc-specific that produces these lists; the only hint I found points to DocBook:
DocBook toolchains will normally automatically number examples and generate a 'List of Examples' backmatter section.
I'm looking for this in the standard HTML5 rendering (from Asciidoctor?), but I'm open to other suggestions.
Adding the :doctype: book attribute alone does not do it, so I keep hitting dead ends without knowing whether this is possible at all. I'm also new to AsciiDoc, so I may simply be missing some pointers.
The Python Asciidoc repo includes the a2x tool, which is a wrapper around a DocBook toolchain. It is DocBook that is producing these entries in the table of contents. Neither Python Asciidoc, nor asciidoctor, can do this out of the box.
You would need to curate the lists manually, or create a macro that does the curation for you. This thread might prove helpful: https://github.com/asciidoctor/asciidoctor-extensions-lab/issues/111
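As a rough illustration of automating that curation outside Asciidoctor (a pre-processing script, not the macro approach discussed in the thread), the Python sketch here scans a source file for titled example blocks delimited by ==== and emits an AsciiDoc list that links to them. The heuristics (anchors written as [[id]] or [#id], titles written as .Title lines directly above the opening delimiter) and the function name list_of_examples are assumptions.

```python
import re

# Block anchors: [[id]], [[id,reftext]] or [#id]
ANCHOR = re.compile(r"^\[\[([A-Za-z_][\w-]*)(?:,[^\]]*)?\]\]$|^\[#([A-Za-z_][\w-]*)\]$")
# Block title: a single '.' followed by a non-space, non-dot character
TITLE = re.compile(r"^\.([^.\s].*)$")
EXAMPLE_DELIMITER = "===="


def list_of_examples(source: str) -> str:
    """Return an AsciiDoc list linking to every titled ==== example block in source."""
    entries = []
    anchor = title = None
    inside_example = False
    for raw in source.splitlines():
        line = raw.strip()
        if line == EXAMPLE_DELIMITER:
            # An opening delimiter preceded by a title marks a titled example.
            if not inside_example and title:
                entries.append((anchor, title))
            inside_example = not inside_example
            anchor = title = None
            continue
        if match := ANCHOR.match(line):
            anchor = match.group(1) or match.group(2)
        elif match := TITLE.match(line):
            title = match.group(1)
        else:
            # Any other line means the pending anchor/title belonged to another block.
            anchor = title = None
    lines = [".List of Examples"]
    for ref, text in entries:
        lines.append(f"* <<{ref},{text}>>" if ref else f"* {text}")
    return "\n".join(lines)
```

The result could be written to a file such as examples.adoc (a hypothetical name) and pulled into the document with include::examples.adoc[].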

With Pandoc, how do I convert between formats with additional rules?

I have some existing MediaWiki-format texts that contain category tokens like
[[Category:XXX]]
[[Category:YYY]]
I'd like to convert them to Markdown texts. The basic command for doing that with Pandoc is
pandoc -f mediawiki -t markdown -s mytext.mediawiki -o mytext.md
The resultant Markdown text is mostly usable except that it converts the category tokens to
<Category:XXX> <Category:YYY>
which isn't really what I need. Instead, I need
[[!tag XXX YYY]]
because I'm using the resultant Markdown files as source files in a special content management system called Ikiwiki, which has its own idiosyncratic format for tags. How can I do that with Pandoc?
It's probably easiest to do this as a second step, with a search and replace on <Category:XXX>. Note that pandoc without the -o option writes to standard output, so you can pipe it directly into a custom post-processing script.
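A minimal version of that post-processing step in Python, assuming category names contain no whitespace (so pandoc emits them as <Category:NAME> autolinks, as in the output shown in the question). It removes the autolinks and appends a single Ikiwiki [[!tag ...]] directive at the end. The script name ikiwiki_tags.py and the function names are hypothetical.

```python
import re
import sys

# Autolinks such as <Category:XXX> that pandoc emits for MediaWiki categories.
# Assumption: category names contain no whitespace or '>' characters.
CATEGORY = re.compile(r"<Category:([^>\s]+)>")


def categories_to_ikiwiki_tags(markdown: str) -> str:
    """Replace <Category:...> autolinks with one Ikiwiki [[!tag ...]] directive."""
    tags = CATEGORY.findall(markdown)
    if not tags:
        return markdown
    # Remove the autolinks, strip the trailing spaces they leave, trim blank edges.
    body = CATEGORY.sub("", markdown)
    body = "\n".join(line.rstrip() for line in body.splitlines()).strip("\n")
    # dict.fromkeys drops duplicate tags while keeping first-seen order.
    directive = "[[!tag " + " ".join(dict.fromkeys(tags)) + "]]\n"
    return f"{body}\n\n{directive}" if body else directive


def main() -> None:
    """Filter stdin to stdout, e.g. at the end of a pandoc pipe."""
    sys.stdout.write(categories_to_ikiwiki_tags(sys.stdin.read()))
```

Saved as ikiwiki_tags.py with a call to main() added at the end, it would sit at the end of the pipe:
pandoc -f mediawiki -t markdown -s mytext.mediawiki | python ikiwiki_tags.py > mytext.md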
[[Category:XXX]] is converted by pandoc internally to a link along the lines of Category:XXX (try pandoc -f mediawiki -t native).
In general, additional rules for elements are implemented as custom scripts (filters) that match on Pandoc's internal data types; see the Pandoc scripting documentation. So you could match on those kinds of links. It's more work the first time, but it makes sure you don't replace false positives.
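A sketch of such a filter using only the Python standard library and pandoc's JSON filter protocol (pandoc writes the document AST as JSON to the filter's standard input and reads the modified AST from its standard output). It assumes the JSON layout of recent pandoc versions, where a link looks like {"t": "Link", "c": [attr, inlines, [url, title]]}, and that category links have URLs starting with Category:. The file name ikiwiki_filter.py and all function names are hypothetical.

```python
import json
import sys

CATEGORY_PREFIX = "Category:"
# Inline elements that carry no visible text.
WHITESPACE_INLINES = {"Space", "SoftBreak", "LineBreak"}


def is_category_link(node):
    """True if node is a Link element whose URL starts with Category:."""
    return (
        isinstance(node, dict)
        and node.get("t") == "Link"
        and node["c"][2][0].startswith(CATEGORY_PREFIX)
    )


def strip_categories(node, found):
    """Return a copy of node without category links, appending their names to found."""
    if isinstance(node, list):
        kept = []
        for item in node:
            if is_category_link(item):
                found.append(item["c"][2][0][len(CATEGORY_PREFIX):])
            else:
                kept.append(strip_categories(item, found))
        return kept
    if isinstance(node, dict):
        return {key: strip_categories(value, found) for key, value in node.items()}
    return node


def is_blank_paragraph(block):
    """True for Para/Plain blocks left with nothing but whitespace inlines."""
    return block.get("t") in ("Para", "Plain") and all(
        inline.get("t") in WHITESPACE_INLINES for inline in block["c"]
    )


def ikiwiki_tag_filter(doc):
    found = []
    doc = strip_categories(doc, found)
    # Only top-level blocks are cleaned up in this sketch.
    doc["blocks"] = [b for b in doc["blocks"] if not is_blank_paragraph(b)]
    if found:
        tags = " ".join(dict.fromkeys(found))  # de-duplicate, keep first-seen order
        # The markdown writer emits raw "markdown" blocks unchanged.
        doc["blocks"].append({"t": "RawBlock", "c": ["markdown", f"[[!tag {tags}]]"]})
    return doc


def main():
    """Read the JSON AST from stdin and write the filtered AST to stdout."""
    json.dump(ikiwiki_tag_filter(json.load(sys.stdin)), sys.stdout)
```

With a call to main() added at the end and the file made executable, it would be used as:
pandoc -f mediawiki -t markdown -s --filter ./ikiwiki_filter.py mytext.mediawiki -o mytext.md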

Preserve Line Breaks in Pandoc Markdown -> LaTeX Conversion

I want to convert the following *.md file into proper LaTeX (*.tex).
Lorem *ipsum* something.
Does anyone know lorem by heart?
That would *sad* because there's always Google.
Expected Behavior / Resulting LaTeX from Pandoc
Lorem \emph{ipsum} something.
Does anyone know lorem by heart?
That would \emph{sad} because there's always Google.
Observed Behavior / Resulting LaTeX from Pandoc
Lorem \emph{ipsum} something. Does anyone know lorem by heart?
That would \emph{sad} because there's always Google.
Why do I care?
1. I'm transitioning a bigger git repo from markdown to LaTeX, and I want a clean diff and history.
2. I actually like my LaTeX with one sentence-per-line even though it does not matter for the typesetting.
How can I get Pandoc to do this?
P.S. I am aware of the hard_line_breaks extension, but it only adds \\ between the first two lines and does not actually preserve my line breaks.
Update
Since pandoc 1.16, this is possible:
pandoc --wrap=preserve
Old answer
Since Pandoc converts the Markdown into an AST-like internal representation, your non-semantic line breaks are lost. So what you're looking for is not possible without some custom scripting (for example, using --no-wrap and then post-processing the output to insert a line break wherever a period is followed by a space).
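A rough sketch of that post-processing in Python, meant to run over the LaTeX that pandoc produces with --no-wrap (--wrap=none in pandoc 1.16 and later). The sentence-boundary rule (a ., ! or ? followed by spaces) is a naive assumption, so abbreviations such as "e.g. " will be split as well; the function name one_sentence_per_line is hypothetical.

```python
import re

# Naive sentence boundary: '.', '!' or '?' followed by spaces/tabs and more text.
SENTENCE_END = re.compile(r"([.!?])[ \t]+(?=\S)")


def one_sentence_per_line(text: str) -> str:
    """Put each sentence of pandoc's unwrapped output on its own line."""
    return SENTENCE_END.sub(r"\1\n", text)
```

Applied to the observed output in the question, it splits "something. Does anyone know lorem by heart?" into two lines and leaves the existing line break before "That would" untouched.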
However, you can use the --columns NUMBER option to specify the number of characters on each line. You won't get one sentence per line, but NUMBER characters per line.
A much simpler solution would be to add two spaces after "...something.". This will add a manual line break (the method is mentioned in the Pandoc Manual).
I found another way to address this problem: leave the original *.md files (under version control) unchanged, and have them read in and converted by pandoc when the PDF is built.
Here's how:
Some markdown.md in project root:
Happy one-sentence-per-line **markdown** stuff.
And another line – makes for clear git diffs!
And some latexify.tex in project root:
\documentclass{article}
\begin{document}
\immediate\write18{pandoc markdown.md -t latex -o tmp.tex}
\input{tmp.tex}
\end{document}
This works nicely when a LaTeX project contains some Markdown components, e.g. GitHub READMEs.
It requires no special package, but you must compile with shell escape enabled (for example, pdflatex -shell-escape latexify.tex).

Understanding Wikipedia Title Dump Format

I downloaded the latest list of article titles from Wikipedia. It appears to contain some sort of markup but I can't seem to find any documentation to help me understand "how" or "what" can be stripped. The file is "enwiki-latest-all-titles.gz" and can be downloaded here:
http://dumps.wikimedia.org/enwiki/latest/
I could naively strip out all punctuation, etc., based on my own observations of the text file but it would be better to have more information about the data so that it can be handled in a meaningful way.
That's not markup; it's a list of page_title values, in which spaces are replaced by underscores (among other normalizations).
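Turning those values back into readable titles is then mostly a matter of replacing underscores with spaces. A minimal sketch, assuming each line is either a bare title or a namespace number and a title separated by a tab (in the two-column layout, a non-numeric header row is skipped); the function names display_titles and read_titles are hypothetical.

```python
import gzip


def display_titles(lines):
    """Yield (namespace, title) pairs with underscores turned back into spaces.

    Assumption: each line is either a bare page_title or "namespace<TAB>page_title".
    Bare titles get namespace None.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        namespace, sep, title = line.partition("\t")
        if not sep:
            yield None, namespace.replace("_", " ")
        elif namespace.isdigit():
            yield int(namespace), title.replace("_", " ")
        # Anything else (e.g. a "page_namespace<TAB>page_title" header) is skipped.


def read_titles(path):
    """Stream display titles straight from the gzipped dump."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        yield from display_titles(fh)
```

Usage would be along the lines of: for ns, title in read_titles("enwiki-latest-all-titles.gz"): ...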
