Preserve Line Breaks in Pandoc Markdown -> LaTeX Conversion - pandoc

I want to convert the following *.md into proper LaTeX *.tex.
Lorem *ipsum* something.
Does anyone know lorem by heart?
That would *sad* because there's always Google.
Expected Behavior / Resulting LaTeX from Pandoc
Lorem \emph{ipsum} something.
Does anyone know lorem by heart?
That would \emph{sad} because there's always Google.
Observed Behavior / Resulting LaTeX from Pandoc
Lorem \emph{ipsum} something. Does anyone know lorem by heart?
That would \emph{sad} because there's always Google.
Why do I care?
1. I'm transitioning a bigger git repo from markdown to LaTeX, and I want a clean diff and history.
2. I actually like my LaTeX with one sentence-per-line even though it does not matter for the typesetting.
How can I get Pandoc to do this?
PS: I am aware of the option hard_line_breaks, but that only adds \\ between the first two lines, and does not actually preserve my line breaks.

Update
Since pandoc 1.16, this is possible:
pandoc --wrap=preserve
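For a complete invocation, a small Python wrapper might look like the sketch below. The function names and the file names in the usage note are placeholders of mine, not part of the answer; the pandoc flags themselves (--wrap=preserve, -f, -t, -o) are standard.

```python
import subprocess


def pandoc_preserve_cmd(src, dst):
    """Build a pandoc command line that keeps the source's line breaks."""
    return [
        "pandoc",
        "--wrap=preserve",  # available since pandoc 1.16
        "-f", "markdown",
        "-t", "latex",
        "-o", dst,
        src,
    ]


def convert(src, dst):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    # Assumes a pandoc binary (1.16 or later) is on PATH.
    subprocess.run(pandoc_preserve_cmd(src, dst), check=True)
```

Calling convert("input.md", "output.tex") would then write the LaTeX file, provided pandoc is installed.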
Old answer
Since Pandoc converts the Markdown to an AST-like internal representation, your non-semantic linebreaks are lost. So what you're looking for is not possible without some custom scripting (like using --no-wrap and then processing the output by inserting a line-break wherever there is a dot followed by a space).
However, you can use the --columns=NUMBER option to specify the line length in characters. So you won't have one sentence per line, but lines wrapped at NUMBER characters.
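The custom post-processing mentioned in the old answer (break wherever a dot is followed by a space) could be sketched roughly as follows. This is a deliberately naive illustration with a made-up function name: it only handles periods, so sentences ending in "?" or "!" stay on the same line, and it will also split after abbreviations such as "e.g. ".

```python
import re


def one_sentence_per_line(text):
    """Insert a newline after every period that is followed by spaces."""
    # Naive rule from the answer: "a dot followed by a space".
    return re.sub(r"\. +", ".\n", text)
```

It is meant to be run on unwrapped pandoc output, for example the contents of a .tex file produced without line wrapping.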

A much simpler solution would be to add two spaces after "...something.". This will add a manual line break (the method is mentioned in the Pandoc Manual).

I figured out another way to address this problem – which is to not change the original *.mds (under version control), but to simply read them in and to have them "pandoced" when building the PDF.
Here's how:
Some markdown.md in project root:
Happy one-sentence-per-line **markdown** stuff.
And another line – makes for clear git diffs!
And some latexify.tex in project root:
\documentclass{article}
\begin{document}
\immediate\write18{pandoc markdown.md -t latex -o tmp.tex}
\input{tmp.tex}
\end{document}
Works just dandy if you have some Markdown components in a LaTeX project, e.g. GitHub READMEs or similar.
Requires no special package, but compilation with shell-escape enabled.

Related

pandoc .md to .docx conversion: preserving original linebreaks

I'm converting a simple markdown file to docx. I would like the output not to wrap lines.
My impression from the documentation is that --wrap=none will do this, but the docx output still wraps lines not separated by a blank line. I prefer not to wrap the file content with "~~~" lines.
Is there another switch I'm missing?
mb21 suggests that my question is a duplicate of Pandoc: no line wrapping when converting to HTML. It isn't quite a duplicate, since that issue addresses how to get lines to break in a controlled way rather than how to preserve the original line breaks. However, the answers in that thread mention the "-f markdown+hard_line_breaks" option, which DOES solve my issue. Thanks to mb21!

markdown files don't format properly

I am using Jekyll and Markdown for the first time to build a blog site. From what I understand about Markdown files, the pound key is what is used to comment lines, except it does the complete opposite for me. Anything within my .md files is commented out, and the things that are supposed to be comments are actual live text on the page. Here's what I mean:
Does anyone know what the problem is? It was working properly yesterday, so I'm thinking that it may be a problem with my text editor (Atom). Thanks!
From what I understand about markdown files, the pound key is what is used to comment lines, except it does the complete opposite for me.
Nope. Hashes are used to represent various levels of header:
Atx-style headers use 1-6 hash characters at the start of the line, corresponding to header levels 1-6. For example:
# This is an H1
## This is an H2
###### This is an H6
Markdown doesn't have the concept of comments, although it does support inline HTML, so you can use HTML comments, e.g.
<!-- This is a comment -->
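To make the rule concrete, here is a small Python sketch that reports the header level of a line. The function name is hypothetical, and it assumes the stricter convention (as in CommonMark) that the hashes must be followed by whitespace or end of line; some Markdown processors are more lenient.

```python
import re

# 1-6 hashes at the start of the line, then whitespace or end of line.
_ATX = re.compile(r"^(#{1,6})(?:[ \t]+|$)")


def atx_level(line):
    """Return the header level (1-6) of a line, or 0 if it is not a header."""
    match = _ATX.match(line)
    return len(match.group(1)) if match else 0
```

A line such as "# This is an H1" yields level 1, while an HTML comment line yields 0 because it does not start with a hash.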

Can I automatically update msgids in gettext's .po files for trivial text changes?

With gettext, the original (usually English) text of messages serves as
the message key ("msgid") for the translations. This means that every time the
original text changes, the msgid must be updated in all the .po files.
For real changes of the text, this is obviously unavoidable, as the
translator must update the translation.
However, if the change of the original does not change its meaning,
re-translation is superfluous (e.g. change in punctuation, whitespace
changes, or correction of a spelling mistake).
Is there a way to update the .po files automatically in that case?
I tried to use xgettext & msgmerge (with fuzzy matching turned on), but
fuzzy matching sometimes fails, plus this produces lots of ugly
"#,fuzzy" flags.
Note: There is a similar question:
How to efficiently work with gettext PO files when making small edits to large text values
However, it's about large strings, thus about a more specific problem.
One way to avoid the problem is to leave the msgids alone, have a .po file for the original language and make the fix inside that.
It always strikes me as being more of a workaround than a proper fix though. For the next iteration (where there will definitely be more msgid changes) the msgid is changed and either the translators translate it in their usual update or each language is updated by hand when the msgid is changed.
I've had exactly this issue when doing minor changes to a django project. What I do is the following:
Change message in code.
Run find and replace on all translation files ("django.po"), replacing the old message (msgid) with the new one.
Run django-admin makemessages.
If I have done things right, the last step is superfluous (i.e., you have done the change for gettext). Django uses the gettext utilities, so it shouldn't matter how you make your message files.
I find and replace like so:
find . -name "*.po" -print | xargs sed -i 's/oldmessageid/newmessageid/g'
Courtesy of http://rushi.vishavadia.com/blog/find-replace-across-multiple-files-in-linux
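Note that the sed command replaces the old text everywhere, including inside msgstr lines. A more targeted variant could be sketched in Python as below. The function name is my own, and the sketch assumes single-line msgids and that both strings are already written in PO-escaped form; multi-line msgids are not handled.

```python
def rename_msgid(po_text, old, new):
    """Replace an exact single-line msgid entry, leaving msgstr lines alone."""
    target = 'msgid "%s"' % old
    out = []
    for line in po_text.splitlines(keepends=True):
        stripped = line.rstrip("\r\n")
        if stripped == target:
            # Keep the original line ending after the rewritten msgid.
            line = 'msgid "%s"' % new + line[len(stripped):]
        out.append(line)
    return "".join(out)
```

To apply it, read each .po file, pass its contents through rename_msgid, and write the result back.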

Is there any way to comment out text in textile?

LaTeX has %, HTML has <!-- to denote that a comment follows.
Does Textile have any way of commenting out text? I couldn't find one, and it seems like it would be a nice feature to have.
Not really. It seems you can do a single line HTML escape sequence containing an HTML comment which is passed through. But you probably want something more like the C Preprocessor comments that are simply stripped out completely?
==<!-- html comment -->==
Or you could do this, which outputs a multiline html comment, but I doubt it's what you want either:
notextile. <!-- test
test
test
-->
The TextPattern version of Textile does support comments.
The syntax is to have a line beginning with three hashes and one or two full stops.
###. This line will be treated as a comment.
So will this.
This line will be displayed.
###.. Blank lines are allowed in comments if you use two full stops.
This line is also a comment.
p. Starting a new block will end the comment.
Currently, the RedCloth and Mylyn implementations of Textile do not support these comments.

Markdown to plain text in Ruby?

I'm currently using BlueCloth to process Markdown in Ruby and show it as HTML, but in one location I need it as plain text (without some of the Markdown). Is there a way to achieve that?
Is there a markdown-to-plain-text method? Is there an html-to-plain-text method that I could feed the result of BlueCloth to?
The Redcarpet gem has a Redcarpet::Render::StripDown renderer which "turns Markdown into plaintext".
Copy and modify it to suit your needs.
Or use it like this:
Redcarpet::Markdown.new(Redcarpet::Render::StripDown).render(markdown)
Converting HTML to plain text with Ruby is not a problem, but of course you'll lose all markup. If you only want to get rid of some of the Markdown syntax, it probably won't yield the result you're looking for.
The bottom line is that unrendered Markdown is intended to be used as plain text, therefore converting it to plain text doesn't really make sense. All Ruby implementations that I have seen follow the same interface, which does not offer a way to strip syntax (only including to_html, and text, which returns the original Markdown text).
It's not ruby, but one of the formats Pandoc now writes is 'plain'. Here's some arbitrary markdown:
# My Great Work
## First Section
Here we discuss my difficulties with [Markdown](http://wikipedia.org/Markdown)
## Second Section
We begin with a quote:
> We hold these truths to be self-evident ...
then some code:
#! /usr/bin/bash
That's *all*.
(Not sure how to turn off the syntax highlighting!) Here's the associated 'plain':
My Great Work
=============
First Section
-------------
Here we discuss my difficulties with Markdown
Second Section
--------------
We begin with a quote:
We hold these truths to be self-evident ...
then some code:
#! /usr/bin/bash
That's all.
You can get an idea what it does with the different elements it parses out of documents from the definition of plainify in pandoc/blob/master/src/Text/Pandoc/Writers/Markdown.hs in the Github repository; there is also a tutorial that shows how easy it is to modify the behavior.

Resources