pandoc .md to .docx conversion: preserving original linebreaks

I'm converting a simple markdown file to docx. I would like the output not to wrap lines.
My impression from the documentation is that --wrap=none will do this, but the docx output still wraps lines not separated by a blank line. I prefer not to wrap the file content with "~~~" lines.
Is there another switch I'm missing?
mb21 suggests that my question is a duplicate of Pandoc: no line wrapping when converting to HTML. It isn't quite a duplicate, since that issue addresses how to get lines to break in a controlled way rather than how to preserve the original line breaks. However, the answers in that thread mention the "-f markdown+hard_line_breaks" option, which DOES solve my issue. Thanks to mb21!
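As background, --wrap only controls how pandoc wraps the text of the output source it writes, so it has no visible effect on a rendered format such as docx. The hard_line_breaks extension instead changes how the Markdown is read. A minimal sketch in Python of the resulting command line, assuming pandoc is installed; the helper name docx_command and the file names are hypothetical:

```python
def docx_command(source, target):
    """Build a pandoc argument list that turns every newline inside a
    paragraph into a hard line break (hypothetical helper)."""
    return [
        "pandoc",
        "-f", "markdown+hard_line_breaks",  # reader extension named in the thread
        source,
        "-o", target,
    ]

# Usage (needs pandoc on the PATH; file names are placeholders):
#   import subprocess
#   subprocess.run(docx_command("notes.md", "notes.docx"), check=True)
```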

Related

Better way to include content as-is with AsciiDoc include directive

Context
I am making a script that dynamically inserts include directives in code blocks on an AsciiDoc file and then generates a PDF out of that. A generated AsciiDoc file could look like this:
= Title
[source,java]
----
include::foo.java[]
----
I want the user to be free to include whatever char-based file he or she wants, even other AsciiDoc files.
Problems
My goal is to show the contents of these included files as they are. I run into problems when the included file:
is recognized as AsciiDoc because of its extension, and thus any include directives it has are interpreted. I don't want nested includes, just to show the include directive in the code block. Example of undesired behaviour:
contains the code block delimiter ----, as seen on the image above when I end up with two code blocks instead of the intended single one. In this case, it does not matter if the file is recognized as an AsciiDoc file, the problem persists.
My workaround
The script I am writing uses AsciidoctorJ and I am leveraging that I can control how the content of each file is included by using an include processor. Using the include processor I wrap each line of each file with the pass:[] macro. Additionally, I activate macro substitution on the desired code block. A demonstration of this idea is shown in the image above.
Is there a better way to show the exact contents of a file? This works, but it seems like a hack. I would much prefer not to have to change the lines read, as I am currently doing.
EDIT for further information
I would like to:
not have to escape the block delimiter. I am not exclusively referring to ----, but whatever the delimiter happens to be. For example, the answer by cirrus still has the problem when a line of the included file contains the literal block delimiter (....).
not have to escape the include directives in files recognized as AsciiDoc.
As a general note, I don't want to escape (or modify in any way) any lines.
Problem I found with my workaround:
If the last char of a line is a backslash (\), it escapes the closing bracket of the pass:[] macro.
You can try using a literal block. Based on your above example:
a.adoc:
= Title
....
include::c.adoc[]
....
If you use include:: in c.adoc, Asciidoctor will still try to find and include the file. As such, you will need to replace include:: with \include::
c.adoc:
\include::foo.txt[]
----
----
Which should output the following pdf:

markdown files don't format properly

I am using Jekyll and Markdown for the first time to build a blog site. From what I understand about Markdown files, the pound key is what is used to comment lines, except it does the complete opposite for me. Anything within all of my .md files is commented out, and the things that are supposed to be comments are actual live text on the page. Here's what I mean:
Does anyone know what the problem is? It was working properly yesterday, so I'm thinking that it may be a problem with my text editor (Atom). Thanks!
From what I understand about markdown files, the pound key is what is used to comment lines, except it does the complete opposite for me.
Nope. Hashes are used to represent various levels of header:
Atx-style headers use 1-6 hash characters at the start of the line, corresponding to header levels 1-6. For example:
# This is an H1
## This is an H2
###### This is an H6
Markdown doesn't have the concept of comments, although it does support inline HTML, so you can use HTML comments, e.g.
<!-- This is a comment -->

Preserve Line Breaks in Pandoc Markdown -> LaTeX Conversion

I want to convert the following *.md into proper LaTeX *.tex.
Lorem *ipsum* something.
Does anyone know lorem by heart?
That would *sad* because there's always Google.
Expected Behavior / Resulting LaTeX from Pandoc
Lorem \emph{ipsum} something.
Does anyone know lorem by heart?
That would \emph{sad} because there's always Google.
Observed Behavior / Resulting LaTeX from Pandoc
Lorem \emph{ipsum} something. Does anyone know lorem by heart?
That would \emph{sad} because there's always Google.
Why do I care?
1. I'm transitioning a bigger git repo from markdown to LaTeX, and I want a clean diff and history.
2. I actually like my LaTeX with one sentence-per-line even though it does not matter for the typesetting.
How can I get Pandoc to do this?
P.S.: I am aware of the option hard_line_breaks, but that only adds \\ between the first two lines, and does not actually preserve my line breaks.
Update
Since pandoc 1.16, this is possible:
pandoc --wrap=preserve
Old answer
Since Pandoc converts the Markdown to an AST-like internal representation, your non-semantic linebreaks are lost. So what you're looking for is not possible without some custom scripting (like using --no-wrap and then processing the output by inserting a line-break wherever there is a dot followed by a space).
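The post-processing idea mentioned above (break after every dot followed by a space) could look roughly like the following Python sketch. The function name is hypothetical, and the rule is deliberately naive: it will also split after abbreviations such as "e.g. " and it ignores sentences ending in "?" or "!".

```python
import re

def one_sentence_per_line(text):
    """Replace each 'dot followed by a space' with a dot and a newline."""
    return re.sub(r"\. ", ".\n", text)

# Example with the unwrapped LaTeX line from the question:
line = r"Lorem \emph{ipsum} something. Does anyone know lorem by heart?"
print(one_sentence_per_line(line))
```

Applied to pandoc's unwrapped output, this puts "Lorem \emph{ipsum} something." and "Does anyone know lorem by heart?" on separate lines.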
However, you can use the --columns NUMBER option to specify the number of characters on each line. So you won't have a sentence per line, but NUMBER of characters per line.
A much simpler solution would be to add two spaces after "...something.". This will add a manual line break (the method is mentioned in the Pandoc Manual).
I figured out another way to address this problem – which is to not change the original *.mds (under version control), but to simply read them in and to have them "pandoced" when building the PDF.
Here's how:
Some markdown.md in project root:
Happy one-sentence-per-line **markdown** stuff.
And another line – makes for clear git diffs!
And some latexify.tex in project root:
\documentclass{article}
\begin{document}
\immediate\write18{pandoc markdown.md -t latex -o tmp.tex}
\input{tmp.tex}
\end{document}
Works just dandy if you have some markdown components in a LaTeX project, e.g. GitHub READMEs or something similar.
Requires no special package, but compilation with shell-escape enabled.

Testing for extended characters in watir-webdriver

I need to check for text with extended character set characters in my watir-webdriver scripts.
For example, checking that a link has the following text:
Weiß
I read the text from a CSV file, which when edited looks like the above text.
But when running the test in Firefox I get the following failure.
Wrong values on attribute table after add all save.
<"Wei\247"> expected but was
<"Wei\303\237>.
I tried saving it in the CSV as Wei\303\237 but the expected value then had double backslash characters.
How can I encode this in the CSV so I can check the text value safely cross platform and browser?
I had this problem, and I got around it by writing it in the spreadsheet as something like {S} and gsubbing it when I read the file into Ruby. If you gsub the text when you check the link too then basically you have your own encoding method for special characters. This is a long way around, so I'd be very interested in other answers.
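The placeholder approach described above can be sketched as follows. This is written in Python rather than Ruby purely for illustration; the mapping table and the function name are hypothetical, and only the {S} token comes from the answer.

```python
# Hypothetical placeholder table: ASCII-only tokens stored in the
# spreadsheet, mapped to the characters they stand for.
PLACEHOLDERS = {"{S}": "ß"}

def expand_placeholders(text, table=PLACEHOLDERS):
    """Replace every placeholder token in text with its real character."""
    for token, char in table.items():
        text = text.replace(token, char)
    return text

expected = expand_placeholders("Wei{S}")  # value read from the CSV cell
```

Because the file itself stays pure ASCII, the comparison no longer depends on which encoding the spreadsheet tool used when saving.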
The double backslash is probably because when your code reads from the CSV it escapes the backslashes in the file to preserve the text. Therefore you can't put the escaped Unicode bytes in your CSV file. I don't really know a way around this. I hear that Ruby's Unicode support isn't that great, but is being worked on as of 1.9.x.
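One possible reading of the failure message, offered as an assumption rather than a diagnosis: the two values are the same string "Weiß" in two different encodings. \303\237 is ß in UTF-8, while \247 is ß in Mac OS Roman, which would fit a CSV saved by a Mac spreadsheet tool. The Python 3 sketch below only demonstrates the byte values; the helper name is hypothetical.

```python
def octal_escape(data):
    """Render bytes the way the failure message does: ASCII bytes as
    characters, everything else as a backslash plus octal digits."""
    return "".join(chr(b) if b < 0x80 else "\\%o" % b for b in data)

text = "Weiß"
print(octal_escape(text.encode("utf-8")))      # Wei\303\237
print(octal_escape(text.encode("mac_roman")))  # Wei\247
```

If that reading is right, decoding the CSV with the encoding it was actually saved in (or re-saving it as UTF-8) would make both sides compare equal without any placeholder scheme.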

Is there any way to comment out text in textile?

LaTeX has %, HTML has <!-- to denote that a comment follows.
Does textile have any way of commenting out text? I couldn't find one, and it seems like it would be a nice feature to have.
Not really. It seems you can do a single line HTML escape sequence containing an HTML comment which is passed through. But you probably want something more like the C Preprocessor comments that are simply stripped out completely?
==<!-- html comment -->==
Or you could do this, which outputs a multiline html comment, but I doubt it's what you want either:
notextile. <!-- test
test
test
-->
The TextPattern version of Textile does support comments.
The syntax is to have a line beginning with three hashes and one or two full stops.
###. This line will be treated as a comment.
So will this.
This line will be displayed.
###.. Blank lines are allowed in comments if you use two full stops.
This line is also a comment.
p. Starting a new block will end the comment.
Currently, the RedCloth and Mylyn implementations of Textile do not support these comments.
