Pandoc - convert docx to .md when the docx contains Word shapes

I'm playing around with pandoc to see whether it can reliably convert every aspect of a Word document to .md. It handles a lot of things pretty well, such as tables of contents and images. However, I'd like to know whether it can also understand a diagram that was built in Word by combining multiple shapes, e.g. a diagram like the one below in a Word doc:
When I run "pandoc --extract-media=. my.docx -o my.md" to convert to .md, the Markdown document contains nothing related to the Word shapes, so pandoc apparently does not understand them. Is there any way to make pandoc smart enough to understand Word shapes?

No, pandoc cannot handle these. There are two issues about this on the pandoc issue tracker: #4735 and #2792.
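Since pandoc drops these shapes without warning, it can help to check up front whether a .docx contains any. The following is only a rough sketch and not something pandoc provides: the function name count_shapes is made up for illustration, and it assumes the shapes are stored in word/document.xml as DrawingML wps:wsp elements or as legacy VML elements, which is how recent Word versions usually write them.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces Word commonly uses for drawn shapes. Assumption: the shapes
# live in word/document.xml, not in headers, footers or embedded objects.
WPS_NS = "http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
VML_NS = "urn:schemas-microsoft-com:vml"
VML_SHAPES = {"shape", "rect", "roundrect", "oval", "line",
              "polyline", "arc", "curve"}


def count_shapes(docx):
    """Count shape elements in a .docx given as a path or a file object."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    counts = {"drawingml": 0, "vml": 0}
    for element in root.iter():
        if element.tag == "{%s}wsp" % WPS_NS:
            counts["drawingml"] += 1
        elif element.tag.startswith("{%s}" % VML_NS):
            # Strip the namespace and keep only real VML shape elements.
            if element.tag.split("}", 1)[1] in VML_SHAPES:
                counts["vml"] += 1
    return counts
```

Calling count_shapes("my.docx") returns the two counts. Non-zero values suggest the document has drawn content that will be missing from the Markdown output. Word often stores one shape twice (DrawingML plus a VML fallback), and VML is also used for some embedded pictures, so the numbers are only indicative.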

Related

Converting page range from docx to pdf using pandoc

I'm trying to convert certain pages of a docx to pdf using pandoc, but I can't find any sources hinting at where I should start. After looking through the pandoc documentation I still couldn't figure it out, so I assumed that pandoc doesn't support this.
This might just act as confirmation for future readers: does pandoc support converting a page range?
Pandoc has no concept of pages.
Putting text on pages happens during rendering with Word and LaTeX, but pandoc does not render the text before converting. Therefore it cannot know on which page a specific letter will be placed.

Markdown to markdown but undo hard wraps

Can pandoc be used to take a (pandoc) Markdown file that is hard-wrapped, reflow the text so that each paragraph is on one line, and otherwise not change anything? The use case is to transform hard-wrapped text so that an online text box doesn't mess it up when pasting.
It depends a little on your definition of "doesn't change anything else", but --wrap=none is probably the option you are after. Pandoc's Markdown output is opinionated, so it may not do exactly what you want.
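As a rough sketch of how the --wrap=none suggestion could be scripted: the helper names unwrap_command and unwrap and the file names are made up for illustration; only the pandoc options themselves are real.

```python
import shutil
import subprocess


def unwrap_command(src, dst):
    """Build a pandoc call that rewrites Markdown with one line per paragraph."""
    return ["pandoc", "--from=markdown", "--to=markdown", "--wrap=none",
            src, "--output=" + dst]


def unwrap(src, dst):
    """Run the command; raises if pandoc is missing or exits with an error."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(unwrap_command(src, dst), check=True)
```

For example, unwrap("wrapped.md", "unwrapped.md") reads one file and writes the other. Because pandoc re-renders the whole document, details such as list markers or emphasis characters may also be normalized, so it is worth diffing the result.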

Pandoc: Different outputs to docx and LaTeX formats

In LaTeX one could for example have a nice inline equation like $x^2=4$, which in docx format I would be glad to have as italic text.
Is there a way to tell Pandoc to use one of these solutions depending on the output format?
When searching for a possible solution, I realized pandoc has filters and templates, but I don't really understand which direction to follow.
I would really like to arrive at a more general solution that would also work for analogous tasks, for example smaller spaces between a number and its unit: in LaTeX that is a straightforward $\;$, but including this in my Markdown document does not give a satisfactory result in DOCX or ODT output.
This is what I found from the pandoc manual
For docx output, styles will be defined in the output file as inheriting from normal
text, if the styles are not yet in your reference.docx. If they are already defined,
pandoc will not alter the definition.
Please also read the --reference-doc=FILE part of the manual:
--reference-doc=FILE
Use the specified file as a style reference in producing a docx or ODT file.
...
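How do you use the reference doc in pandoc?
Create an empty docx file and give it a name (e.g. refer.docx). Alternatively, pandoc can write out its default reference file with "pandoc -o refer.docx --print-default-data-file reference.docx".
Define the styles you want to use in that file.
Add "--reference-doc=(refer.docx path)" to your pandoc command line.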
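For the format-dependent part of the question (inline math as italic text in docx, but left alone for LaTeX), a filter is one possible direction. The following is only a minimal sketch of a pandoc JSON filter: the function name convert and the script name math_to_emph.py are made up for illustration, and it relies on pandoc passing the target format as the first command-line argument and the document as JSON on standard input.

```python
import json
import sys


def convert(node, fmt):
    """Return a copy of the AST with inline math turned into Emph for docx."""
    if isinstance(node, list):
        return [convert(item, fmt) for item in node]
    if isinstance(node, dict):
        if fmt == "docx" and node.get("t") == "Math":
            math_type, tex = node["c"]
            if math_type["t"] == "InlineMath":
                # The TeX source is kept verbatim and only set in italics.
                return {"t": "Emph", "c": [{"t": "Str", "c": tex}]}
        return {key: convert(value, fmt) for key, value in node.items()}
    return node


# pandoc calls a filter with the target format as its first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    document = json.load(sys.stdin)
    json.dump(convert(document, sys.argv[1]), sys.stdout)
```

Saved as math_to_emph.py, it could be used with "pandoc my.md --filter math_to_emph.py -o my.docx". For any other output format the document passes through unchanged. The same pattern could be extended to other format-specific substitutions, such as the spacing between a number and its unit.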

markdown or markup to powerpoint?

I need to maintain some slides in both LaTeX Beamer and PowerPoint. (This is to make the slides available to instructors elsewhere, too, 90% of whom do not know how to use LaTeX and are unwilling to learn it. I am a LaTeX guy on Linux.)
I have tried the route via LibreOffice (and OpenDocument), but this did not come out well. Right now, the best method I have found is to author the PDF in Beamer, then run it through a Nuance OCR program to get MS Word, without even going all the way to PowerPoint (which is where I really need to be).
If I only had a markup language that produced nice PowerPoint, I could probably code a Perl translator from Markdown to this intermediate markup language. (Going from Markdown to LaTeX Beamer is relatively easy.)
I don't think this exists, but hope springs eternal. After all, it is almost 2014 now. Does anyone know of a solution?
One solution is to use odpdown: it converts Markdown to the OpenDocument Presentation (ODP) format used by OpenOffice Impress, which can be imported into PowerPoint.
It is not yet complete: table support is missing, and it may not run on certain Windows setups, but it could nevertheless be a start. You may have Linux running, where it seems to work.
Steve Rindsberg's answer in the comments works on PP 2007! Let me repeat it here:
I suspect that PowerPoint is the likeliest solution. ;-) But what sort
of slides are you creating? If they're simple heading and bullet point
slides, all you need to produce is a simple text file. Any text that
starts in the left column will be the heading of a new slide. Indent
one tab and it becomes a first-level bullet point under the current
heading; indent two tabs, it becomes a second level bullet point and
so on. Simply use File | Open on the text file to pull it into PPT.
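The outline format described above is simple enough to generate from a script. This is a minimal sketch: the function name outline_text and the input structure (a list of titles, each with bullets given as level and text) are made up for illustration.

```python
def outline_text(slides):
    """Render (title, bullets) pairs as a tab-indented outline.

    Each bullet is a (level, text) pair; level 1 is a first-level bullet.
    """
    lines = []
    for title, bullets in slides:
        lines.append(title)                    # column 0: heading of a new slide
        for level, text in bullets:
            lines.append("\t" * level + text)  # one tab per bullet level
    return "\n".join(lines) + "\n"
```

Writing the result to a .txt file, for example with open("slides.txt", "w").write(outline_text(slides)), gives a file that can then be pulled in with File | Open as described. The behaviour was reported above for PP 2007; other versions may differ.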
Steve: is this all that PP converts? Or is there a reference for other "sneaky" markup that PP knows about?
(Pandoc: unfortunately, the conversion from LibreOffice to PowerPoint was pretty poor when I last tried it. I also tried to save and understand the PowerPoint XML format, but that was REAL bad.)
The easiest way to handle this is to work with:
RStudio (and R if not already installed)
RMarkdown
Pandoc 2.0.5 (minimum)
Install those 3 (or 4) items, then read: https://bookdown.org/yihui/rmarkdown/powerpoint-presentation.html
The installation time is worth the time saved copy-pasting everything from scratch.
I am also a Linux guy, and I also use LaTeX engines to create nice documents. Based on my experience, here's what you should do:
Stop writing directly in LaTeX and start using org-mode to write documents instead (I spent years writing in LaTeX and now that's over, except when I use the moderncv package)
Org supports LaTeX math formulas, and .org files are easily exported to .tex files
Org can also be easily exported to Markdown
Once you have your Markdown, there are several tools that will let you create a PowerPoint. Two of them are pandoc and md2pptx
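With pandoc, the same Markdown source can feed both targets from the original question. The sketch below only builds the two command lines: the function name slide_commands and the file names are made up for illustration, the Beamer route needs a LaTeX installation, and pptx output needs a pandoc version that supports it.

```python
def slide_commands(src, stem):
    """Return pandoc commands for a Beamer PDF and a PowerPoint file."""
    return {
        # --to=beamer with a .pdf output goes through LaTeX Beamer.
        "beamer": ["pandoc", src, "--to=beamer", "--output=" + stem + ".pdf"],
        # The pptx writer is selected from the output file extension.
        "pptx": ["pandoc", src, "--output=" + stem + ".pptx"],
    }
```

Each list can be passed to subprocess.run(command, check=True). For instance, slide_commands("slides.md", "slides")["pptx"] corresponds to running "pandoc slides.md --output=slides.pptx" in a shell.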

How to use Pandoc image alignment to align two images in the same row?

From the pandoc documentation I know how to insert an image
http://johnmacfarlane.net/pandoc/README.html#images
Now I want to align two images in the same row, how can I do that?
My desired output format is pdf.
You can try something like this:
![](tests/lalune.jpg)\ ![](tests/lalune.jpg)
This will give you two side-by-side images. You won't have a caption, though; pandoc only treats an image as a captioned figure if it is by itself in a paragraph.
Expanding on from John's answer, if you do want a single caption under two side by side figures, a 'hack' would be to do this:
![](https://upload.wikimedia.org/wikipedia/commons/7/70/Example.png){width=60%}
![](https://upload.wikimedia.org/wikipedia/commons/7/70/Example.png){width=40%}
\begin{figure}[!h]
\caption{A single caption for the two subfigures}
\end{figure}
This results in one caption for two images placed side by side. You might need to tweak each individual image's width setting, or the !h placement specifier of the figure, to get the layout you want.
I found this helpful because you don't have to download the picture off the internet as in a pure LaTeX \subfigure solution. I.e. just use pandoc markdown to get the image, and LaTeX to generate the caption.
If you want to go crazy, you can actually use the same idea above to make subfigure captions like so:
![](https://upload.wikimedia.org/wikipedia/commons/7/70/Example.png){width=60%}
![](https://upload.wikimedia.org/wikipedia/commons/7/70/Example.png){width=40%}
\begin{figure}[!h]
\begin{subfigure}[t]{0.6\textwidth}
\caption{Caption for the left subfigure}
\end{subfigure}
\hfill
\begin{subfigure}[t]{0.4\textwidth}
\caption{Caption for the right subfigure}
\end{subfigure}
\caption{A single caption for the two subfigures}
\end{figure}
Edit 20180910:
You'll need to include the following packages in the pandoc YAML frontmatter/header:
header-includes: |
  \usepackage{caption}
  \usepackage{subcaption}
One simple way is to first convert your file to a .tex file, where you can adjust the image alignment with the LaTeX minipage environment or similar. Then you can obtain your .pdf file by running latex or pandoc on the command line. See the Pandoc Demos for examples.
You could use a preprocessor like gpp to add options such as image alignment. Or you could do it the way John suggested:
![](tests/lalune.jpg)\ ![](tests/lalune.jpg)
