I tried to convert HTML to docx using Pandoc.
Here is my HTML code:
<p> Example: ${v_1} = {\rm{ }}{v_2}$</p>
with MathJax config in head:
MathJax.Hub.Config({
extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [['$', '$'], ["\(", "\)"]],
displayMath: [['$$', '$$'], ["\[", "\]"]],
},
"HTML-CSS": {availableFonts: ["TeX"]}
});
The Pandoc command that I used (Pandoc version 2.2.3.2):
pandoc -s test.html --mathjax -f html+tex_math_dollars --pdf-engine=xelatex -o xxx.docx
Then I got a warning:
[WARNING] Could not convert TeX math '{v_1} = {\rm{ }}{v_2}', rendering as TeX:
{v_1} = {\rm{ }}{v_2}
^
unexpected "{"
expecting "%", "\\label", "\\nonumber" or whitespace
Someone please tell me how to fix this. Thanks!
Use the LaTeX command \textrm instead of the plain TeX \rm, and pandoc will be able to handle it.
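If the TeX comes from a tool you cannot reconfigure, one option is to rewrite the markup before pandoc sees it. Below is a minimal sketch in Python, assuming the only offending pattern is \rm followed directly by a braced group, as in the warning above. The function name `rm_to_textrm` is made up for illustration, and the switch form `{\rm text}` is not handled.

```python
import re

# "\rm" followed (after optional whitespace) by an opening brace, e.g. "\rm{ }".
# The switch form "{\rm text}" is deliberately NOT matched by this sketch.
_RM_BRACED = re.compile(r"\\rm\s*\{")


def rm_to_textrm(tex: str) -> str:
    """Replace the braced plain TeX rm form with the LaTeX textrm command."""
    # A function is used as the replacement so the backslash is taken literally.
    return _RM_BRACED.sub(lambda _match: r"\textrm{", tex)


print(rm_to_textrm(r"{v_1} = {\rm{ }}{v_2}"))  # prints {v_1} = {\textrm{ }}{v_2}
```

Commands such as \mathrm are left alone, because the pattern requires the backslash to sit directly before "rm".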
Since 7k users have viewed this question since it was asked... perhaps others have made the same mistake I made as a novice RStudio user.
The first comment in both the README.md and the README.Rmd file is
<!-- README.md is generated from README.Rmd. Please edit that file -->
The intended meaning is (at least arguably) apparent if you pay sufficient attention to the this/that relative pronouns!
<!-- You should edit the README.Rmd file, not the README.md file -->
To repair the damage, I'm currently trying the suggestion to run devtools::build_readme() explicitly, which I found in the question "RStudio README.Rmd and README.md should be both staged use 'git commit --no-verify' to override this check".
No luck yet, but I feel like I'm (finally!) making forward progress on getting $\sqrt{x}$ to display properly in my GitHub README!
I assume that inserting a reference to a BibTeX bibliography in the YAML metadata is sufficient for the references to be produced. This is similar to the question "pandoc does not print references when .bib file is in YAML", which was perhaps misunderstood and which has no accepted answer yet.
I have the example input file:
---
title: Ontologies of what?
author: auf
date: 2010-07-29
keywords: homepage
abstract: |
  What are the objects ontologists talk about.
  Inconsistencies become visible if one models real objects (cats) and children playthings.
bibliography: "BibTexExample.bib"
---
An example post. With a reference to [#Frank2010a] and more.
## References
I invoke the conversion to LaTeX with:
pandoc -f markdown -t pdf postWithReference.markdown -s --verbose -o postWR.pdf -w latex
The PDF is produced, but it contains no references, and the text is rendered as With a reference to [#Frank2010a] and more. This demonstrates that the reference file was not used. The title and author are inserted in the PDF, so the YAML metadata is read. If I add the reference file on the command line, the output is correctly produced with the reference list.
What am I doing wrong? I want to avoid specifying the bibliography file on the command line (that would be duplication; DRY). Is there a general switch that requests bibliography processing and leaves the selection of the bibliography file to the document's YAML metadata?
More recent versions of pandoc require --citeproc instead of --filter=pandoc-citeproc.
The bibliography is inserted by the pandoc-citeproc filter. It is run automatically when the bibliography is set via the command line, but has to be run manually in cases such as yours. Adding --filter=pandoc-citeproc will make it work as expected.
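The two spellings of the citation step can be kept side by side in a small wrapper. The sketch below is only an illustration in Python: the helper name `citeproc_command` and its `legacy` switch are hypothetical, and only flags already mentioned in this thread are used. It builds the argument list without running anything; pass the result to `subprocess.run` to actually invoke pandoc.

```python
from typing import List


def citeproc_command(source: str, output: str, legacy: bool = False) -> List[str]:
    """Build a pandoc call that processes citations from the YAML bibliography.

    legacy=True uses the older external filter; otherwise the newer
    --citeproc option is used. (Hypothetical helper, not a pandoc API.)
    """
    citation_flag = "--filter=pandoc-citeproc" if legacy else "--citeproc"
    return ["pandoc", "-s", source, citation_flag, "-o", output]


print(" ".join(citeproc_command("postWithReference.markdown", "postWR.pdf")))
# prints: pandoc -s postWithReference.markdown --citeproc -o postWR.pdf
```

Either way, the bibliography file itself stays in the document's YAML block, so nothing is duplicated on the command line.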
Background
Pandoc's markdown lets you specify extensions for how you would like your markdown to be handled:
Markdown syntax extensions can be individually enabled or disabled by appending +EXTENSION or -EXTENSION to the format name. So, for example, markdown_strict+footnotes+definition_lists is strict markdown with footnotes and definition lists enabled, and markdown-pipe_tables+hard_line_breaks is pandoc’s markdown without pipe tables and with hard line breaks.
My specific question
For a given pandoc conversion where, say, I use grid tables in my source:
pandoc myReport.md --from markdown+pipe_tables --to latex -o myReport.pdf
How can I write a pandoc YAML block to accomplish the same thing (specifying that my source contains grid tables?)
A generalized form of my question
How can I turn extensions on and off using pandoc YAML?
Stack Overflow Questions that I don't think completely answer my question
Can I set command line arguments using the YAML metadata - This one deals with how to specify output options, but I'm trying to tell pandoc about the structure of my input
What can I control with YAML header options in pandoc? - Answerers mention pandoc's templates, but neither the latex output template nor the markdown template indicate any sort of option for grid_tables. So, it's not clear to me from these answers how knowing about the templates will help me figure out how to structure my YAML.
There may also not be a way to do this
It's always possible that pandoc isn't designed to let you specify those extensions in the YAML. Although, I'm hoping it is.
You can use Markdown variants to do this in an R Markdown document. Essentially, you enter your extensions into a variant option in the YAML header block at the start of your .Rmd file.
For example, to use grid tables, you have something like this in your YAML header block:
---
title: "Habits"
author: John Doe
date: March 22, 2005
output:
  md_document:
    variant: markdown+grid_tables
---
Then you can compile to a PDF directly in pandoc by typing in your command line something like:
pandoc yourfile.md -o yourfile.pdf
For more information on markdown variants in RStudio: http://rmarkdown.rstudio.com/markdown_document_format.html#markdown_variants
For more information on Pandoc extensions in markdown/Rmarkdown in RStudio:
http://rmarkdown.rstudio.com/authoring_pandoc_markdown.html#pandoc_markdown
You can specify pandoc markdown extensions in the YAML header using the md_extensions argument, which is included in each output format.
---
title: "Your title"
output:
  pdf_document:
    md_extensions: +grid_tables
---
This will activate the extension. See the R Markdown Definitive Guide for details.
Outside the scope of R Markdown, you can use Pandocomatic to do it, or Paru for Ruby.
---
title: My first pandocomatic-converted document
pandocomatic_:
  pandoc:
    from: markdown+footnotes
    to: html
...
As Merchako noted, the accepted answer is specific to rmarkdown. In Atom, for instance, md_extensions: does not work.
A more general approach would be to put the extensions in the command line options. This example works fine:
---
title: "Word document with emojis"
author: me
date: June 9, 2021
output:
  word_document:
    pandoc_args: ["--standalone", "--from=markdown+emoji"]
---
For people stumbling across this in or after 2021, this can be done without Rmarkdown. You can specify a YAML "defaults" file, which basically includes anything you could want to configure.
In order to do what OP wanted, all you'd need to do is
from: markdown+pipe_tables
in the defaults file, then pass it when you compile.
You can also specify the input and output files, so you can end up with the very minimal command
pandoc --defaults=defaults.yaml
and have it handle the rest for you. See https://pandoc.org/MANUAL.html#extensions for more.
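Here is a sketch of what such a defaults file could contain, written out from Python so the pieces are explicit. The file names are placeholders taken from the question's command line, and `write_defaults` is a hypothetical helper. The keys `from`, `input-files` and `output-file` are the defaults-file keys as I understand them from the pandoc manual, so check them against your pandoc version.

```python
from pathlib import Path

# Placeholder file names, taken from the question's command line.
DEFAULTS_TEXT = """\
from: markdown+pipe_tables
input-files:
  - myReport.md
output-file: myReport.pdf
"""


def write_defaults(path: str = "defaults.yaml") -> Path:
    """Write the defaults file and return its path."""
    target = Path(path)
    target.write_text(DEFAULTS_TEXT, encoding="utf-8")
    return target


if __name__ == "__main__":
    written = write_defaults()
    # The conversion itself is then just this one command:
    print(f"pandoc --defaults={written}")
```

Keeping the extension list in the defaults file means the source document and the command line both stay free of format details.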
Sorry for my English in this post (it is my first on this forum, and my question is perhaps stupid).
I have a problem converting an HTML file to a PDF file with pandoc.
Here is my code in the console
set Path=%Path%;C:\Users\nicolas\AppData\Local\Pandoc
(redirecting to Pandoc directory)
followed by
pandoc --data-dir=C:\Users\nicolas\Desktop essai.html -o essai.pdf
As indicated, my file is in the Desktop, but I got the following error:
pandoc: essai.html: openFile: does not exist (No such file or directory)
I get the same error if I run the following (with the file essai.html in the same folder as pandoc.exe):
pandoc essai.html -o essai.pdf
Do you have any idea what causes my problem? (To be precise, the name of the file I want to convert is correct.)
Remark: My original problem was to create, via pandoc, a PDF faithful to the beautiful HTML file generated by IPython Notebook, but I run into the same kind of problem when I want to convert a .ipynb file to PDF with nbconvert.
I finally solved my problem by using the full paths to my files. (In the end, though, I used wkhtmltopdf, which is simpler to use and gives a good result.)
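Note that --data-dir tells pandoc where to look for its own data files, such as templates. It does not change where input files are looked up, so a relative file name is still resolved against the console's current directory. Below is a minimal sketch of the full-path approach in Python. `absolute_pandoc_command` is a made-up helper, and the commented usage reuses the Desktop path from the question.

```python
from pathlib import Path
from typing import List


def absolute_pandoc_command(folder: str, source: str, output: str) -> List[str]:
    """Build a pandoc call that uses full paths, failing early if the input is missing."""
    src = Path(folder, source).resolve()
    if not src.is_file():
        raise FileNotFoundError(f"input file not found: {src}")
    # The output file is placed next to the input file.
    return ["pandoc", str(src), "-o", str(src.with_name(output))]


# Example (path from the question; needs pandoc on PATH):
#   import subprocess
#   cmd = absolute_pandoc_command(r"C:\Users\nicolas\Desktop", "essai.html", "essai.pdf")
#   subprocess.run(cmd, check=True)
```

Checking for the file first turns pandoc's "openFile: does not exist" into an error that shows the exact path that was tried.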
Is it possible to embed images into a docx file when they are embedded in an HTML file?
I am trying and it's not working for me; perhaps I am missing some extra parameter when I run pandoc.
pandoc -f html -t docx -o testdoc.docx image.html
Thank you very much!
I managed to solve this by executing the following command:
pandoc -s file_name.html -o file_name.docx;
There are actually two important points to consider:
The quality of the output file depends on how pandoc interprets your HTML file, so if the source is complex you should not expect high-quality output. For instance, the <hr/> tag is not recognized by pandoc, while the <p> tag is.
The path of the image must not be an HTTP path; it has to be a full disk path, meaning:
This is no good:
<img src="http://www.example.com/images/img.jpg" />
And this is what pandoc can actually read:
<img src="/var/www/example.com/images/img.jpg" />
HTH
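To check a page against the second point, the image sources can be listed before running pandoc. Here is a minimal sketch using only the Python standard library. The class and function names are invented for illustration, and it only reports remote sources rather than downloading or rewriting them.

```python
from html.parser import HTMLParser
from typing import List


class _ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also end up here.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def remote_image_sources(html: str) -> List[str]:
    """Return the <img> sources that are HTTP(S) URLs rather than local paths."""
    collector = _ImgSrcCollector()
    collector.feed(html)
    return [s for s in collector.sources if s.lower().startswith(("http://", "https://"))]
```

Any source this returns is a candidate for being saved locally and referenced by its disk path instead.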
I've got two questions concerning the LaTeX output of doxygen:
How can one organize the related pages (those created by \page)? (They seem to be organized according to the title of the page.)
How can I specify which LaTeX style sheet to use? (I've found nothing in the Doxyfile.)
I would like to get rid of the paragraph numbers for the class members.
Thanks
For LaTeX output, you can generate the first part of refman.tex (see LATEX_HEADER) and the style sheet included by that header (normally doxygen.sty), using:
doxygen -w latex header.tex doxygen.sty
In current doxygen (I use 1.8.11), you can modify the footer as well, so the command given by @synthesizerpatel won't work anymore. Now you have to run
doxygen -w latex header.tex footer.tex doxygen.sty
You can use the modified files by setting these variables in your Doxyfile
LATEX_HEADER = header.tex
LATEX_FOOTER = footer.tex