Convert html to docx using pandoc - pandoc

I am trying to convert this HTML
<p><font color = \"#808080\">SHILPI</p>
to docx using pandoc with this command
pandoc -s -o "test.docx" -t html5 -t docx html_file
But it's losing the colors; I get only the text. As far as I know the HTML is correct, because when I use wkhtmltopdf to convert the same HTML to PDF it gives the proper color. What can be the issue? Thanks in advance.

That's not how Pandoc works. It doesn't understand CSS, only HTML/Markdown, so it preserves the content but not the layout or styling; see semantic HTML.
You can, however, use templates to style your output consistently. With Word it's a bit more complicated but you can use the --reference-doc option for mostly the same effect.

Try (on pandoc before 2.0; the -S option was removed in 2.0):
pandoc -s -S test.htm -o test.docx
Reference: http://pandoc.org/demos.html

This inspection highlights deprecated HTML tags and provides the ability to replace them with CSS or, for some of them, with other tags.
It may be better to use this:
<p style="color: #808080">SHILPI</p>
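If there are many such tags, the rewrite can be scripted. Here is a minimal Ruby sketch, assuming simple, non-nested <font color="..."> elements that are properly closed. The helper name font_to_span is made up for illustration, and a real HTML parser would be more robust than a regex. Note that this only modernizes the markup; per the answer above, pandoc may still drop the color when writing docx.

```ruby
# Hypothetical helper: rewrites <font color="...">text</font> as
# <span style="color: ...">text</span>.
# Regex-based, so it only handles simple, non-nested, closed <font> tags.
def font_to_span(html)
  html.gsub(%r{<font\s+color\s*=\s*"([^"]+)"\s*>(.*?)</font>}mi) do
    %(<span style="color: #{$1}">#{$2}</span>)
  end
end

puts font_to_span('<p><font color="#808080">SHILPI</font></p>')
```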

Related

Pandoc tex to html: how to handle custom environments?

I am trying to convert a large body of tex code into html using Pandoc. I have multiple custom-defined environments and commands in the LaTeX source that I would like to tag with classes in the resulting HTML.
How can I make sure that the following LaTeX code:
\begin{myspecialenvironment}
hello
\end{myspecialenvironment}
is converted to the following HTML
<div class="myspecialenvironment">
hello
</div>
and the following LaTeX
\myspecialcommand{hola}
converted to HTML as follows?
<span class="myspecialcommand">hola</span>
I had the same question and looked in vain for relevant documentation, but then I noticed that Pandoc is already doing what you propose, at least for block environments. :-) I don't know about inline commands.

Highlighting specific lines of code in Pandoc Revealjs

I'm using Pandoc to generate a Reveal.js presentation. It includes code in fenced code blocks, like this:
```java
// Some Java code
```
Reveal.js supports a way to add a highlight to a specific line or range of lines, with the data-line-numbers="1" attribute that should be added to the <code> tag.
I've tried to add this attribute to the fenced code block in various ways, such as this
``` { .java data-line-numbers="1" }
// Some Java code
```
But I can't get it to work. Is there a way to use Reveal.js's data-line-numbers in Pandoc? Or perhaps Pandoc has a way to achieve something similar? Or do I need to give up and just use those messy <pre><code> HTML tags in my Markdown?
The correct syntax should be:
``` {.java .number-lines}
// Some Java code
```
Pandoc does the syntax-highlighting itself, and is sensitive to the number-lines class.
Pandoc's HTML output for code blocks does not follow the structure that reveal.js expects. E.g., the default pandoc way of indicating that lines are to be numbered is to mark the block with the number-lines class, while reveal.js expects a data-line-numbers attribute. Even adding the data-line-numbers attribute manually won't work: pandoc wraps the code in <pre> and <code> elements and adds all code block attributes to the <pre> element, while reveal.js looks for them in the <code> element.
I struggled with pandoc's way of handling code blocks for reveal.js output myself, so I wrote this lua-filter: revealjs-codeblock. This filter adjusts pandoc's HTML output such that it follows the reveal.js specs.
It supports the .number-lines and .numberLines classes as well as the data-line-numbers attribute.

Ruby: how to generate HTML from Markdown like GitHub's or BitBucket's?

On the main page of every repository in GitHub or BitBucket it shows the Readme.md in a very pretty format.
Is there a way to do the same thing with Ruby? I have already found some gems like Redcarpet, but the output never looks pretty. I've followed these instructions for Redcarpet.
Edit:
After I tried Github's markup ruby gem, the same thing is happening.
What is shown is this:
And what I want is this:
And I'm sure it's not only the CSS that is missing, because after the 3 backquotes (```) I write the language name, like json or bash, and in the first image it shows up as literal text.
Edit2:
This code here:
renderer = Redcarpet::Render::HTML.new(prettify: true)
markdown = Redcarpet::Markdown.new(renderer, fenced_code_blocks: true)
html = markdown.render(source_text)
'<script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js"></script>'+html
Generated this:
GitHub provides its own Ruby gem to do so: https://github.com/github/markup.
You just need to install the right dependencies and you're good to go.
You need to enable a few nonstandard features.
Fenced code blocks
Fenced code blocks are nonstandard and are not enabled by default on most Markdown parsers (some older ones don't support them at all). According to Redcarpet's docs, you want to enable the fenced_code_blocks extension:
:fenced_code_blocks: parse fenced code blocks, PHP-Markdown style. Blocks delimited with 3 or more ~ or backticks will be considered as code, without the need to be indented. An optional language name may be added at the end of the opening fence for the code block.
Syntax Highlighting
Most Markdown parsers do not do syntax highlighting of code blocks, and those that do always offer it as an option. Even then, you will still need to provide your own CSS styles to have the code blocks styled properly. As it turns out, Redcarpet does include support for a prettify option to the HTML renderer:
:prettify: add prettyprint classes to <code> tags for google-code-prettify.
You will need to get the Javascript and CSS from the google-code-prettify project to include in your pages.
Solution
In the end you'll need something like this:
renderer = Redcarpet::Render::HTML.new(prettify: true)
markdown = Redcarpet::Markdown.new(renderer, fenced_code_blocks: true)
html = markdown.render(source_text)
As @yoones said, GitHub shares its way of doing this, but to be more precise, it uses the "commonmarker" gem for Markdown. As far as I can tell, though, it does not produce a complete HTML file, only a fragment that you insert into <body>. So you can do it like I did:
require "commonmarker"
puts <<~HEREDOC
<!DOCTYPE html>
<html>
<head>
<style>#{File.read "markdown.css"}</style>
</head>
<body class="markdown-body Box-body">
#{CommonMarker.render_html ARGF.read, %i{ DEFAULT UNSAFE }, %i{ table }}
</body>
</html>
HEREDOC
Where did I get markdown.css? I just took the CSS files from an arbitrary GitHub page with a rendered README and applied UnCSS to them, which resulted in a 26 KB file; you can find it in the same repo I just linked.
Why table and UNSAFE? I need this to render an index.html for GitHub Pages, because their Markdown renderer can't handle newlines within table cells, etc., so instead of asking it to render my README.md I build the index.html myself.

Use amp-img tag in place of img tag for images - Sphinx

I am working on creating a Sphinx theme based on Accelerated Mobile Pages (AMP). While creating it, I realized that AMP uses the amp-img tag in place of the img tag. Is there a way to convert all the img tags in the Sphinx-generated docs to amp-img?
There is nothing out of the box. The easiest option is to post-process your HTML output from Sphinx.
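As a rough illustration of such a post-processing step, here is a minimal Ruby sketch. The helper name img_to_amp_img is hypothetical, and the regex assumes no ">" inside attribute values; an HTML parser would be safer for real builds. As far as I know, AMP also generally expects explicit width and height on amp-img, which this sketch does not add.

```ruby
# Hypothetical helper: rewrites every <img ...> tag as <amp-img ...></amp-img>.
# Regex-based, so it assumes no ">" appears inside attribute values.
def img_to_amp_img(html)
  html.gsub(%r{<img\b([^>]*?)\s*/?>}i) { "<amp-img#{$1}></amp-img>" }
end

puts img_to_amp_img('<p><img src="logo.png" alt="Logo" width="64" height="64" /></p>')
```

You would run this over each generated .html file in the build output directory (wherever your Sphinx build writes it) and write the result back.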

How do I do strikethrough (line-through) in asciidoc?

How do I render a strikethrough (or line-through) in an adoc file?
Let's presume I want to write "That technology is -c-r-a-p- not perfect."
That technology is [line-through]#crap# not perfect.
As per the AsciiDoc manual, [line-through] is deprecated. You can test here.
Comment from Dan Allen
It's important to understand that line-through is just a CSS role. Therefore, it needs support from the stylesheet in order to appear as though it is working.
If I run the following through Asciidoctor (or Asciidoctor.js):
[.line-through]#strike#
I get:
<span class="line-through">strike</span>
The default stylesheet has a rule for this:
.line-through{text-decoration:line-through}
You would need to do the same.
It is possible to customize the HTML that is generated using custom templates (Asciidoctor.js supports Jade templates). In that case, you'd override the template for inline_quoted, check for the line-through role and produce either an <s> or, preferably, a <del> instead of the span.
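If setting up custom templates is more than you need, a cruder alternative (not the template approach described above, just post-processing the generated HTML) can be sketched in Ruby. The helper name line_through_to_del is made up, and the regex assumes the span carries only the line-through role and contains no nested <span> elements.

```ruby
# Hypothetical helper: turns <span class="line-through">...</span> into <del>...</del>.
# Assumes the span has exactly this class attribute and no nested <span> inside.
def line_through_to_del(html)
  html.gsub(%r{<span class="line-through">(.*?)</span>}m) { "<del>#{$1}</del>" }
end

puts line_through_to_del('That technology is <span class="line-through">crap</span> not perfect.')
```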
If you're only targeting the HTML backend, you can insert HTML code verbatim via a passthrough context. This can be done inline by wrapping the parts in +++:
That technology is +++<del>+++crap+++</del>+++ not perfect.
This won't help you for PDF, DocBook XML, or other output formats, though.
If the output is intended for HTML you can pass HTML.
The <s> HTML element renders text with a strikethrough, or a line
through it. Use the element to represent things that are no longer
relevant or no longer accurate.
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/s
To render as:
Example text.
use:
1. Pass inline:
Example +++<s>text</s>+++.
2. Pass-through macro:
Example pass:[<s>text</s>].
3. Pass block:
++++
Example <s>text</s>.
++++
