Convert HTML and inline MathJax math to LaTeX with pandoc-ruby

I'm building a Rails app and I'm looking for a way to convert database entries containing HTML and inline MathJax math (TeX) to LaTeX for PDF creation.
I found similar questions like mine:
Convert html mathjax to markdown with pandoc
How to convert HTML with mathjax into latex using pandoc?
and I see two options here:
1. Create a Haskell executable which leaves stuff like \(y=f(x)\) alone when converting HTML to LaTeX.
2. Write a Ruby method which does the following things:
   - Take the string and split it into an array with a regex (string.split(regex)).
   - Loop through the resulting array and convert the parts that do not contain inline math to LaTeX with PandocRuby.html(string).to_latex.
   - Concatenate everything back together (array.join).
I would prefer the Ruby method because I'm hosting my application on Heroku and I don't like to check binaries into git.
(Note: the pandoc binary is set up this way: http://www.petekeen.net/introduction-to-heroku-buildpacks)
So my question is: what should the regex look like to split the string on \(math\)?
E.g. the string can look like this: text \(y=f(x) \iff \log_{10}(b)\) and \(a+b=c\) text
And for the sake of completeness: how should the Haskell script be written to leave \(math\) alone when converting to LaTeX, in case the Ruby method is not a viable solution?

Get the very latest version of pandoc (1.12.2). Then you can run:
pandoc -f html+tex_math_dollars+tex_math_single_backslash -t latex
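If upgrading pandoc is not an option, the split/convert/join approach from the question could be sketched roughly like this. It is only a sketch: INLINE_MATH and convert_preserving_math are made-up names, the regex assumes math is always delimited by \( and \) and never nested, and the conversion of the non-math parts is passed in as a block.

```ruby
# Matches inline TeX math delimited by \( ... \) (non-greedy, may span lines).
# The capture group makes String#split keep the math parts in the result.
INLINE_MATH = /(\\\(.*?\\\))/m

# Hypothetical helper: splits +html+ into math and non-math parts, yields
# each non-math part to the block for conversion, leaves the math parts
# untouched, and joins everything back together.
def convert_preserving_math(html)
  html.split(INLINE_MATH).map { |part|
    # A part that is entirely one \( ... \) span is math; keep it as is.
    part.match?(/\A\\\(.*\\\)\z/m) ? part : yield(part)
  }.join
end
```

With pandoc-ruby the call would look like convert_preserving_math(html) { |part| PandocRuby.html(part).to_latex }. Keep in mind that pandoc converts each fragment on its own, so whitespace and trailing newlines around the math may need extra care.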

Related

Making Sphinx format the Markdown code examples in Python docstrings

I'm trying to use Sphinx to auto-generate API documentation for a Python library, and I can't make it properly format the example code snippets in the docstrings - they do get indented but lines of the same indentation get concatenated (https://weka-io.github.io/easypy)
I understand that the problem is that the format I'm using to mark the code blocks is Markdown (indent them by 4 spaces) but Sphinx is expecting reStructuredText (code-block::)
I've tried googling for a solution and the results recommended recommonmark, but that seems to be for using .md files as the source. I'm using sphinx-apidoc to generate the "source" .rst files from the Python code, so it's not going to work (unless there is a way to make sphinx-apidoc generate .md files instead).
So - how do I make Sphinx treat just the Python docstrings as Markdown, leaving the elaborate reStructuredText framework as is for everything else?

With Pandoc, how to convert between different formats with additional rules?

I have some existing MediaWiki-format texts that contain category tokens like
[[Category:XXX]]
[[Category:YYY]]
I'd like to convert them to Markdown texts. The basic command for doing that with Pandoc is
pandoc -f mediawiki -t markdown -s mytext.mediawiki -o mytext.md
The resultant Markdown text is mostly usable except that it converts the category tokens to
<Category:XXX> <Category:YYY>
which isn't really what I need. Instead, I need
[[!tag XXX YYY]]
because I'm using the resultant Markdown files as source files in a special content management system called Ikiwiki, which has its own idiosyncratic format for tags. How can I do that with Pandoc?
It's probably easiest to do this as a second step with a search and replace on <Category:XXX>. Note that pandoc without the -o option writes to standard output, so you can pipe it directly to some custom post-processing script.
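Such a post-processing step could look roughly like the following Ruby sketch. The names CATEGORY_LINK and categories_to_ikiwiki_tags are made up for illustration, and it assumes that category names contain no ">" and that all tags should be collected into a single [[!tag ...]] directive at the end of the text.

```ruby
# Matches the <Category:XXX> autolinks that pandoc emits for [[Category:XXX]].
CATEGORY_LINK = /<Category:([^>]+)>/

# Hypothetical helper: removes every <Category:...> link from the Markdown
# and appends one Ikiwiki tag directive listing all the category names.
def categories_to_ikiwiki_tags(markdown)
  tags = markdown.scan(CATEGORY_LINK).flatten
  return markdown if tags.empty?

  body = markdown.gsub(CATEGORY_LINK, "") # drop the links themselves
                 .gsub(/[ \t]+$/, "")     # drop whitespace left at line ends
                 .rstrip                  # drop blank lines left at the end
  "#{body}\n\n[[!tag #{tags.join(' ')}]]\n"
end
```

In a script that reads standard input, puts categories_to_ikiwiki_tags($stdin.read) would let you pipe the pandoc output straight through it.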
[[Category:XXX]] is converted by pandoc internally to a link along the lines of Category:XXX (try pandoc -f mediawiki -t native).
So generally, additional rules for elements are implemented through custom scripts that match on Pandoc's internal data types; see Pandoc scripting. You could match on those kinds of links. It's more work (the first time), but it makes sure you don't replace false positives.

Reformat Markdown files to a specific code style

I'm working on a book which had a couple of people writing and editing the text. Everything is Markdown. Unfortunately, there is a mix of different styles and line widths. Technically this isn't a problem, but it's not nice in terms of aesthetics.
What is the best way to reformat those files in e.g. GitHub markdown style? Is there a shell script for this job?
You might want to look at Pandoc; it understands several flavors of Markdown.
pandoc -f markdown -t gfm foobar.md
Having written a markup converter years ago in Perl, I would not want to approach such a task without a decent lexical analyzer, which is a bit beyond shell scripting.
I wrote a tool called tidy-markdown that will reformat any Markdown (including GFM) according to this styleguide.
$ tidy-markdown < ./ugly-markdown.md > ./clean-markdown.md
It handles conversion of inline HTML to Markdown, normalization of syntactic elements like code blocks (converting them to fenced), lists, block-quotes, front-matter, headers, and will even attempt to standardize code-block language identifiers.

Where is a list of Markdown tags supported by the redcarpet gem?

Is there a list of the Markdown tags supported by the redcarpet gem?
For example, some markdown implementations support centering text, some don't. Rather than trial and error experimentation, it seems like such a popular gem would be documented somewhere?
I don't think redcarpet is responsible for the markdown - it's simply a renderer; it uses some libraries to interpret the required code
After some research, it seems all of the markdown interpreters are originally based on the UpSkirt library, which was derived from this Daring Fireball project:
Markdown is a text-to-HTML conversion tool for web writers. Markdown
allows you to write using an easy-to-read, easy-to-write plain text
format, then convert it to structurally valid XHTML (or HTML).
Thus, “Markdown” is two things: (1) a plain text formatting syntax;
and (2) a software tool, written in Perl, that converts the plain text
formatting to HTML. See the Syntax page for details pertaining to
Markdown’s formatting syntax. You can try it out, right now, using the
online Dingus.
You can find the syntax here

Markdown to plain text in Ruby?

I'm currently using BlueCloth to process Markdown in Ruby and show it as HTML, but in one location I need it as plain text (without some of the Markdown). Is there a way to achieve that?
Is there a markdown-to-plain-text method? Is there an html-to-plain-text method that I could feed the result of BlueCloth to?
The Redcarpet gem has a Redcarpet::Render::StripDown renderer which "turns Markdown into plaintext".
Copy and modify it to suit your needs.
Or use it like this (after require 'redcarpet' and require 'redcarpet/render_strip'):
Redcarpet::Markdown.new(Redcarpet::Render::StripDown).render(markdown)
Converting HTML to plain text with Ruby is not a problem, but of course you'll lose all markup. If you only want to get rid of some of the Markdown syntax, it probably won't yield the result you're looking for.
The bottom line is that unrendered Markdown is intended to be used as plain text, therefore converting it to plain text doesn't really make sense. All Ruby implementations that I have seen follow the same interface, which does not offer a way to strip syntax (only including to_html, and text, which returns the original Markdown text).
It's not Ruby, but one of the formats Pandoc now writes is 'plain'. Here's some arbitrary Markdown:
# My Great Work
## First Section
Here we discuss my difficulties with [Markdown](http://wikipedia.org/Markdown)
## Second Section
We begin with a quote:
> We hold these truths to be self-evident ...
then some code:
#! /usr/bin/bash
That's *all*.
(Not sure how to turn off the syntax highlighting!) Here's the associated 'plain':
My Great Work
=============
First Section
-------------
Here we discuss my difficulties with Markdown
Second Section
--------------
We begin with a quote:
We hold these truths to be self-evident ...
then some code:
#! /usr/bin/bash
That's all.
You can get an idea of what it does with the different elements it parses out of documents from the definition of plainify in pandoc/blob/master/src/Text/Pandoc/Writers/Markdown.hs in the GitHub repository; there is also a tutorial that shows how easy it is to modify the behavior.
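To use the 'plain' writer from Ruby anyway, one option is to shell out to pandoc with Open3 from the standard library. This is only a sketch under assumptions: a pandoc binary has to be on the PATH, and pandoc_command and markdown_to_plain are hypothetical helper names, not part of any gem.

```ruby
require "open3"

# Hypothetical helper: builds the pandoc argument list for a conversion.
# Kept separate so the command can be inspected without running pandoc.
def pandoc_command(from:, to:)
  ["pandoc", "-f", from, "-t", to]
end

# Hypothetical helper: pipes +markdown+ through `pandoc -f markdown -t plain`
# and returns pandoc's standard output. Raises if pandoc exits with an error.
def markdown_to_plain(markdown)
  command = pandoc_command(from: "markdown", to: "plain")
  output, status = Open3.capture2(*command, stdin_data: markdown)
  raise "pandoc failed with status #{status.exitstatus}" unless status.success?

  output
end
```

Calling markdown_to_plain(text) then returns the same kind of output as shown above, provided pandoc is installed where the app runs.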
