Keep custom code block attributes in pandoc when converting to Markdown - pandoc

I am converting an org file to Markdown (specifically commonmark). I am adding a custom attribute to my code blocks, which the commonmark writer does not support, and strips them from the code block during conversion. I am trying to find a way to keep my custom attributes.
This is what I have:
#+begin_src python :hl_lines "2"
def some_function():
print("foo bar")
return
#+end_src
This is what I want in my .md file:
``` python hl_lines="2"
def some_function():
print("foo bar")
return
```
After doing some research, I think a filter can solve my issue: I am now playing with panflute, a python lib for writing pandoc filters.
I found some relevant questions, but they apply to other conversions (rST -> html, rst -> latex) and I don't know enough Lua to translate the code into Python and the org -> md conversion.
Thanks for any help.

I was able to write a script, posting it here for future Python-based questions about pandoc filters.
The filter below requires panflute, but there are other libs for pandoc filters in Python.
import panflute
def keep_attributes_markdown(elem, doc, format="commonmark"):
"""Keep custom attributes specified in code block headers when exporting to Markdown"""
if type(elem) == panflute.CodeBlock:
language = "." + elem.classes[0]
attributes = ""
attributes = " ".join(
[key + "=" + value for key, value in elem.attributes.items()]
)
header = "``` { " + " ".join([language, attributes]).strip() + " }"
panflute.debug(header)
code = elem.text.strip()
footer = "```"
content = [
panflute.RawBlock(header, format=format),
panflute.RawBlock(code, format=format),
panflute.RawBlock(footer, format=format),
]
return content
def main(doc=None):
return panflute.run_filter(keep_attributes_markdown, doc=doc)
if __name__ == "__main__":
main()
You can now run the following command:
pandoc --from=org --to=commonmark --filter=/full/path/to/keep_attributes_markdown.py --output=target_file.md your_file.org

Related

Pandoc equivalent to Sphinx's :download: role

Sphinx defines a role :download: that instructs Sphinx to copy the reference file to _downloads.
Does Pandoc has a similar feature?
Pandoc does not have that feature built-in, but it can be added with a few lines of Lua:
local prefix = 'media'
local path = pandoc.path
function Code (code)
if code.attributes.role == 'download' then
local description, filename = code.text:match '(.*) %<(.*)%>$'
local mimetype, content = pandoc.mediabag.fetch(filename)
local mediabag_filename = path.join{
pandoc.utils.sha1(content),
path.filename(filename)
}
if content and mimetype then
pandoc.mediabag.insert(mediabag_filename, mimetype, content)
end
return pandoc.Link(description, path.join{prefix, mediabag_filename})
end
end
Use the script by saving it to a file download-role.lua and the call pandoc with
pandoc --lua-filter=download-role.lua --extract-media=media ...
This will also work when using Markdown:
`this example script <../example.py>`{role=download}

How to include the source line number everywhere in html output in Sphinx?

Let's say I'm writing a custom editor for my RestructuredText/Sphinx stuff, with "live" html output preview. Output is built using Sphinx.
The source files are pure RestructuredText. No code there.
One desirable feature would be that right-clicking on some part of the preview opens the editor at the correct line of the source file.
To achieve that, one way would be to put that line number in every tag of the html file, for example using classes (e.g., class = "... lineno-124"). Or use html comments.
Note that I don't want to add more content to my source files, just that the line number be included everywhere in the output.
An approximate line number would be enough.
Someone knows how to do this in Sphinx, my way or another?
I decided to add <a> tags with a specific class "lineno lineno-nnn" where nnn is the line number in the RestructuredText source.
The directive .. linenocomment:: nnn is inserted before each new block of unindented text in the source, before the actual parsing (using a 'source-read' event hook).
linenocomment is a custom directive that pushes the <a> tag at build time.
Half a solution is still a solution...
import docutils.nodes as dn
from docutils.parsers.rst import Directive
class linenocomment(dn.General,dn.Element):
pass
def visit_linenocomment_html(self,node):
self.body.append(self.starttag(node,'a',CLASS="lineno lineno-{}".format(node['lineno'])))
def depart_linenocomment_html(self,node):
self.body.append('</a>')
class LineNoComment(Directive):
required_arguments = 1
optional_arguments = 0
has_content = False
add_index = False
def run(self):
node = linenocomment()
node['lineno'] = self.arguments[0]
return [node]
def insert_line_comments(app, docname, source):
print(source)
new_source = []
last_line_empty = True
lineno = 0
for line in source[0].split('\n'):
if line.strip() == '':
last_line_empty = True
new_source.append(line)
elif line[0].isspace():
new_source.append(line)
last_line_empty = False
elif not last_line_empty:
new_source.append(line)
else:
last_line_empty = False
new_source.append('.. linenocomment:: {}'.format(lineno))
new_source.append('')
new_source.append(line)
lineno += 1
source[0] = '\n'.join(new_source)
print(source)
def setup(app):
app.add_node(linenocomment,html=(visit_linenocomment_html,depart_linenocomment_html))
app.add_directive('linenocomment', LineNoComment)
app.connect('source-read',insert_line_comments)
return {
'version': 0.1
}

Declare additional dependency to sphinx-build in an extension

TL,DR: From a Sphinx extension, how do I tell sphinx-build to treat an additional file as a dependency? In my immediate use case, this is the extension's source code, but the question could equally apply to some auxiliary file used by the extension.
I'm generating documentation with Sphinx using a custom extension. I'm using sphinx-build to build the documentation. For example, I use this command to generate the HTML (this is the command in the makefile generated by sphinx-quickstart):
sphinx-build -b html -d _build/doctrees . _build/html
Since my custom extension is maintained together with the source of the documentation, I want sphinx-build to treat it as a dependency of the generated HTML (and LaTeX, etc.). So whenever I change my extension's source code, I want sphinx-build to regenerate the output.
How do I tell sphinx-build to treat an additional file as a dependency? That is not mentioned in the toctree, since it isn't part of the source. Logically, this should be something I do from my extension's setup function.
Sample extension (my_extension.py):
from docutils import nodes
from docutils.parsers.rst import Directive
class Foo(Directive):
def run(self):
node = nodes.paragraph(text='Hello world\n')
return [node]
def setup(app):
app.add_directive('foo', Foo)
Sample source (index.rst):
.. toctree::
:maxdepth: 2
.. foo::
Sample conf.py (basically the output of sphinx-quickstart plus my extension):
import sys
import os
sys.path.insert(0, os.path.abspath('.'))
extensions = ['my_extension']
templates_path = ['_templates']
source_suffix = '.rst'
master_doc = 'index'
project = 'Hello directive'
copyright = '2019, Gilles'
author = 'Gilles'
version = '1'
release = '1'
language = None
exclude_patterns = ['_build']
pygments_style = 'sphinx'
todo_include_todos = False
html_theme = 'alabaster'
html_static_path = ['_static']
htmlhelp_basename = 'Hellodirectivedoc'
latex_elements = {
}
latex_documents = [
(master_doc, 'Hellodirective.tex', 'Hello directive Documentation',
'Gilles', 'manual'),
]
man_pages = [
(master_doc, 'hellodirective', 'Hello directive Documentation',
[author], 1)
]
texinfo_documents = [
(master_doc, 'Hellodirective', 'Hello directive Documentation',
author, 'Hellodirective', 'One line description of project.',
'Miscellaneous'),
]
Validation of a solution:
Run make html (or sphinx-build as above).
Modify my_extension.py to replace Hello world by Hello again.
Run make html again.
The generated HTML (_build/html/index.html) must now contain Hello again instead of Hello world.
It looks like the note_dependency method in the build environment API should do what I want. But when should I call it? I tried various events but none seemed to hit the environment object in the right state. What did work was to call it from a directive.
import os
from docutils import nodes
from docutils.parsers.rst import Directive
import sphinx.application
class Foo(Directive):
def run(self):
self.state.document.settings.env.note_dependency(__file__)
node = nodes.paragraph(text='Hello done\n')
return [node]
def setup(app):
app.add_directive('foo', Foo)
If a document contains at least one foo directive, it'll get marked as stale when the extension that introduces this directive changes. This makes sense, although it could get tedious if an extension adds many directives or makes different changes. I don't know if there's a better way.
Inspired by Luc Van Oostenryck's autodoc-C.
As far as I know app.env.note_dependency can be called within the doctree-read to add any file as a dependency to the document currently being read.
So in your use case, I assume this would work:
from typing import Any, Dict
from sphinx.application import Sphinx
import docutils.nodes as nodes
def doctree-read(app: Sphinx, doctree: nodes.document):
app.env.note_dependency(file)
def setup(app: Sphinx):
app.connect("doctree-read", doctree-read)

pandoc-citeproc as API: processCites' does not add references

I have a small text file in markdown :
---
title: postWithReference
author: auf
date: 2010-07-29
keywords: homepage
abstract: |
What are the objects of
ontologists .
bibliography: "/home/frank/Workspace8/SSG/site/resources/BibTexLatex.bib"
csl: "/home/frank/Workspace8/SSG/site/resources/chicago-fullnote-bibliography-bb.csl"
---
An example post. With a reference to [#Frank2010a] and more[#navratil08].
## References
and process it in Haskell with processCites' which has a single argument, namely the Pandoc data resulting from readMarkdown. The bibliography and the csl style should be taken from the input file.
The process does not produce errors, but the result of processCites is the same text as the input; references are not treated at all. For the same input the references are resolved with the standalone pandoc (this excludes errors in the bibliography and the csl style)
pandoc -f markdown -t html --filter=pandoc-citeproc -o p1.html postWithReference.md
The issue is therefore in the API. The code I have is:
markdownToHTML4 :: Text -> PandocIO Value
markdownToHTML4 t = do
pandoc <- readMarkdown markdownOptions t
let meta2 = flattenMeta (getMeta pandoc)
-- test if biblio is present and apply
let bib = Just $ ( meta2) ^? key "bibliography" . _String
pandoc2 <- case bib of
Nothing -> return pandoc
_ -> do
res <- liftIO $ processCites' pandoc -- :: Pandoc -> IO Pandoc
when (res == pandoc) $
liftIO $ putStrLn "*** markdownToHTML3 result without references ***"
return res
htmltex <- writeHtml5String html5Options pandoc2
let withContent = ( meta2) & _Object . at "contentHtml" ?~ String ( htmltex)
return withContent
getMeta :: Pandoc -> Meta
getMeta (Pandoc m _) = m
What do I misunderstand? are there any reader options necessary for citeproc? The bibliography is a BibLatex file.
I found in hakyll code a comment, which I cannot understand in light of the code there - perhaps somebody knows what the intention is.
-- We need to know the citation keys, add then *before* actually parsing the
-- actual page. If we don't do this, pandoc won't even consider them
-- citations!
I have a workaround (not an answer to the original question, I still hope that somebody can identify my error!). It is simple to call the standalone pandoc with System.readProess and pass the text and get the result back, not even reading and writing files:
processCites2x :: Maybe FilePath -> Maybe FilePath -> Text -> ErrIO Text
-- porcess the cites in the text (not with the API)
-- using systemcall because the standalone pandoc works with
-- call: pandoc -f markdown -t html --filter=pandoc-citeproc
-- with the input text on stdin and the result on stdout
-- the csl and bib file are used from text, not from what is in the arguments
processCites2x _ _ t = do
putIOwords ["processCite2" ] -- - filein\n", showT styleFn2, "\n", showT bibfn2]
let cmd = "pandoc"
let cmdargs = ["--from=markdown", "--to=html5", "--filter=pandoc-citeproc" ]
let cmdinp = t2s t
res :: String <- callIO $ System.readProcess cmd cmdargs cmdinp
return . s2t $ res
-- error are properly caught and reported in ErrIO
t2s and s2t are conversion utilities between string and text, ErrIO is ErrorT Text a IO and callIO is essentially liftIO with handling of errors.
The original problem was very simple: I had not included the option Ext_citations in the markdownOptions. When it is included, the example works (thanks to help I received from the pandoc-citeproc issue page). The referenced code is updated...

How to stop ImageMagick in Ruby (Rmagick) evaluating an # sign in text annotation

In an app I recently built for a client the following code resulted in the variable #nameText being evaluated, and then resulting in an error 'no text' (since the variable doesn't exist).
To get around this I used gsub, as per the example below. Is there a way to tell Magick not to evaluate the string at all?
require 'RMagick'
#image = Magick::Image.read( '/path/to/image.jpg' ).first
#nameText = '#SomeTwitterUser'
#text = Magick::Draw.new
#text.font_family = 'Futura'
#text.pointsize = 22
#text.font_weight = Magick::BoldWeight
# Causes error 'no text'...
# #text.annotate( #image, 0,0,200,54, #nameText )
#text.annotate( #image, 0,0,200,54, #nameText.gsub('#', '\#') )
This is the C code from RMagick that is returning the error:
// Translate & store in Draw structure
draw->info->text = InterpretImageProperties(NULL, image, StringValuePtr(text));
if (!draw->info->text)
{
rb_raise(rb_eArgError, "no text");
}
It is the call to InterpretImageProperties that is modifying the input text - but it is not Ruby, or a Ruby instance variable that it is trying to reference. The function is defined here in the Image Magick core library: http://www.imagemagick.org/api/MagickCore/property_8c_source.html#l02966
Look a bit further down, and you can see the code:
/* handle a '#' replace string from file */
if (*p == '#') {
p++;
if (*p != '-' && (IsPathAccessible(p) == MagickFalse) ) {
(void) ThrowMagickException(&image->exception,GetMagickModule(),
OptionError,"UnableToAccessPath","%s",p);
return((char *) NULL);
}
return(FileToString(p,~0,&image->exception));
}
In summary, this is a core library feature which will attempt to load text from file (named SomeTwitterUser in your case, I have confirmed this -try it!), and your work-around is probably the best you can do.
For efficiency, and minimal changes to input strings, you could rely on the selectivity of the library code and only modify the string if it starts with #:
#text.annotate( #image, 0,0,200,54, #name_string.gsub( /^#/, '\#') )

Resources