Overload Jinja2 autoescape for (La)TeX - markup

Is it possible to overload Jinja2's autoescape so that it escapes something in a user-specified way (i.e. something other than HTML such as LaTeX)?
Here's an example trying to escape TeX.
import jinja2
class MyEnv(jinja2.Environment):
    def __init__(self, filters={}, globals={}, tests={},
                 loader=None, extensions=[], **kwargs):
        super(MyEnv, self).__init__(
            autoescape = True,
        )
template = MyEnv().from_string("""\documentclass[{{ class }}]
\\begin{document}
{{ content }}
\end{document}
""")
print(template.render({
    'class':'memoir',
    'content': '{bob} <-- is escaped',
}))
When you run the above, it outputs:
\documentclass[memoir]
\begin{document}
{bob} &lt;-- is escaped
\end{document}
The problem here is that HTML escaping is used: { and } should be escaped but are not, and < is converted to &lt; when it should be left alone.
I'd like to overload the escape function that Jinja2 uses to escape variables.
My first thought is to overload finalize and disable autoescape. e.g.
import jinja2
class MyEnv(jinja2.Environment):
    def __init__(self, filters={}, globals={}, tests={},
                 loader=None, extensions=[], **kwargs):
        super(MyEnv, self).__init__(
            autoescape = False, # turn off autoescape
            finalize = self.finalize,
        )
    def finalize(self, s):
        import re
        if isinstance(s, jinja2.Markup):
            return s
        s = s.replace('\\', '')
        s = s.replace('~', '\\textasciitilde')
        s = re.sub(r'([#|^|$|&|%|{|}])', r'\\\1', s)
        s = re.sub(r'_', r'\\_', s)
        return jinja2.Markup(s)
template = MyEnv().from_string("""\documentclass[{{ class }}]
\\begin{document}
{{ content }}
\end{document}
""")
print(template.render({
    'class':'memoir',
    'content': '{bob} <-- is escaped',
}))
The output is incorrect, because the main text isn't made into Markup (i.e. a string flagged as safe):
documentclass[memoir]
begin\{document\}
\{bob\} <-- is escaped
end\{document\}
If I set autoescape to True, and leave in finalize it almost works (and in this example, it does work):
\documentclass[memoir]
\begin{document}
\{bob\} <-- is escaped
\end{document}
Turning autoescape on works because it makes the main body of text for the template as Markup (i.e. safe).
However, here's where the problem lies, if I change the input to a list that's joined:
template = MyEnv().from_string("""\documentclass[{{ class }}]
\\begin{document}
{{ content|join(" > a & b > "|safe) }}
\end{document}
""")
print(template.render({
    'class':'memoir',
    'content': ['A&B', 'C<D'],
}))
When I run this I get:
\documentclass[memoir]
\begin{document}
A&amp;B > a & b > C&lt;D
\end{document}
It would seem that HTML autoescape, rather than finalize, is being run on the elements of 'content'. The simplest solution, provided Jinja2 and its autoescaping are loosely coupled, would seem to be to overload an autoescape function. I can't figure out how to do that, and the best I've come up with is the finalize function.
Is there a better way to handle escaping of TeX than overloading the finalize function? Can one overload autoescape?
For example, could one install a custom Markup package? (a choice I'd prefer to avoid)
Thank you for reading.
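For reference, the character-level escaping that the finalize attempt above is reaching for could be done in a single pass, so that backslashes introduced by one replacement are never re-escaped by a later one. This is only a sketch: escape_tex and the replacement table are hypothetical names chosen for illustration, they are not part of Jinja2, and the \text... macros are one common convention rather than the only option.

```python
import re

# Hypothetical replacement table (an assumption for illustration, not a Jinja2 API).
TEX_REPLACEMENTS = {
    "\\": r"\textbackslash{}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
}

# One alternation over all special characters, so each input character
# is visited exactly once and replacements are never escaped again.
TEX_PATTERN = re.compile("|".join(re.escape(ch) for ch in TEX_REPLACEMENTS))


def escape_tex(text):
    """Escape TeX special characters in a plain string."""
    return TEX_PATTERN.sub(lambda match: TEX_REPLACEMENTS[match.group(0)], text)


print(escape_tex('{bob} <-- is escaped'))
```

A function like this could be registered as a filter or called from finalize; whether it can replace the escaping that autoescape applies is exactly the open question here.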

Related

Latex nested loop not displaying correctly

Hi, I'm new to LaTeX. I'm using the Algorithmic package to write my pseudocode. The problem is that 'some text' is displayed correctly under the second loop, but the 'return' statement, which needs to be outside the first for loop, isn't shown correctly. The end of each loop is also not marked (the vertical tick is missing).
\documentclass{article}
\usepackage[utf8,linesnumbered,ruled,vlined]{algorithm2e}
\usepackage {algpseudocode}
\usepackage{algorithmicx}
\usepackage{algcompatible}
\begin{document}
\begin{algorithm}
\ContinuedFloat
\caption{My algorithm}
\textbf{Input:} solution,bound, data\_matrix, vehicle\_capacity, demand\_data,k\_max,operations\_data, move\_type,tenure, max\_number\_of\_moves,max\_iter,non\_improvement\_maxiter,itermax,epsilon\\
\textbf{Output:} $best$ $solution$ \\[0.1in]
routes = extract routes from \textbf{solution}\\
oldfitness = fitness(\textbf{solution})\\
ls\_move\_type = inversion\\
best\_solution = routes\\[0.1in]
\For{0 \leq i \leq itermax}{
new\_routes = [ ]\\
desc = 'normal route'\\
\For{route \textbf{in} routes}{
n=length(route)\\
comb = int($\frac{n!}{(n-2)!}$)\\
\If{n \geq 4}{
tabu\_list\_tenure = $\frac{comb}{5}$\\
ls\_maxiteration = 50 \\
ls\_move\_type = 'inversion'\\
}
\If{3 \leq n \leq 4}{
tabu\_list_tenure = $\frac{comb}{4}$ \\
ls\_maxiteration = 25\\
ls\_move\_type = 'relocation'\\
}
\Else{
append \textbf{route} to \textbf{new\_routes}\\
desc = 'short route'\\
}\\[0.1in]
}
some action
}
return
\end{algorithm}
\end{document}
There is no point in wondering about the output as long as you get errors in your .log file. After an error, latex only recovers enough to syntax check the rest of the document, not necessarily producing sensible output.
Some of the most critical problems:
never ignore error messages!
utf8 isn't a valid option for the algorithm2e package
\ContinuedFloat is not defined by default. If you want to use it, you need a package which defines it. Maybe you want to use the caption package?
never ever use math mode to fake italic text as in $best$ $solution$. This completely messes up the kerning
some of your _ are not escaped
you mustn't use math commands like 0 \leq i \leq outside of math mode
use something like \Return to properly format it
using \\ for line breaks is already quite questionable, but using them two times in a row is simply an error.
because one can't say it often enough: never ignore error messages!
\documentclass{article}
\usepackage[
%utf8,
linesnumbered,ruled,vlined]{algorithm2e}
\usepackage {algpseudocode}
\usepackage{algorithmicx}
\usepackage{algcompatible}
\begin{document}
\begin{algorithm}
%\ContinuedFloat
\caption{My algorithm}
\textbf{Input:} solution,bound, data\_matrix, vehicle\_capacity, demand\_data,k\_max,operations\_data, move\_type,tenure, max\_number\_of\_moves,max\_iter,non\_improvement\_maxiter,itermax,epsilon
\textbf{Output:} \emph{best solution}
\medskip
routes = extract routes from \textbf{solution}
oldfitness = fitness(\textbf{solution})
ls\_move\_type = inversion
best\_solution = routes
\medskip
\For{$0 \leq i \leq$ itermax}{
new\_routes = [ ]
desc = 'normal route'
\For{route \textbf{in} routes}{
n=length(route)
comb = int($\frac{n!}{(n-2)!}$)
\If{$n \geq 4$}{
tabu\_list\_tenure = $\frac{comb}{5}$
ls\_maxiteration = 50
ls\_move\_type = 'inversion'
}
\If{$3 \leq n \leq 4$}{
tabu\_list\_tenure = $\frac{comb}{4}$
ls\_maxiteration = 25
ls\_move\_type = 'relocation'
}
\Else{
append \textbf{route} to \textbf{new\_routes}
desc = 'short route'
}
\medskip
}
some action
}
\Return
\end{algorithm}
\end{document}

using construct_undefined in ruamel from_yaml

I'm creating a custom yaml tag MyTag. It can contain any given valid yaml - map, scalar, anchor, sequence etc.
How do I implement class MyTag to model this tag so that ruamel parses the contents of a !mytag in exactly the same way as it would parse any given yaml? The MyTag instance just stores whatever the parsed result of the yaml contents is.
The following code works, and the asserts should demonstrate exactly what it should do; they all pass.
But I'm not sure if it's working for the right reasons. Specifically, in the from_yaml class method, is using commented_obj = constructor.construct_undefined(node) a recommended way of achieving this, and is consuming one and only one item from the yielded generator correct? Or is it just working by accident?
Should I instead be using something like construct_object or construct_map? The examples I've been able to find tend to know what type they are constructing, so they use either construct_map or construct_sequence to pick which type of object to construct. In this case I effectively want to piggy-back off the standard ruamel parsing for whatever unknown type there might be in there, and just store the result in my own type.
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap, CommentedSeq, TaggedScalar
class MyTag():
    yaml_tag = '!mytag'
    def __init__(self, value):
        self.value = value
    @classmethod
    def from_yaml(cls, constructor, node):
        commented_obj = constructor.construct_undefined(node)
        flag = False
        for data in commented_obj:
            if flag:
                raise AssertionError('should only be 1 thing in generator??')
            flag = True
        return cls(data)
with open('mytag-sample.yaml') as yaml_file:
    yaml_parser = ruamel.yaml.YAML()
    yaml_parser.register_class(MyTag)
    yaml = yaml_parser.load(yaml_file)
custom_tag_with_list = yaml['root'][0]['arb']['k2']
assert type(custom_tag_with_list) is MyTag
assert type(custom_tag_with_list.value) is CommentedSeq
print(custom_tag_with_list.value)
standard_list = yaml['root'][0]['arb']['k3']
assert type(standard_list) is CommentedSeq
assert standard_list == custom_tag_with_list.value
custom_tag_with_map = yaml['root'][1]['arb']
assert type(custom_tag_with_map) is MyTag
assert type(custom_tag_with_map.value) is CommentedMap
print(custom_tag_with_map.value)
standard_map = yaml['root'][1]['arb_no_tag']
assert type(standard_map) is CommentedMap
assert standard_map == custom_tag_with_map.value
custom_tag_scalar = yaml['root'][2]
assert type(custom_tag_scalar) is MyTag
assert type(custom_tag_scalar.value) is TaggedScalar
standard_tag_scalar = yaml['root'][3]
assert type(standard_tag_scalar) is str
assert standard_tag_scalar == str(custom_tag_scalar.value)
And some sample yaml:
root:
  - item: blah
    arb:
      k1: v1
      k2: !mytag
        - one
        - two
        - three-k1: three-v1
          three-k2: three-v2
          three-k3: 123 # arb comment
          three-k4:
            - a
            - b
            - True
      k3:
        - one
        - two
        - three-k1: three-v1
          three-k2: three-v2
          three-k3: 123 # arb comment
          three-k4:
            - a
            - b
            - True
  - item: argh
    arb: !mytag
      k1: v1
      k2: 123
      # blah line 1
      # blah line 2
      k3:
        k31: v31
        k32:
          - False
          - string here
          - 321
    arb_no_tag:
      k1: v1
      k2: 123
      # blah line 1
      # blah line 2
      k3:
        k31: v31
        k32:
          - False
          - string here
          - 321
  - !mytag plain scalar
  - plain scalar
  - item: no comment
    arb:
      - one1
      - two2
In YAML you can have anchors and aliases, and it is perfectly fine to have an object be a child of itself (using an alias). If you want to dump the Python data structure data:
data = [1, 2, 4, dict(a=42)]
data[3]['b'] = data
it dumps to:
&id001
- 1
- 2
- 4
- a: 42
  b: *id001
and for that anchors and aliases are necessary.
When loading such a construct, ruamel.yaml recurses into the nested data structures, but if the toplevel node has not caused a real object to be constructed to which the anchor can be made a reference, the recursive leaf cannot resolve the alias.
To solve that, a generator is used, except for scalar values. It first creates an empty object, then recurses and updates its values. In the code calling the constructor, a check is made to see if a generator is returned; in that case next() is called on it to get the data, and any potential self-recursion is "resolved".
Because you call construct_undefined(), you always get a generator. In practice that method could return a plain value if it detects a scalar node (which of course cannot recurse), but it doesn't. If it did, your code could not load the following YAML document:
!mytag 1
without modifications that test if you get a generator or not, as is done in the code in ruamel.yaml calling the various constructors so it can handle both construct_undefined and e.g. construct_yaml_int (which is not a generator).
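The two-step construction described above can be illustrated with a toy model. This is a sketch only: it does not use ruamel.yaml, and ALIAS, construct_sequence and load are hypothetical names standing in for the real node and constructor machinery.

```python
import types

# Hypothetical marker standing in for an alias (*id001) to the top-level anchor.
ALIAS = object()


def construct_sequence(items, resolve):
    """Toy two-step constructor: yield the empty object first, fill it afterwards."""
    data = []
    yield data  # step 1: the caller can now bind the anchor to this object
    for item in items:  # step 2: recurse; aliases to the anchor can be resolved
        data.append(resolve(item))


def load(items):
    anchors = {}

    def resolve(item):
        return anchors["id001"] if item is ALIAS else item

    result = construct_sequence(items, resolve)
    # Mirrors the check described above: handle generators and plain values.
    if isinstance(result, types.GeneratorType):
        generator = result
        data = next(generator)  # get the still-empty object
        anchors["id001"] = data
        for _ in generator:  # exhaust the generator so the object gets filled in
            pass
    else:
        data = result
    return data


data = load([1, 2, ALIAS])
print(data[2] is data)
```

In this model, consuming exactly one value from the generator and then exhausting it is what allows the list to contain itself.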

YAML mapping order not preserved when using alias and yamlordereddictloader loader

I want to load a YAML file into Python as an OrderedDict. I am using yamlordereddictloader to preserve ordering.
However, I notice that the aliased object is placed "too soon" in the OrderedDict in the output.
How can I preserve the order of this mapping when read into Python, ideally as an OrderedDict? Is it possible to achieve this result without writing some custom parsing?
Notes:
I'm not particularly concerned with the method used, as long as the end result is the same.
Using sequences instead of mappings is problematic because they can result in nested output, and I can't simply flatten everything (some nestedness is appropriate).
When I try to just use !!omap, I cannot seem to merge the aliased mapping (d1.dt) into the d2 mapping.
I'm in Python 3.6, if I don't use this loader or !!omap order is not preserved (apparently contrary to the top 'Update' here: https://stackoverflow.com/a/21912744/2343633)
import yaml
import yamlordereddictloader
yaml_file = """
d1:
  id:
    nm1: val1
  dt: &dt
    nm2: val2
    nm3: val3
d2: # expect nm4, nm2, nm3
  nm4: val4
  <<: *dt
"""
out = yaml.load(yaml_file, Loader=yamlordereddictloader.Loader)
keys = [x for x in out['d2']]
print(keys) # ['nm2', 'nm3', 'nm4']
assert keys==['nm4', 'nm2', 'nm3'], "order from YAML file is not preserved, aliased keys placed too early"
Is it possible to achieve this result without writing some custom parsing?
Yes. You need to override the method flatten_mapping from SafeConstructor. Here's a basic working example:
import yaml
import yamlordereddictloader
from yaml.constructor import *
from yaml.reader import *
from yaml.parser import *
from yaml.resolver import *
from yaml.composer import *
from yaml.scanner import *
from yaml.nodes import *
class MyLoader(yamlordereddictloader.Loader):
    def __init__(self, stream):
        yamlordereddictloader.Loader.__init__(self, stream)
    # taken from here and reengineered to keep order:
    # https://github.com/yaml/pyyaml/blob/5.3.1/lib/yaml/constructor.py#L207
    def flatten_mapping(self, node):
        merged = []
        def merge_from(node):
            if not isinstance(node, MappingNode):
                # ConstructorError comes from the yaml.constructor star import above
                raise ConstructorError("while constructing a mapping",
                    node.start_mark, "expected mapping for merging, but found %s" %
                    node.id, node.start_mark)
            self.flatten_mapping(node)
            merged.extend(node.value)
        for index in range(len(node.value)):
            key_node, value_node = node.value[index]
            if key_node.tag == u'tag:yaml.org,2002:merge':
                if isinstance(value_node, SequenceNode):
                    for subnode in value_node.value:
                        merge_from(subnode)
                else:
                    merge_from(value_node)
            else:
                if key_node.tag == u'tag:yaml.org,2002:value':
                    key_node.tag = u'tag:yaml.org,2002:str'
                merged.append((key_node, value_node))
        node.value = merged
yaml_file = """
d1:
  id:
    nm1: val1
  dt: &dt
    nm2: val2
    nm3: val3
d2: # expect nm4, nm2, nm3
  nm4: val4
  <<: *dt
"""
out = yaml.load(yaml_file, Loader=MyLoader)
keys = [x for x in out['d2']]
print(keys)
assert keys==['nm4', 'nm2', 'nm3'], "order from YAML file is not preserved, aliased keys placed too early"
This does not have the best performance, as it copies every key-value pair of every mapping once during loading, but it works. Improving the performance is left as an exercise for the reader :).
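The ordering idea behind the override can be shown without PyYAML. In this minimal sketch, mappings are modelled as OrderedDicts and the merge key as the plain string "<<". MERGE_KEY and flatten_in_order are hypothetical names, and precedence between duplicate keys is not handled.

```python
from collections import OrderedDict

# Stand-in for a key node tagged tag:yaml.org,2002:merge (assumption for illustration).
MERGE_KEY = "<<"


def flatten_in_order(mapping):
    """Return (key, value) pairs with merged mappings expanded where '<<' appears."""
    merged = []
    for key, value in mapping.items():
        if key == MERGE_KEY:
            # A merge value may be a single mapping or a list of mappings.
            sources = value if isinstance(value, list) else [value]
            for source in sources:
                merged.extend(flatten_in_order(source))
        else:
            merged.append((key, value))
    return merged


dt = OrderedDict([("nm2", "val2"), ("nm3", "val3")])
d2 = OrderedDict([("nm4", "val4"), (MERGE_KEY, dt)])
keys = [key for key, _ in flatten_in_order(d2)]
print(keys)
```

The merged pairs are spliced in at the position of the merge key instead of being collected first and prepended, which is the behaviour the question asks for.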

Human readable iterables in Sphinx documentation

Sphinx-autodoc flattens dicts, lists, and tuples - making long ones barely readable. Pretty-print format isn't always desired either, as some nested containers are better kept flattened than columned. Is there a way to display iterables as typed in source code?
Get it straight from source, and add an .rst command for it:
# conf.py
from importlib import import_module
from docutils import nodes
from sphinx import addnodes
from inspect import getsource
from docutils.parsers.rst import Directive
class PrettyPrintIterable(Directive):
    required_arguments = 1
    def run(self):
        def _get_iter_source(src, varname):
            # 1. identifies target iterable by variable name, (cannot be spaced)
            # 2. determines iter source code start & end by tracking brackets
            # 3. returns source code between found start & end
            start = end = None
            open_brackets = closed_brackets = 0
            for i, line in enumerate(src):
                if line.startswith(varname):
                    if start is None:
                        start = i
                if start is not None:
                    open_brackets += sum(line.count(b) for b in "([{")
                    closed_brackets += sum(line.count(b) for b in ")]}")
                if open_brackets > 0 and (open_brackets - closed_brackets == 0):
                    end = i + 1
                    break
            return '\n'.join(src[start:end])
        module_path, member_name = self.arguments[0].rsplit('.', 1)
        src = getsource(import_module(module_path)).split('\n')
        code = _get_iter_source(src, member_name)
        literal = nodes.literal_block(code, code)
        literal['language'] = 'python'
        return [addnodes.desc_name(text=member_name),
                addnodes.desc_content('', literal)]
def setup(app):
    app.add_directive('pprint', PrettyPrintIterable)
Example .rst and result:
(:autodata: with empty :annotation: is to exclude the original flattened dictionary).
Some code borrowed from this answer.
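To see what the bracket tracking does in isolation, here is a standalone sketch of the same idea run on an in-memory source string. It assumes nothing about Sphinx: get_iter_source mirrors the helper above, and sample and CONFIG are made-up names used only for the demonstration.

```python
def get_iter_source(src_lines, varname):
    """Return the source lines of the iterable assigned to `varname`."""
    start = end = None
    opened = closed = 0
    for i, line in enumerate(src_lines):
        # The variable name must start the line (no leading whitespace).
        if start is None and line.startswith(varname):
            start = i
        if start is not None:
            opened += sum(line.count(b) for b in "([{")
            closed += sum(line.count(b) for b in ")]}")
            # Once every opened bracket has been closed, the literal is complete.
            if opened > 0 and opened == closed:
                end = i + 1
                break
    return "\n".join(src_lines[start:end])


sample = '''X = 1
CONFIG = {
    "a": [1, 2,
          3],
    "b": (4, 5),
}
Y = 2'''.split("\n")

snippet = get_iter_source(sample, "CONFIG")
print(snippet)
```

Counting brackets this way ignores brackets inside string literals or comments, so it suits simple container definitions best.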

Include jekyll / liquid template data in a YAML variable?

I am using the YAML heading of a markdown file to add an excerpt variable to blog posts that I can use elsewhere. In one of these excerpts I refer to an earlier blog post via markdown link markup, and I use the liquid template data variable {{ site.url }} in place of the base URL of the site.
So I have something like (trimmed it somewhat)
---
title: "Decluttering ordination plots in vegan part 2: orditorp()"
status: publish
layout: post
published: true
tags:
- tag1
- tag2
excerpt: In the [earlier post in this series]({{ site.url }}/2013/01/12/
decluttering-ordination-plots-in-vegan-part-1-ordilabel/ "Decluttering ordination
plots in vegan part 1: ordilabel()") I looked at the `ordilabel()` function
----
However, jekyll and the Maruku md parser don't like this, which makes me suspect that you can't use liquid markup in the YAML header.
Is it possible to use liquid markup in the YAML header of pages handled by jekyll?
If it is, what am I doing wrong in the example shown?
If it is not allowed, how else can I achieve what I intended? I am currently developing my site on my laptop and don't want to hard-code the base URL, as it will have to change when I am ready to deploy.
The errors I am getting from Maruku are:
| Maruku tells you:
+---------------------------------------------------------------------------
| Must quote title
| ---------------------------------------------------------------------------
| the [earlier post in this series]({{ site.url }}/2013/01/12/decluttering-o
| --------------------------------------|-------------------------------------
| +--- Byte 40
and
| Maruku tells you:
+---------------------------------------------------------------------------
| Unclosed link
| ---------------------------------------------------------------------------
| the [earlier post in this series]({{ site.url }}/2013/01/12/decluttering-or
| --------------------------------------|-------------------------------------
| +--- Byte 41
and
| Maruku tells you:
+---------------------------------------------------------------------------
| No closing ): I will not create the link for ["earlier post in this series"]
| ---------------------------------------------------------------------------
| the [earlier post in this series]({{ site.url }}/2013/01/12/decluttering-or
| --------------------------------------|-------------------------------------
| +--- Byte 41
Today I ran into a similar problem. As a solution I created the following simple Jekyll filter plugin, which expands nested Liquid templates (e.g. Liquid variables in the YAML front matter):
module Jekyll
  module LiquifyFilter
    def liquify(input)
      Liquid::Template.parse(input).render(@context)
    end
  end
end
Liquid::Template.register_filter(Jekyll::LiquifyFilter)
Filters can be added to a Jekyll site by placing them in the '_plugins' sub-directory of the site-root dir. The above code can be simply pasted into a yoursite/_plugins/liquify_filter.rb file.
After that a template like...
---
layout: default
first_name: Harry
last_name: Potter
greetings: Greetings {{ page.first_name }} {{ page.last_name }}!
---
{{ page.greetings | liquify }}
... should render some output like "Greetings Harry Potter!". The expansion works also for deeper nested structures - as long as the liquify filter is also specified on the inner liquid output-blocks. Something like {{ site.url }} works of course, too.
Update - looks like this is now available as a Ruby gem: https://github.com/gemfarmer/jekyll-liquify.
I don't believe it's possible to nest liquid variables inside YAML. At least, I haven't figured out how to do it.
One approach that will work is to use Liquid's replace filter. Specifically, define a string that you want to use for the variable replacement (e.g. !SITE_URL!). Then, use the replace filter to switch that to your desired Jekyll variable (e.g. site.url) during the output. Here's a cut-down .md file that behaves as expected on my jekyll 0.11 install:
---
layout: post
excerpt: In the [earlier post in this series](!SITE_URL!/2013/01/12/)
---
{{ page.excerpt | replace: '!SITE_URL!', site.url }}
Testing that on my machine, the URL is inserted properly and then translated from markdown into an HTML link as expected. If you have more than one item to replace, you can string multiple replace calls together.
---
layout: post
my_name: Alan W. Smith
multi_replace_test: 'Name: !PAGE_MY_NAME! - Site: [!SITE_URL!](!SITE_URL!)'
---
{{ page.multi_replace_test | replace: '!SITE_URL!', site.url | replace: '!PAGE_MY_NAME!', page.my_name }}
An important note is that you must explicitly set the site.url value. You don't get that for free with Jekyll. You can either set it in your _config.yml file with:
url: http://alanwsmith.com
Or, define it when you call jekyll:
jekyll --url http://alanwsmith.com
If you need to replace values in a data/yml file with values from another data/yml file, I wrote a plugin for that. It's not very elegant, but it works:
I made some improvements to the code. It now catches all occurrences in one string and works with nested values.
module LiquidReplacer
  class Generator < Jekyll::Generator
    REGEX = /\!([A-Za-z0-9]|_|\.){1,}\!/
    def replace_str(str)
      out = str
      str.to_s.to_enum(:scan, REGEX).map {
        m = Regexp.last_match.to_s
        val = m.gsub('!', '').split('.')
        vv = $site_data[val[0]]
        val.delete_at(0)
        val.length.times.with_index do |i|
          if val.nil? || val[i].nil? || vv.nil? ||vv[val[i]].nil?
            puts "ERROR IN BUILDING YAML WITH KEY:\n#{m}"
          else
            vv = vv[val[i]]
          end
        end
        out = out.gsub(m, vv)
      }
      out
    end
    def deeper(in_hash)
      if in_hash.class == Hash || in_hash.class == Array
        _in_hash = in_hash.to_a
        _out_hash = {}
        _in_hash.each do |dd|
          case dd
          when Hash
            _dd = dd.to_a
            _out_hash[_dd[0]] = deeper(_dd[1])
          when Array
            _out_hash[dd[0]] = deeper(dd[1])
          else
            _out_hash = replace_str(dd)
          end
        end
      else
        _out_hash = replace_str(in_hash)
      end
      return _out_hash
    end
    def generate(site)
      $site_data = site.data
      site.data.each do |data|
        site.data[data[0]] = deeper(data[1])
      end
    end
  end
end
Place this code in site/_plugins/liquid_replacer.rb.
In a yml file, use !something.someval! the way you would use site.data.something.someval, but without the site.data part.
Example:
_data/one.yml
foo: foo
_data/two.yml
bar: "!one.foo!bar"
calling {{ site.data.two.bar }} will produce foobar
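The token lookup that the plugin performs can be sketched outside Jekyll. The following is an illustration in Python rather than the plugin's Ruby: TOKEN, resolve_path and replace_tokens are hypothetical names, and site_data is a plain dict standing in for site.data.

```python
import re

# Matches placeholders such as !one.foo! (letters, digits, underscores and dots).
TOKEN = re.compile(r"!([A-Za-z0-9_.]+)!")


def resolve_path(path, data):
    """Walk a dotted path such as 'one.foo' through nested dicts."""
    value = data
    for part in path.split("."):
        value = value[part]
    return value


def replace_tokens(text, data):
    """Replace every !a.b.c! token in text with the value found in data."""
    return TOKEN.sub(lambda match: str(resolve_path(match.group(1), data)), text)


# Plain dict standing in for site.data with the two example files loaded.
site_data = {
    "one": {"foo": "foo"},
    "two": {"bar": "!one.foo!bar"},
}

result = replace_tokens(site_data["two"]["bar"], site_data)
print(result)
```

Unlike the plugin, this sketch raises a KeyError for an unknown key instead of printing an error and continuing.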
=======
OLD CODE
======
module LiquidReplacer
  class Generator < Jekyll::Generator
    REGEX = /\!([A-Za-z0-9]|_|\.){1,}\!/
    def generate(site)
      site.data.each do |d|
        d[1].each_pair do |k,v|
          v.to_s.match(REGEX) do |m|
            val = m[0].gsub('!', '').split('.')
            vv = site.data[val[0]]
            val.delete_at(0)
            val.length.times.with_index do |i|
              vv = vv[val[i]]
            end
            d[1][k] = d[1][k].gsub(m[0], vv)
          end
        end
      end
    end
  end
end
Another approach would be to add an if statement to your head.html.
Instead of using page.layout as I did in the example below, you could use any variable from the page's YAML header.
<title>
{% if page.layout == 'post' %}
Some text with {{ site.url }} variable
{% else %}
{{ site.description | escape }}
{% endif %}
</title>
