Include jekyll / liquid template data in a YAML variable? - ruby

I am using the YAML heading of a markdown file to add an excerpt variable to blog posts that I can use elsewhere. In one of these excerpts I refer to an earlier blog post via markdown link markup, and I use the liquid template data variable {{ site.url }} in place of the base URL of the site.
So I have something like (trimmed it somewhat)
---
title: "Decluttering ordination plots in vegan part 2: orditorp()"
status: publish
layout: post
published: true
tags:
- tag1
- tag2
excerpt: In the [earlier post in this series]({{ site.url }}/2013/01/12/
decluttering-ordination-plots-in-vegan-part-1-ordilabel/ "Decluttering ordination
plots in vegan part 1: ordilabel()") I looked at the `ordilabel()` function
---
However, jekyll and the Maruku md parser don't like this, which makes me suspect that you can't use liquid markup in the YAML header.
Is it possible to use liquid markup in the YAML header of pages handled by jekyll?
If it is, what am I doing wrong in the example shown?
If it is not allowed, how else can I achieve what I intended? I am currently developing my site on my laptop and don't want to hard-code the base URL, as it will have to change when I am ready to deploy.
The errors I am getting from Maruku are:
| Maruku tells you:
+---------------------------------------------------------------------------
| Must quote title
| ---------------------------------------------------------------------------
| the [earlier post in this series]({{ site.url }}/2013/01/12/decluttering-o
| --------------------------------------|-------------------------------------
| +--- Byte 40
and
| Maruku tells you:
+---------------------------------------------------------------------------
| Unclosed link
| ---------------------------------------------------------------------------
| the [earlier post in this series]({{ site.url }}/2013/01/12/decluttering-or
| --------------------------------------|-------------------------------------
| +--- Byte 41
and
| Maruku tells you:
+---------------------------------------------------------------------------
| No closing ): I will not create the link for ["earlier post in this series"]
| ---------------------------------------------------------------------------
| the [earlier post in this series]({{ site.url }}/2013/01/12/decluttering-or
| --------------------------------------|-------------------------------------
| +--- Byte 41

Today I ran into a similar problem. As a solution I created the following simple Jekyll filter plugin, which expands nested Liquid templates (e.g. Liquid variables in the YAML front matter):
module Jekyll
  module LiquifyFilter
    # Render the input string as a Liquid template in the current context.
    def liquify(input)
      Liquid::Template.parse(input).render(@context)
    end
  end
end

Liquid::Template.register_filter(Jekyll::LiquifyFilter)
Filters can be added to a Jekyll site by placing them in the '_plugins' sub-directory of the site-root dir. The above code can be simply pasted into a yoursite/_plugins/liquify_filter.rb file.
After that a template like...
---
layout: default
first_name: Harry
last_name: Potter
greetings: Greetings {{ page.first_name }} {{ page.last_name }}!
---
{{ page.greetings | liquify }}
... should render some output like "Greetings Harry Potter!". The expansion works also for deeper nested structures - as long as the liquify filter is also specified on the inner liquid output-blocks. Something like {{ site.url }} works of course, too.
Update - looks like this is now available as a Ruby gem: https://github.com/gemfarmer/jekyll-liquify.

I don't believe it's possible to nest liquid variables inside YAML. At least, I haven't figured out how to do it.
One approach that will work is to use Liquid's replace filter. Specifically, define a placeholder string that you want to use for the variable replacement (e.g. !SITE_URL!). Then, use the replace filter to swap that placeholder for your desired Jekyll variable (e.g. site.url) during output. Here's a cut-down .md file that behaves as expected on my jekyll 0.11 install:
---
layout: post
excerpt: In the [earlier post in this series](!SITE_URL!/2013/01/12/)
---
{{ page.excerpt | replace: '!SITE_URL!', site.url }}
Testing that on my machine, the URL is inserted properly and then translated from markdown into an HTML link as expected. If you have more than one item to replace, you can string multiple replace calls together.
---
layout: post
my_name: Alan W. Smith
multi_replace_test: 'Name: !PAGE_MY_NAME! - Site: [!SITE_URL!](!SITE_URL!)'
---
{{ page.multi_replace_test | replace: '!SITE_URL!', site.url | replace: '!PAGE_MY_NAME!', page.my_name }}
An important note is that you must explicitly set the site.url value. You don't get that for free with Jekyll. You can either set it in your _config.yml file with:
url: http://alanwsmith.com
Or, define it when you call jekyll:
jekyll --url http://alanwsmith.com

If you need to fill in values in one data/yml file from another data/yml file, I wrote a plugin. It's not very elegant, but it works:
I made some code improvements. It now catches all occurrences in one string and works with nested values.
module LiquidReplacer
class Generator < Jekyll::Generator
REGEX = /\!([A-Za-z0-9]|_|\.){1,}\!/
def replace_str(str)
out = str
str.to_s.to_enum(:scan, REGEX).map {
m = Regexp.last_match.to_s
val = m.gsub('!', '').split('.')
vv = $site_data[val[0]]
val.delete_at(0)
val.length.times.with_index do |i|
if val.nil? || val[i].nil? || vv.nil? ||vv[val[i]].nil?
puts "ERROR IN BUILDING YAML WITH KEY:\n#{m}"
else
vv = vv[val[i]]
end
end
out = out.gsub(m, vv)
}
out
end
def deeper(in_hash)
if in_hash.class == Hash || in_hash.class == Array
_in_hash = in_hash.to_a
_out_hash = {}
_in_hash.each do |dd|
case dd
when Hash
_dd = dd.to_a
_out_hash[_dd[0]] = deeper(_dd[1])
when Array
_out_hash[dd[0]] = deeper(dd[1])
else
_out_hash = replace_str(dd)
end
end
else
_out_hash = replace_str(in_hash)
end
return _out_hash
end
def generate(site)
$site_data = site.data
site.data.each do |data|
site.data[data[0]] = deeper(data[1])
end
end
end
end
Place this code in site/_plugins/liquid_replacer.rb.
In a yml file, write !something.someval! where you would otherwise use site.data.something.someval, i.e. the same path without the site.data part.
Example:
_data/one.yml
foo: foo
_data/two.yml
bar: "!one.foo!bar"
calling {{ site.data.two.bar }} will produce foobar
=======
OLD CODE
======
module LiquidReplacer
class Generator < Jekyll::Generator
REGEX = /\!([A-Za-z0-9]|_|\.){1,}\!/
def generate(site)
site.data.each do |d|
d[1].each_pair do |k,v|
v.to_s.match(REGEX) do |m|
val = m[0].gsub('!', '').split('.')
vv = site.data[val[0]]
val.delete_at(0)
val.length.times.with_index do |i|
vv = vv[val[i]]
end
d[1][k] = d[1][k].gsub(m[0], vv)
end
end
end
end
end
end

Another approach would be to add an IF statement to your head.html.
Instead of using page.layout like I did on my example below, you could use any variable from the page YAML header.
<title>
{% if page.layout == 'post' %}
Some text with {{ site.url }} variable
{% else %}
{{ site.description | escape }}
{% endif %}
</title>

Related

XPath problem with multiple OR expressions like (a|b|c) [duplicate]

This question already has an answer here:
Logical OR in XPath? Why isn't | working?
(1 answer)
Closed 1 year ago.
I have simplified html:
<html>
<main>
<span>one</span>
</main>
<not_important>
<div>skip_me</div>
</not_important>
<support>
<div>two</div>
</support>
</html>
I want to find only one and two, using the conditions that the parent tag is main or support, and that there is a span or div after it.
I wonder why that code does not work:
import lxml.html as HTML_PARSER
html = """
<html>
<main>
<span>one</span>
</main>
<not_important>
<div>skip_me</div>
</not_important>
<support>
<div>two</div>
</support>
</html>
"""
parent = '//main | //support'
child = '/span | /div'
doc = HTML_PARSER.fromstring(html)
print(doc)
xpath = '(%s)(%s)' % (parent, child)
print(xpath)
parsed = doc.xpath(xpath)
print(parsed)
I get an error Invalid expression. Why?
This (//main | //support) and this (/span | /div) xpaths are both correct.
Simple combo like (//main | //support)/span is also correct.
But why is a more complicated combination like (//main | //support)(/span | /div) not correct? How can I resolve it?
In my real case //main, //support, /span and /div are really complicated XPaths, so I want a general solution of the form (xpath1 | xpath2)(xpath3 | xpath4).
this will find it, however I'm not 100% sure if it's what you want:
//*[name() = 'main' or name() = 'support']/*[name() = 'span' or name() = 'div']/text()
Your XPath is not valid in XPath version 1 (the one that lxml uses).
Try
xpath = '//div[parent::support]|//span[parent::main]'
or
parent = ['main', 'support']
child = ['span', 'div']
xpath = '//*[self::{0[0]} or self::{0[1]}]/*[self::{1[0]} or self::{1[1]}]'.format(parent, child)
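You can use the self:: axis. This selects the main and support elements that have a div or span child:
(//main | //support)[*[self::div or self::span]]
To select the span and div children themselves, use the test as a step instead of a predicate:
(//main | //support)/*[self::div or self::span]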
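One general way to get the effect of (xpath1 | xpath2)(xpath3 | xpath4) in XPath 1.0 is to expand the product yourself and join the full paths with |. A minimal sketch, assuming each child expression is a relative step starting with / and that none of the pieces contains a top-level | of its own (combine_xpaths is a hypothetical helper name, not part of lxml):

```python
from itertools import product


def combine_xpaths(parents, children):
    """Expand (p1 | p2)(c1 | c2) into 'p1c1 | p1c2 | p2c1 | p2c2'."""
    return " | ".join(p + c for p, c in product(parents, children))


xpath = combine_xpaths(["//main", "//support"], ["/span", "/div"])
print(xpath)  # //main/span | //main/div | //support/span | //support/div
```

The resulting string is an ordinary union of location paths, so it can be passed to doc.xpath(...) like any other expression.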

saltstack, multi-line pillar items interpolated into template

I have a pillar that looks like this:
inline_blocks:
  the_seven: |
    dog cat horse cow
    ardvaark beatle snail
which I then want to insert into a file
{% set inline_block = pillar['inline_blocks'].get(val, '') %}
/etc/animals.conf:
  file.managed:
    - source: salt://farm/animals.conf
    - user: root
    - group: root
    - mode: 644
    - template: jinja
    - defaults:
        extras: {{ inline_block }}
and then in animals.conf,
{{ extras }}
I expect that if the key val is in inline_blocks, then its value will be interpolated in. If it's not, an empty string will be interpolated in.
Indeed, that's what happens if I write the defaults statement explicitly:
    - defaults:
        extras: |
          dog cat horse cow
          ardvaark beatle snail
but as written above, I get the error could not find expected ':'.
As a reality check, pillar.items happily retrieves the pillar entry, so (1) the pillar entry can be retrieved, and (2) the value can be interpolated, but (X) the multi-line value in the .sls file is causing problems.
Any pointers what the right syntax is to do this?
This issue was discussed in a bug as well, but for the content parameter. It seems to apply to multi-line YAML blocks passed as defaults or context also.
Since you are using a Jinja template file as a source, we can easily fetch pillar data from template itself (as one of the comments suggests in above link).
Considering pillar as:
inline_blocks:
  val: |
    dog cat horse cow
    ardvaark beatle snail
Then the animals.conf.j2 template as:
{{ salt.pillar.get('inline_blocks:val', default="foo") }}
Note: If this pillar data is assured to be always defined, we might even use pillar['inline_blocks']['val'] in the template.
Rendered with a state like:
create-animals-conf:
  file.managed:
    - name: /tmp/animals.conf
    - source: salt://animals.conf.j2
    - mode: 0664
    - template: jinja
Should yield the template as you expect:
$ cat /tmp/animals.conf
dog cat horse cow
ardvaark beatle snail
What happens is that
    - defaults:
        extras: {{ inline_block }}
is processed into
    - defaults:
        extras: dog cat horse cow
ardvaark beatle snail
So YAML tries to parse the second line as another top-level key. However, the : marking the end of this key never comes, hence the error.
To fix it, do this:
    - defaults:
        extras: |
          {{ inline_block | indent(10) }}
indent doesn't indent the first line, but will add 10 spaces to every subsequent line.
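To see why indent(10) yields valid YAML, here is a plain-Python approximation of what the filter does with its default arguments: the first line is left alone and every later non-empty line gets the padding. indent_rest is a hypothetical stand-in written for illustration, not Jinja's actual implementation, and the 4/8/10-space layout of the state is an assumption about the .sls file:

```python
def indent_rest(text, width):
    """Pad every non-empty line except the first with `width` spaces."""
    pad = " " * width
    first, *rest = text.split("\n")
    return "\n".join([first] + [pad + line if line else line for line in rest])


block = "dog cat horse cow\nardvaark beatle snail\n"

# The template line itself already sits at 10 spaces, so the first
# line of the value needs no extra padding.
rendered = (
    "    - defaults:\n"
    "        extras: |\n"
    "          " + indent_rest(block, 10)
)
print(rendered)
```

Every line of the block scalar now starts at the same column, so the YAML parser no longer sees the second line as a stray top-level key.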

How to export pdf table data into csv?

I am using Rails 4.2, Ruby 2.2, Gem: 'pdf-reader'.
My application reads a PDF file that contains tabular data and exports it to CSV, which I have already implemented. However, when I compare the result with the table header and table content, the values end up in the wrong positions. This is because a PDF table is not an actual table, so some extra logic is needed, and that is what I am asking for.
marks.pdf has content similar to that shown below
School Name: ABC
Program: MicroBiology Year: Second
| Roll No | Math |
|----------- |-------- |
1000001 | 65
|----------- |-------- |
Any help would be appreciated.
The working code that reads the PDF and exports it to CSV is given below
require 'csv'
require 'pdf-reader'

class ExportToCsv
  # Export the table data in a PDF to a tab-separated file.
  def convert_to_csv
    pdf_reader = PDF::Reader.new("public/marks.pdf")
    csv = CSV.open("output100.tsv", "wb", col_sep: "\t")
    data_header = ""
    pdf_reader.pages.each do |page|
      page.text.each_line do |line|
        if /^[a-z|\s]*$/i =~ line
          # line with characters only: remember it as the current header
          data_header = line.strip
        else
          # line with a number: split it into the label and the values
          data_row = line.split(/[0-9]/).first
          csv_line = line.sub(data_row, '').strip.split(/[\(|\)]/)
          csv_line.unshift(data_row).unshift(data_header)
          csv << csv_line
        end
      end
    end
    # close the file so that all rows are flushed to disk
    csv.close
  end
end
I am not able to attach the original PDF here for security reasons, sorry for that. You can generate the PDF from the screenshot below.
A screenshot of the PDF is given below:
A screenshot of the generated CSV is given below:
The desired PDF should look like the image below

Overload Jinja2 autoescape for (La)TeX

Is it possible to overload Jinja2's autoescape so that it escapes something in a user-specified way (i.e. something other than HTML such as LaTeX)?
Here's an example trying to escape TeX.
import jinja2

class MyEnv(jinja2.Environment):
    def __init__(self, filters={}, globals={}, tests={},
                 loader=None, extensions=[], **kwargs):
        super(MyEnv, self).__init__(
            autoescape = True,
        )

template = MyEnv().from_string("""\documentclass[{{ class }}]
\\begin{document}
{{ content }}
\end{document}
""")
print(template.render({
    'class':'memoir',
    'content': '{bob} <-- is escaped',
}))
When you run the above, it outputs:
\documentclass[memoir]
\begin{document}
{bob} &lt;-- is escaped
\end{document}
The problem here is that HTML escaping is used. So { and } should be escaped, but they're not, and < is converted to &lt; but it should not be.
I'd like to overload the escape function that Jinja2 uses to escape variables.
My first thought is to overload finalize and disable autoescape. e.g.
import jinja2

class MyEnv(jinja2.Environment):
    def __init__(self, filters={}, globals={}, tests={},
                 loader=None, extensions=[], **kwargs):
        super(MyEnv, self).__init__(
            autoescape = False, # turn off autoescape
            finalize = self.finalize,
        )

    def finalize(self, s):
        import re
        if isinstance(s, jinja2.Markup):
            return s
        s = s.replace('\\', '')
        s = s.replace('~', '\\textasciitilde')
        s = re.sub(r'([#|^|$|&|%|{|}])', r'\\\1', s)
        s = re.sub(r'_', r'\\_', s)
        return jinja2.Markup(s)

template = MyEnv().from_string("""\documentclass[{{ class }}]
\\begin{document}
{{ content }}
\end{document}
""")
print(template.render({
    'class':'memoir',
    'content': '{bob} <-- is escaped',
}))
The output is incorrect, because the main text isn't made into Markup (i.e. a string flagged as safe):
documentclass[memoir]
begin\{document\}
\{bob\} <-- is escaped
end\{document\}
If I set autoescape to True, and leave in finalize it almost works (and in this example, it does work):
\documentclass[memoir]
\begin{document}
\{bob\} <-- is escaped
\end{document}
Turning autoescape on works because it marks the main body of the template text as Markup (i.e. safe).
However, here's where the problem lies. If I change the input to a list that's joined:
template = MyEnv().from_string("""\documentclass[{{ class }}]
\\begin{document}
{{ content|join(" > a & b > "|safe) }}
\end{document}
""")
print(template.render({
    'class':'memoir',
    'content': ['A&B', 'C<D'],
}))
When I run this I get:
\documentclass[memoir]
\begin{document}
A&amp;B > a & b > C&lt;D
\end{document}
It would seem HTML autoescape is being run on the elements of 'content', rather than finalize. The simplest solution, provided Jinja2 and its autoescaping are loosely coupled, would seem to be to overload an autoescape function. I can't figure out how to do that, and the best I've come up with is the finalize function.
Is there a better way to handle escaping of TeX than overloading the finalize function? Can one overload autoescape?
For example, could one install a custom Markup package? (a choice I'd prefer to avoid)
Thank you for reading.
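Whatever hook ends up calling it, the escaping itself is easier to get right as a single-pass table lookup than as a chain of replace calls, since a chain can re-escape the backslashes it has just inserted. A minimal sketch in plain Python; tex_escape and TEX_SPECIALS are hypothetical names, and the macros chosen for \, ~ and ^ are one common convention, not something Jinja2 prescribes:

```python
import re

# Characters that are special in (La)TeX text mode and one possible
# replacement for each (an assumption, adjust to your document class).
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
    "#": r"\#",
    "$": r"\$",
    "&": r"\&",
    "%": r"\%",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
}

# One alternation over all special characters, so each input character
# is looked at exactly once and replacements are never re-escaped.
_TEX_PATTERN = re.compile("|".join(re.escape(c) for c in TEX_SPECIALS))


def tex_escape(text):
    """Return text with TeX special characters escaped in a single pass."""
    return _TEX_PATTERN.sub(lambda m: TEX_SPECIALS[m.group(0)], text)


print(tex_escape("{bob} <-- is escaped"))  # \{bob\} <-- is escaped
print(tex_escape("A&B"))                   # A\&B
```

A function like this could be registered as an ordinary filter (env.filters['tex'] = tex_escape) and applied explicitly with {{ content|tex }}. That sidesteps autoescape rather than overloading it, so it does not answer the question of whether autoescape itself can be replaced.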

YAML/Ruby: Get the first item whose <field> is <value>?

I have this YAML:
- company:
  - id: toyota
  - fullname: トヨタ自動車株式会社
- company:
  - id: konami
  - fullname: Konami Corporation
And I want to get the fullname of the company whose id is konami.
Using Ruby 1.9.2, what is the simplest/usual way to get it?
Note: In the rest of my code, I have been using require "yaml" so I would prefer to use the same library.
This works too and does not use iteration:
y = YAML.load_file('japanese_companies.yml')
result = y.select{ |x| x['company'].first['id'] == 'konami' }
result.first['company'].last['fullname'] # => "Konami Corporation"
Or if you have other attributes and you can't be sure fullname is the last one:
result.first['company'].select{ |x| x['fullname'] }.first['fullname']
I agree with Ray Toal, if you change your yml it becomes much easier. E.g.:
toyota:
  fullname: トヨタ自動車株式会社
konami:
  fullname: Konami Corporation
With the above yaml, fetching the fullname of konami becomes much easier:
y = YAML.load_file('test.yml')
y.fetch('konami')['fullname']
Your YAML is a little unconventional but we can compensate.
A brute force approach is (I'm not sure if this can be done without parsing the YAML):
require 'yaml'
YAML.parse_file(ARGV[0]).transform.each do |company|
  properties = {}
  company['company'].each { |h| properties = properties.merge(h) }
  puts properties['fullname'] if properties['id'] == 'konami'
end
Pass your YAML file in as the first argument to this script.
Feel free to adapt into a method that takes the YAML as a string and returns the desired fullname. (A return is useful because it directly answers the OP's question of obtaining the first such company.)

Resources