Prevent double escaping with CodeRay and RDiscount - ruby

I am looking for the most straightforward way to have code syntax highlighting in markdown, using Ruby (without Rails).
I have tried some things with Kramdown and Rouge, and could not make it work, so I am now working with RDiscount and CodeRay.
Most of the things work as I expect, with one small and one big issue:
Small Issue: The only way I found to make CodeRay work with RDiscount is to apply the highlighting to the generated HTML rather than to the markdown document. This seems a little off to me and prone to errors. Is there another way?
Big Issue: I am now facing a double HTML escaping issue, and was unable to find any html_escape: false option in the CodeRay documentation.
Code
require 'rdiscount'
require 'coderay'
markdown = <<EOF
```ruby
A > B
```
EOF
def coderay(text)
  text.gsub(/\<code class="(.+?)"\>(.+?)\<\/code\>/m) do
    CodeRay.scan($2, $1).html
  end
end
html = RDiscount.new(markdown).to_html
html = coderay(html)
puts html
Output
Notice the double escaping on the greater than sign:
<pre>
<span class="constant">A</span> &amp;gt; <span class="constant">B</span>
</pre>
I have found this related question, but it's old and has no solution for this case.
The only way I can come up with is to unescape the HTML before passing it through CodeRay, but this does not feel right to me. Nevertheless, a working alternative is below:
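require 'cgi'
def coderay(text)
  text.gsub(/\<code class="(.+?)"\>(.+?)\<\/code\>/m) do
    lang, code = $1, $2
    # RDiscount has already escaped the code once; undo that before CodeRay escapes it again
    code = CGI.unescapeHTML code
    CodeRay.scan(code, lang).html
  end
end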
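To illustrate why the unescape step removes the double escaping, here is a minimal sketch that uses only the standard library. `fake_highlight` is a hypothetical stand-in for a highlighter that escapes its input (as the output above suggests CodeRay does); it is not part of CodeRay or RDiscount.

```ruby
require 'cgi'

# Hypothetical stand-in for a highlighter: it HTML-escapes whatever
# text it is handed before wrapping it in markup.
def fake_highlight(code)
  "<span>#{CGI.escapeHTML(code)}</span>"
end

source   = 'A > B'
rendered = CGI.escapeHTML(source)  # what a Markdown renderer emits: "A &gt; B"

double = fake_highlight(rendered)                    # escaped twice
single = fake_highlight(CGI.unescapeHTML(rendered))  # unescaped first, so escaped once

puts double  # <span>A &amp;gt; B</span>
puts single  # <span>A &gt; B</span>
```

The second call mirrors the workaround: the text is returned to its raw form before the escaping stage sees it.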

Related

Why can't Nokogiri wrap my image with a link?

I'm confused about the reaction of Nokogiri (1.6.6.2), when I try to wrap an image with a link tag. Here is an example of my problem:
fragment = Nokogiri::HTML5.fragment("<p>Example</p><img src='test.jpg' class='test'><p>Example</p>")
Now I would like to wrap the image with a link:
fragment.search('img').wrap('<a href="http://www.google.com"></a>')
This unfortunately results in an error:
ArgumentError: Requires a Node, NodeSet or String argument, and cannot accept a NilClass.
(You probably want to select a node from the Document with at() or search(), or create a new Node via Node.new().)
Now the very strange thing is, it works with other tags:
fragment.search('img').wrap('<something href="http://www.google.com"></something>')
Why is Nokogiri doing that? Is it a bug?
The first problem is:
uninitialized constant Nokogiri::HTML5 (NameError)
You want Nokogiri::HTML instead.
Running this:
require 'nokogiri'
fragment = Nokogiri::HTML.fragment("<p>Example</p><img src='test.jpg' class='test'><p>Example</p>")
fragment.search('img').wrap('<a href="test">')
and looking at fragment afterwards:
puts fragment.to_html
# >> <p>Example</p><a href="test"><img src="test.jpg" class="test"></a><p>Example</p>
It appears to be working correctly. Adding the trailing </a> also works.
Perhaps you need to check your Nokogiri and libXML2 versions.

Ruby Nokogiri - How to prevent Nokogiri from printing HTML character entities

I have an HTML file that I parse with Nokogiri, modify, and then write back out like this:
htmltext = File.read('input.html')
h_doc = Nokogiri::HTML(htmltext)
# ... modify h_doc ...
File.open('output.html', 'w+') do |file|
  file.write(h_doc)
end
The question is how to prevent Nokogiri from printing HTML character entities (&lt;, &gt;, &amp;) in the final generated HTML file.
Instead of HTML character entities (&lt;, &gt;, &amp;) I want to print the actual characters (<, >, etc.).
As an example, it is printing the HTML like
<title>&lt;%= ("/emailclient=sometext") %&gt;</title>
and I want it to output like this
<title><%= ("/emailclient=sometext")%></title>
So... you want Nokogiri to output incorrect or invalid XML/HTML?
Best suggestion I have, replace those sequences with something else beforehand, cut it up with Nokogiri, then replace them back. Your input is not XML/HTML, there is no point expecting Nokogiri to know how to handle it correctly. Because look:
<div>To write "&amp;", you need to write "&amp;amp;".</div>
This renders:
To write "&", you need to write "&amp;".
If you had your way, you'd get this HTML:
<div>To write "&", you need to write "&amp;".</div>
which would render as:
To write "&", you need to write "&".
Even worse in this scenario, say, in XHTML:
<div>Use the &lt;script&gt; tag for JavaScript</div>
if you replace the entities, you get an undisplayable file, due to the unclosed <script> tag:
<div>Use the <script> tag for JavaScript</div>
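The replace, parse, restore idea suggested in this answer can be sketched without Nokogiri itself. `protect_erb` and `restore_erb` are hypothetical helper names, and the `ERBTAG0X` placeholder format is an assumption: it only works if that string never occurs in the real document.

```ruby
# Hypothetical helpers (not part of Nokogiri): swap ERB tags for inert
# placeholders before parsing, then put them back in the serialized output.
def protect_erb(text)
  stash = []
  protected_text = text.gsub(/<%.*?%>/m) do |tag|
    stash << tag
    "ERBTAG#{stash.size - 1}X"  # assumed not to appear in the document
  end
  [protected_text, stash]
end

def restore_erb(text, stash)
  text.gsub(/ERBTAG(\d+)X/) { stash[Regexp.last_match(1).to_i] }
end

template = '<title><%= ("/emailclient=sometext") %></title>'
safe, stash = protect_erb(template)
# ... parse `safe` with an HTML parser, modify it, serialize it ...
puts restore_erb(safe, stash)
```

Because the parser only ever sees plain alphanumeric placeholders, it has nothing to escape, and the template tags come back byte for byte.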
EDIT I still think you're trying to get Nokogiri to do something it is not designed to do: handle template HTML. I'd rather assume that your documents normally don't contain those sequences, and post-correct them:
doc.traverse do |node|
  if node.text?
    node.content = node.content.gsub(/^(\s*)(\S.+?)(\s*)$/,
                                     "\\1<%= \\2 %>\\3")
  end
end
puts doc.to_html.gsub('&lt;%=', '<%=').gsub('%&gt;', '%>')
You absolutely can prevent Nokogiri from transforming your entities. It's a built-in function even, no voodoo or hacking needed. Be warned, I'm not a Nokogiri guru and I've only got this to work when I'm acting directly on a node inside a document, but I'm sure a little digging can show you how to do it with a standalone node too.
When you create or load your document you need to include the NOENT option. That's it. You're done, you can now add entities to your heart's content.
It is important to note that there are about half a dozen ways to call a doc with options, below is my personal favorite method.
require 'nokogiri'
noko_doc = File.open('<my/doc/path>') { |f| Nokogiri.<XML_or_HTML>(f, &:noent)}
xpath = '<selector_for_element>'
noko_doc.at_<css_or_xpath>(xpath).set_attribute('I_can_now_safely_add_preformatted_entities!', '&&&&&')
puts noko_doc.at_xpath(xpath).attributes['I_can_now_safely_add_preformatted_entities!']
>>> &&&&&
As far as the usefulness of this feature goes... I find it incredibly useful. There are plenty of cases where you are dealing with preformatted data that you do not control, and it would be a serious pain to have to manage incoming entities just so Nokogiri could put them back the way they were.

writing a short script to process markdown links and handling multiple scans

I'd like to process just links written in markdown. I've looked at redcarpet which I'd be ok with using but I really want to support just links and it doesn't look like you can use it that way. So I think I'm going to write a little method using regex but....
assuming I have something like this:
str="here is my thing [hope](http://www.github.com) and after [hxxx](http://www.some.com)"
tmp=str.scan(/\[.*\]\(.*\)/)
or if there is some way I could just gsub in place [hope](http://www.github.com) -> <a href='http://www.github.com'>hope</a>
How would I get an array of the matched phrases? I was thinking once I get an array, I could just do a replace on the original string. Are there better / easier ways of achieving the same result?
I would actually stick with redcarpet. It includes a StripDown render class that will eliminate any markdown markup (essentially, rendering markdown as plain text). You can subclass it to reactivate the link method:
require 'redcarpet'
require 'redcarpet/render_strip'
module Redcarpet
  module Render
    class LinksOnly < StripDown
      def link(link, title, content)
        %{<a href="#{link}" title="#{title}">#{content}</a>}
      end
    end
  end
end
str = "here is my thing [hope](http://www.github.com) and after [hxxx](http://www.some.com)"
md = Redcarpet::Markdown.new(Redcarpet::Render::LinksOnly)
puts md.render(str)
# => here is my thing <a href="http://www.github.com" title="">hope</a> and ...
This has the added benefit of making it easy to implement a few additional tags (say, if you decide you want paragraph tags to be inserted for line breaks).
You could just do a replace.
Match this:
\[([^\[\]\n]+)\]\(([^()\[\]\s"'<>]+)\)
Replace with:
<a href="\2">\1</a>
In Ruby it should be something like:
str.gsub(/\[([^\[\]\n]+)\]\(([^()\[\]\s"'<>]+)\)/, '<a href="\2">\1</a>')
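As a fuller sketch of the regex approach, the snippet below both collects the matches into an array (as the question asks) and rewrites them in place. It uses a variant of the pattern in which the brackets inside the character classes are escaped; `MD_LINK` is just an illustrative constant name, and escaping the captures with `CGI.escapeHTML` is an added precaution, not something the answer requires.

```ruby
require 'cgi'

# Matches [text](url); brackets inside the character classes are escaped.
MD_LINK = /\[([^\[\]\n]+)\]\(([^()\[\]\s"'<>]+)\)/

str = 'here is my thing [hope](http://www.github.com) and after [hxxx](http://www.some.com)'

# Array of [text, url] pairs, in case the matches are needed separately.
pairs = str.scan(MD_LINK)

# In-place conversion; the block form lets us escape both captures.
html = str.gsub(MD_LINK) do
  text = Regexp.last_match(1)
  url  = Regexp.last_match(2)
  %(<a href="#{CGI.escapeHTML(url)}">#{CGI.escapeHTML(text)}</a>)
end

puts pairs.inspect
puts html
```

With capture groups in the pattern, `scan` returns one `[text, url]` pair per link, so there is no need to scan first and substitute afterwards unless the pairs are wanted for something else.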

Haml doesn't evaluate embedded Ruby code

Why does the code below (which is taken from http://haml-lang.com/docs/yardoc/file.HAML_REFERENCE.html#ruby_blocks) render to <p>See, I can count!</p> without outputting the numbers from 42 to 47?
- (42...47).each do |i|
  %p= i
%p See, I can count!
I used #haml.try page in order to test the haml snippet.
The online version does not allow you to run ruby code, as it says on the website :)
Give Haml a try online! Just type in some Haml code below, press Render, and see the beautiful HTML output. You can’t use any real Ruby code here, but feel free to use Ruby hash attributes.
It works fine if you run it locally. The online version may not be evaluating the ruby code.
$ haml
- (42...47).each do |i|
  %p= i
%p See, I can count!
^Z
<p>42</p>
<p>43</p>
<p>44</p>
<p>45</p>
<p>46</p>
<p>See, I can count!</p>
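The output stops at 46 rather than 47 because the snippet uses a three-dot range, which excludes its end value. A quick plain-Ruby check (the variable names are only for illustration):

```ruby
# Three dots exclude the end value; two dots include it.
exclusive = (42...47).to_a
inclusive = (42..47).to_a

puts exclusive.inspect  # [42, 43, 44, 45, 46]
puts inclusive.inspect  # [42, 43, 44, 45, 46, 47]
```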

How to select a different theme/style for CodeRay syntax highlighting Ruby code?

I'm having difficulty figuring out how to select a different theme/style for syntax highlighting of Ruby code using the CodeRay gem. The default is OK, but I wonder if there's anything else on offer? I can't seem to find any.
Thanks
It is my understanding that you generate your own coderay.css file. CodeRay doesn't have "themes".
Try this:
CodeRay stylesheet
And then:
doc = Nokogiri::HTML(html)
doc.search("//pre//code[@class]").each do |pre|
  tokens = CodeRay.scan(pre.text, pre[:class])
  pre.replace tokens.div(:css => :class)
end
The trick is to render your tokens with a CSS class.
