Why do we use html_safe? - ruby

Unable to understand as to why we use html_safe than the conventional html to be rendered.
def group(content)
html = "".html_safe
html << "<div class='group'>".html_safe
html << content
html << "</div>".html_safe
html
end

I agree html_safe doesn't make much sense in this example because content_tag would be much shorter, easier to read and would automatically escape the user input:
def group(content)
content_tag(:div, content, class: 'group')
end

In Rails HTML ERB templates, strings passed into it are HTML escaped (to prevent Cross Site Scripting, which is injecting HTML code into your string so that attackers can execute JavaScript on visitors of your site). However, sometimes we know that our string is safe for HTML and don't want it to be escaped so that the HTML can actually be rendered. We do this by calling .html_safe on a string to mark it as being safe for HTML rendering. You generally want to avoid using this as much as possible since it makes it easier to make a mistake and cause XSS to be a possible attack on your site.

Related

Cast a Nokogiri::XML::Document to a Nokogiri::HTML::Document

I want to transform an XML document to HTML using XSL, tinker with it a little, then render it out. This is essentially what I'm doing:
source = Nokogiri::XML(File.read 'source.xml')
xsl = Nokogiri::XSLT(File.read 'transform.xsl')
transformed = xsl.transform(source)
html = Nokogiri::HTML(transformed.to_html)
html.title = 'Something computed'
Stylesheet::transform always returns XML::Document, but I need a HTML::Document instance to use methods like title=.
The code above works, but exporting and re-parsing as HTML is just awful. Since the target is a subclass of the source, there must be a more effective way to perform the conversion.
How can I clean up this mess?
As a side question, Nokogiri has generally underwhelmed me with its handling of doctypes, unawareness of <meta charset= etc... does anyone know of a less auto-magic library with similar capabilities?
Many thanks ;)
HTML::Document extends XML::Document, but the individual nodes in a HTML document are just plain XML::Nodes, i.e. there aren’t any HTML::Nodes. This suggests a way of converting an XML document to HTML by creating a new empty HTML::Document and setting its root to that of the XML document:
html = Nokogiri::HTML::Document.new
html.root= transformed.root
The new document has the HTML methods like title= and meta_encoding= available, and when serializing it creates a HTML document rather than HTML: adds a HTML doctype, correctly uses empty tags like <br>, displays minimized attributes where appropriate (e.g. <input type="checkbox" selected>) and doesn’t escape things like > in <script> blocks.

Put HTML in javascript using Ruby

Note: This is a very strange and unique use case so I apologise in advance if it seems a bit ass-backwards.
I have a haml file content.haml and a coffeescript file main.coffee.
I wish to somehow get the html resulting from rendering content.haml into a variable in the coffeescript/resulting javascript.
The end result should be a javascript file rendered to the browser.
let's say they look like this:
# content.haml
.container
.some_content
blah blah blah
-
# main.coffee
html_content = ???
do_something_with_html_content(html_content)
I know, this sounds ridiculous, 'use templates', 'fetch the HTML via ajax' etc. In this instance however, it's not possible, everything needs to be served via one JS file and I cannot fetch other resources from the server. Weird, I know.
Short of manually reconstructing the haml in the coffeescript file by joining an array of strings like this:
html_content = [
'<div class"container">',
'<div class"some_content">',
'blah blah blah',
'</div>',
'</div>',
]
I'm not sure the best way of doing this.
Another way I though of was to put something like this in the coffee file:
html_content = '###CONTENT###'
Then render the haml to html in ruby, render the coffeescript to js and then replace ###CONTENT### with the rendered html before serving to the client. However the html is a multi-line string so it completely destroys the javascript.
I'm convinced there must be some other nice way of rendering the haml into html in a variable such that it forms valid javascript, but my brain has gone blank.
Perhaps you can try something like this in one of your views:
:javascript
html_content = <%= escape_javascript(render partial: "content")%>
## your own logic follows here....
Wouldn't it be better to use a custom html data attribute and then fetch the content of it in js?
<div data-mycontent="YOUR CONTENT GOES HERE"></div>
And then in coffee, use the dataset attribute / data via jquery, if it is available.
If you set a var via writing the file directly it will render your js file uncacheable, among other drawbacks.
You can do that by using the sprockets gem, like Rails does. You just need to rename your CoffeeScript file to main.coffee.erb and use it as you would e.g. a haml template. Pass in your rendered html with an instance variable:
html_content = '<%= #html_content %>'
Edit: Added missing quotes.

HAML -> Backbone Template, Unescaping HTML Parameters

I'm using HAML to generate templates for a Backbone.js app. I need to be able to insert <%= blah %> as an html attribute a la:
%a{:href => "myresources/<% id %>"} My Resource
and have it output
<a href='myresources/<%= id %>' >My Resource</a>
in the html template. Unfortunately, HAML escapes the html parameters leaving me with
<a href='#myresources/<%= id %>'>My Resource</a>
According to the HAML Reference the '!' operator can be used for unescaping strings, but not within the HTML attributes.
Also, I'd use plaintext to render the anchor tag, but since the anchor tag is the root for this particular view, I lose all of the benefits of using HAML.
Any help?
Update
I didn't mention, but I'm using LiveReload to actually watch my file system and run the haml compiler, and there was a setting in LiveReload to disable HTML escapes in tag attributes. < head slap > If anyone else runs into this issue outside of LiveReload, you can also set the :escape_attrs option to false when configuring your HAML setup.
You can configure HAML to not escape tag attributes using the escape_attrs option in your HAML configuration. See HAML Options.
You can try using html_safe which is a method on String objects. This will escape the html characters in the variable statement (< for example) and will leave the intact for underscore to evaluate at runtime:
%a{:href => "myresources/<% id %>".html_safe} My Resource
Found on answer to Interpolate inside html attributes with Underscore.js

Convert HTML to plain text and maintain structure/formatting, with ruby

I'd like to convert html to plain text. I don't want to just strip the tags though, I'd like to intelligently retain as much formatting as possible. Inserting line breaks for <br> tags, detecting paragraphs and formatting them as such, etc.
The input is pretty simple, usually well-formatted html (not entire documents, just a bunch of content, usually with no anchors or images).
I could put together a couple regexs that get me 80% there but figured there might be some existing solutions with more intelligence.
First, don't try to use regex for this. The odds are really good you'll come up with a fragile/brittle solution that will break with changes in the HTML or will be very hard to manage and maintain.
You can get part of the way there very quickly using Nokogiri to parse the HTML and extract the text:
require 'nokogiri'
html = '
<html>
<body>
<p>This is
some text.</p>
<p>This is some more text.</p>
<pre>
This is
preformatted
text.
</pre>
</body>
</html>
'
doc = Nokogiri::HTML(html)
puts doc.text
>> This is
>> some text.
>> This is some more text.
>>
>> This is
>> preformatted
>> text.
The reason this works is Nokogiri is returning the text nodes, which are basically the whitespace surrounding the tags, along with the text contained in the tags. If you do a pre-flight cleanup of the HTML using tidy you can sometimes get a lot nicer output.
The problem is when you compare the output of a parser, or any means of looking at the HTML, with what a browser displays. The browser is concerned with presenting the HTML in as pleasing way as possible, ignoring the fact that the HTML can be horribly malformed and broken. The parser is not designed to do that.
You can massage the HTML before extracting the content to remove extraneous line-breaks, like "\n", and "\r" followed by replacing <br> tags with line-breaks. There are many questions here on SO explaining how to replace tags with something else. I think the Nokogiri site also has that as one of the tutorials.
If you really want to do it right, you'll need to figure out what you want to do for <li> tags inside <ul> and <ol> tags, along with tables.
An alternate attack method would be to capture the output of one of the text browsers like lynx. Several years ago I needed to do text processing for keywords on websites that didn't use Meta-Keyword tags, and found one of the text-browsers that let me grab the rendered output that way. I don't have the source available so I can't check to see which one it was.

Rails 3 unwanted html escaping

I am converting my fat Rails2 application to run on Rails3. After a long intense fight with an army of bugs and my bosses yells, the page is all rendered as an escaped html string. So all the divs, images etc. are written literally for the user.
For some reason this call of a partial renders an escaped string
<%= render :partial => 'something_really_interesting' %>
As all Ruby on Rails application this instruction is not called very much! So how would I handle all these calls to not render normally not as an escaped string?
Use <%= raw bla %> in inside the partial file.
http://github.com/rails/rails/blob/3270c58ebb3143b3ab3b349fe339cdd4587468ee/actionpack/lib/action_view/helpers/raw_output_helper.rb#L13
Rails 3 automatically makes everything safe. You need to put raw to escape the behavior. That also means you don't have to use h() method to make your string safe any more.

Resources