Interpolating an XML string containing double quotes and colons in Ruby

In Ruby, I am reading an incoming string from a file. It represents an XML document, but it contains some Ruby interpolation code. Here is an example:
<ns1: xmlns:ns1="http://example.com" attr="#{Time.now}">
Now, when I want to evaluate the string to resolve the Ruby references, I have tried the following:
xs = '<ns1: xmlns:ns1="http://example.com" attr="#{Time.now}">'
eval("'" + xs + "'") #=> "<ns1: xmlns:ns1=\"http://example.com\" attr=\"\#{Time.now}\">"
eval %Q{"'" + #{xs} + "'"} # SyntaxError: (eval):1: syntax error, unexpected '<'
eval('"' + %Q{#{xs}} + '"') # SyntaxError: (eval):1: syntax error, unexpected tIDENTIFIER, expecting end-of-input
I don't know how else to do this. How can I evaluate the string with interpolation so that I get the following:
<ns1: xmlns:ns1="http://example.com" attr="2017-06-22 11:58:39 +0200">

As suggested by Jörg, you'll have a much better experience if you use a templating language. I suggest ERB, because it's built in.
require 'erb'
xs = '<ns1: xmlns:ns1="http://example.com" attr="<%= Time.now %>">'
ERB.new(xs).result(binding)
# => "<ns1: xmlns:ns1=\"http://example.com\" attr=\"2017-06-23 09:11:56 +0300\">"
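If the input file already uses `#{...}` markers and the format can't be changed, one option is to rewrite those markers into ERB tags before rendering. This is a minimal sketch, not a robust converter. `interpolation_to_erb` is a hypothetical helper name, and the non-greedy regex assumes that no `}` ever appears inside an interpolated expression:

```ruby
require 'erb'

# Hypothetical helper: turn every #{expr} marker into an ERB <%= expr %> tag.
# The non-greedy match stops at the first "}", so expressions that contain
# braces themselves (blocks, hash literals) are not supported.
def interpolation_to_erb(template)
  template.gsub(/#\{(.*?)\}/) { "<%= #{Regexp.last_match(1)} %>" }
end

xs = '<ns1: xmlns:ns1="http://example.com" attr="#{Time.now}">'
erb_source = interpolation_to_erb(xs)
# erb_source is now '<ns1: xmlns:ns1="http://example.com" attr="<%= Time.now %>">'
puts ERB.new(erb_source).result(binding)
```

Like `eval`, ERB executes whatever Ruby is in the template, so this approach is only appropriate for input you trust.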

You are not looking for string interpolation. You are looking for a templating language.
String interpolation is for interpolating Ruby code in a string (or symbol) literal. You don't have a literal; you have a reference to an object. String interpolation doesn't just fail here, it doesn't even apply to this situation.
What you have here instead is a templating language whose syntax happens to be identical to Ruby's string interpolation syntax. You need an implementation for that language; unfortunately, AFAIK there isn't one, so you will have to write your own. Writing a very simple, not very robust templating engine used to be a popular exercise in Ruby a couple of years ago, so I happen to know that it only takes a couple of minutes (and lines). (Making it robust, safe, and secure in the face of untrusted arbitrary input is a whole different matter, though.)
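To illustrate how small such an engine can be, here is a minimal sketch that treats `#{...}` as a template marker and evaluates each expression against a Binding supplied by the caller. `render_interpolation` is a hypothetical name, not a library API. The sketch inherits the limitations mentioned above: it stops at the first `}` and it runs arbitrary Ruby, so it is only suitable for trusted input.

```ruby
# Hypothetical minimal template engine: each #{expr} in the template is
# replaced by the result of evaluating expr in the given binding.
# Not robust: the match stops at the first "}" and arbitrary code is executed.
def render_interpolation(template, context)
  template.gsub(/#\{(.*?)\}/) { context.eval(Regexp.last_match(1)).to_s }
end

name = "example"
# Single quotes keep Ruby from interpolating the template itself.
puts render_interpolation('<ns1: xmlns:ns1="http://#{name}.com" attr="#{Time.now}">', binding)
```

Passing `binding` explicitly lets the template see the caller's local variables (here `name`), which the method's own scope could not.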
If you could change the input format to an existing templating language, that would be easiest. One well-known templating language in the Ruby world is ERb, which actually has an implementation in the standard library. There's also Tenjin, Liquid, Ruty, Mustache, to name but a few.

Related

How exactly do quotes in Ruby work to form a string?

I don't quite understand how string quotes in Ruby actually work. How does wrapping something in a quote suddenly make it a string? What exactly are the quotes doing? I'm trying to understand the C or core language implementation of this.
What exactly are the quotes doing?
The quotes themselves do nothing. They're just markers. Here's where a string starts, here's where it ends. When your code is being parsed to be executed, the parser will take what's between the quotes and make a string from that content. Simple as that.
If you take a compilers course in school, chances are that you'll have to implement your own parser and compiler/interpreter for some toy language, likely with strings too. It's a fun exercise! :)
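Ruby's standard library exposes its lexer through Ripper, which makes it easy to see that the quotes are only begin and end markers around the string's content. A small sketch follows. Only each token's type and text are kept, because the position and lexer-state fields vary between Ruby versions:

```ruby
require 'ripper'

# Lex a tiny program and keep only [token_type, token_text] pairs.
tokens = Ripper.lex('my_msg = "Hello"').map { |_pos, type, text, *_state| [type, text] }
tokens.each { |type, text| puts "#{type.to_s.ljust(20)} #{text.inspect}" }
```

Each quote character comes back as its own on_tstring_beg or on_tstring_end token, and only Hello is on_tstring_content. That content is what ends up in the String object.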
BTW, in Ruby you can write a string literal in many ways, not only with quotes. This heredoc is a string too, for example:
html = <<-HTML
<head><title>stack overflow</title></head>
HTML
html # => "<head><title>stack overflow</title></head>\n"
In Ruby, the most common syntax for creating a string is quotes, like below.
my_msg = "Hello"
This is the same in most other languages as well (C, Java, etc.). AFAIK the language's parser is responsible for detecting the above syntax and storing Hello as a string in the my_msg variable.
Ruby also has many other syntaxes for creating strings.
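As a quick illustration, the following literal forms all produce equal String objects with the content Hello. The squiggly heredoc form assumes Ruby 2.3 or later:

```ruby
# A squiggly heredoc; .chomp drops the trailing newline it would otherwise keep.
heredoc = <<~TEXT.chomp
  Hello
TEXT

forms = [
  'Hello',    # single quotes: no interpolation
  "Hello",    # double quotes: interpolation and escapes
  %q(Hello),  # like single quotes, with a custom delimiter
  %Q(Hello),  # like double quotes, with a custom delimiter
  %(Hello),   # shorthand for %Q
  heredoc
]
p forms.uniq # => ["Hello"]
```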

Can someone please explain what (:+) is in Ruby?

Can someone please explain what (:+) is in Ruby? I have tried googling it and looking at reference guides and can't find anything. Thanks, and sorry, I'm pretty new to Ruby and programming.
A colon : before a sequence of characters* is a Symbol literal. This applies to :+, which is a Symbol with content "+".
A symbol can be used to reference a method with the same name in some contexts, and in a couple of places your example :+ can be a reference to the + operator, which is really just a method with the same name. Ruby supports syntax to call it when it sees a plain + in an expression, and some core methods will convert :+ into a call to that method.
As an example you can use :+ as shorthand to create a sum of an Array of integers:
[1,2,3,4].inject( :+ )
=> 10
This works because Array#inject (actually defined as Enumerable#inject; Array gets it from that module) accepts a Symbol naming the method used to combine the elements, so inject(:+) calls + on the running total with each element in turn.
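To show that the Symbol simply names the method that inject calls on the running value, and that this is not limited to +, here is a small comparison. The variable names are only for illustration:

```ruby
numbers = [1, 2, 3, 4]

with_symbol  = numbers.inject(:+)                  # symbol names the method to call
with_block   = numbers.inject { |sum, n| sum + n } # equivalent explicit block
with_initial = numbers.inject(10, :+)              # initial value, then the symbol
product      = numbers.inject(:*)                  # any binary operator method works
gcd          = [12, 18, 24].inject(:gcd)           # ...and so does a named method

p [with_symbol, with_block, with_initial, product, gcd] # => [10, 10, 20, 24, 6]
```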
A more general use-case for a symbol like this is the send method:
2.send( :+, 2 )
=> 4
Although 2.send( "+", 2 ) works just fine too. It might seem odd when used like this instead of just 2 + 2, but it can be handy if you want to make a more dynamic choice of operator.
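As a sketch of that "dynamic choice of operator", the operator can come from data, for example a lookup table keyed by user input. `OPERATORS` and `calculate` are hypothetical names. The sketch uses `public_send` rather than `send`, which is a swapped-in choice: `public_send` refuses to call private methods, so it is a safer default when the method name comes from outside:

```ruby
# Hypothetical mapping from user-facing names to operator methods.
OPERATORS = { "add" => :+, "subtract" => :-, "multiply" => :*, "divide" => :/ }.freeze

# Look up the operator symbol and call it on the left operand.
# Hash#fetch raises KeyError for an unknown operator name.
def calculate(left, op_name, right)
  left.public_send(OPERATORS.fetch(op_name), right)
end

puts calculate(2, "add", 2)      # 4
puts calculate(7, "multiply", 6) # 42
```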
* The rules for the syntax allowed in a Symbol literal are a little arcane. They let you write shorter literals where possible, but Ruby has to avoid some ambiguous syntax, such as a Symbol with a . or whitespace in the middle. Such Symbols are allowed; you just have to add quotes when you write them as literals, e.g. :"this.that".
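A few examples of where the quotes are and aren't needed:

```ruby
plain  = :+                 # operator symbols need no quotes
dotted = :"this.that"       # without quotes, :this.that would call a method "that" on :this
spaced = "two words".to_sym # Symbols can also be built from any String

p plain.to_s  # => "+"
p dotted.to_s # => "this.that"
p spaced      # => :"two words"
```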
Ruby will tell you:
:+.class
# => Symbol
(:+) is the symbol in parentheses.

Ruby Regular Expression lookahead to Split at pipe unless contained in brackets

I'm trying to decode the following string:
body = '{type:paragaph|class:red|content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]}'
body << '{type:image|class:grid|content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]}'
I need to split the string at the pipes, but not where a pipe is contained within square brackets. To do this I think I need to perform a lookahead, as described here: How to split string by ',' unless ',' is within brackets using Regex?
My attempt (still splits at every pipe):
x = self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/ *\|(?!\]) */)}
->
[
["type:paragaph", "class:red", "content:[class:intro", "body:This is the introduction paragraph.][body:This is the second paragraph.]"],
["type:image", "class:grid", "content:[id:1", "title:image1][id:2", "title:image2][id:3", "title:image3]"]
]
Expecting:
->
[
["type:paragaph", "class:red", "content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]"],
["type:image", "class:grid", "content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]"]
]
Does anyone know the regex required here?
Is it possible to do this with a regex? I can't seem to correctly adapt the one from "Regular Expression to match underscores not surrounded by brackets?".
I modified the answer here Split string in Ruby, ignoring contents of parentheses? to get:
self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/\|\s*(?=[^\[\]]*(?:\[|$))/)}
Seems to do the trick, though I'm not sure whether it has any shortfalls.
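The lookahead regex from the modified answer can be checked against the expected output with a self-contained script. This sketch assumes that brackets never nest. The lookahead only lets a pipe split the string when the next bracket to its right is "[" or there is no bracket before the end, and nested groups would break that rule:

```ruby
body = '{type:paragaph|class:red|content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]}'
body += '{type:image|class:grid|content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]}'

# Split on a pipe only when the next bracket ahead is "[" (or there is none),
# i.e. the pipe is not inside a [...] group. Assumes brackets never nest.
TOP_LEVEL_PIPE = /\|\s*(?=[^\[\]]*(?:\[|$))/

records = body.scan(/\{(.*?)\}/).flatten.map { |inner| inner.split(TOP_LEVEL_PIPE) }
records.each { |fields| p fields }
```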
Dealing with nested structures that have identical syntax is going to make things difficult for you.
You could try a recursive descent parser (a quick Google turned up https://github.com/Ragmaanir/grammy - not sure if it's any good).
Personally, I'd go for something really hacky: some gsubs that convert your string into JSON, then parse that with a JSON parser :-). That's not particularly easy either, but here goes:
require 'json'
b1 = body.gsub(/([^\[\|\]\:\}\{]+)/,'"\1"').gsub(':[',':[{').gsub('][','},{').gsub(']','}]').gsub('}{','},{').gsub('|',',')
JSON.parse('[' + b1 + ']')
It wasn't easy because the string format apparently uses [foo:bar][baz:bam] to represent an array of hashes. If you have a chance to modify the serialised format to make it easier, I would take it.
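A third option avoids both the lookahead regex and the JSON rewrite. Scan the string character by character and track bracket depth, which is a small step toward the parser approach mentioned above. In this minimal sketch, `split_top_level` is a hypothetical name, and unbalanced brackets are not validated. Unlike the lookahead regex, it copes with nested [...] groups:

```ruby
# Hypothetical helper: split on `sep` only when not inside any [...] group.
def split_top_level(str, sep = "|")
  parts = [+""]
  depth = 0
  str.each_char do |ch|
    depth += 1 if ch == "["
    depth -= 1 if ch == "]"
    if ch == sep && depth.zero?
      parts << +"" # start a new field
    else
      parts.last << ch
    end
  end
  parts
end

record = "type:paragaph|class:red|content:[class:intro|body:First.][body:Second.]"
p split_top_level(record)
# => ["type:paragaph", "class:red", "content:[class:intro|body:First.][body:Second.]"]
```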

Tokenize (lex? parse?) a regular expression

Using Ruby I'd like to take a Regexp object (or a String representing a valid regex; your choice) and tokenize it so that I may manipulate certain parts.
Specifically, I'd like to take a regex/string like this:
regex = /var (\w+) = '([^']+)';/
parts = ["foo","bar"]
and create a replacement string that replaces each capture with a literal from the array:
"var foo = 'bar';"
A naïve regex-based approach to parsing the regex, such as:
i = -1
result = regex.source.gsub(/\([^)]+\)/){ parts[i+=1] }
…would fail for things like nested capture groups, or non-capturing groups, or a regex that had a parenthesis inside a character class. Hence my desire to properly break the regex into semantically-valid pieces.
Is there an existing Regex parser available for Ruby? Is there a (horror of horrors) known regex that cleanly matches regexes? Is there a gem I've not found?
The motivation for this question is a desire to find a clean and simple answer to this question.
I have a JavaScript project on GitHub called Dynamic (?:Regex Highlighting)++ with Javascript! that you may want to look at. It parses PCRE-compatible regular expressions written in both free-spacing and non-free-spacing modes. Since its regexes are written in the less feature-rich JavaScript syntax, they could easily be converted to Ruby.
Note that regular expressions may contain arbitrarily nested parenthesized structures, and JavaScript has no recursive regex features, so the code must parse the tree of nested parens from the inside out. It's a bit tricky but works quite well. Be sure to try it out on the highlighter demo page, where you can input and dynamically highlight any regex. The JavaScript regular expressions used to parse regular expressions are documented here.
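For the specific task in the question, a small hand-written tokenizer may be enough to avoid the failure modes of the naïve gsub: escaped parentheses, parentheses inside character classes, and nested or non-capturing groups. This is a sketch, not a full Onigmo parser. `tokenize_regex` and `fill_captures` are hypothetical names. Named groups (?<name>...) are treated as non-capturing, the x (free-spacing) flag is ignored, and any "]" inside a character class is assumed to be escaped:

```ruby
# Split a regex source into coarse tokens:
# :escape, :class, :capture_open, :group_open ("(?..."), :close, :char
def tokenize_regex(source)
  tokens = []
  i = 0
  while i < source.length
    c = source[i]
    if c == "\\"                           # keep escapes such as \( or \w whole
      tokens << [:escape, source[i, 2]]
      i += 2
    elsif c == "["                         # consume a (possibly nested) character class
      j = i + 1
      depth = 1
      while depth > 0
        raise ArgumentError, "unterminated character class" if j >= source.length
        if source[j] == "\\"
          j += 2
          next
        end
        depth += 1 if source[j] == "["
        depth -= 1 if source[j] == "]"
        j += 1
      end
      tokens << [:class, source[i...j]]
      i = j
    elsif c == "(" && source[i + 1] == "?"  # (?:...), lookarounds, etc.
      tokens << [:group_open, "(?"]
      i += 2
    elsif c == "("
      tokens << [:capture_open, "("]
      i += 1
    elsif c == ")"
      tokens << [:close, ")"]
      i += 1
    else
      tokens << [:char, c]
      i += 1
    end
  end
  tokens
end

# Replace each outermost capturing group with the next literal from `parts`
# (missing parts become ""). Everything else is copied through unchanged.
def fill_captures(regex, parts)
  parts = parts.dup
  out = +""
  depth = 0 # nesting depth inside the capture group being replaced
  tokenize_regex(regex.source).each do |type, text|
    if depth > 0
      depth += 1 if type == :capture_open || type == :group_open
      depth -= 1 if type == :close
      next
    end
    if type == :capture_open
      out << parts.shift.to_s
      depth = 1
    else
      out << text
    end
  end
  out
end

puts fill_captures(/var (\w+) = '([^']+)';/, ["foo", "bar"]) # var foo = 'bar';
```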

Why are there so many slightly different ways to do the same thing in Ruby?

I am learning Ruby. My background is C++/Java/C#. Overall, I like the language, but I am a little confused about why there are so many different ways to accomplish the same thing, each with their own slightly different semantics.
Take string creation, for example. I can use '', "", %q, %Q, or just % to create strings. Some forms support interpolation. Other forms allow me to specify the string delimiters.
Why are there five ways to create string literals? Why would I ever use non-interpolated strings? What advantage does the % syntax have over quoted literals?
I know there must be value in the redundancy in Ruby, but my untrained eyes are not clearly seeing it. Please enlighten me.
Why would I ever use non-interpolated strings?
When you don't want the interpolation, of course. For example, perhaps you're outputting some documentation about string interpolation:
'Use #{x} to interpolate the value of x.'
=> "Use \#{x} to interpolate the value of x."
What advantage does the % syntax have over quoted literals?
It lets you write strings more naturally, without the quotes, or when you don't want to escape a lot of things, analogous to C#'s verbatim string-literal prefix @.
%{The % syntax makes strings look more "natural".}
=> "The % syntax makes strings look more \"natural\"."
%{<basket size="50">}
=> "<basket size=\"50\">"
There are many other %-notations:
%w{apple banana #{1}cucumber} # [w]hitespace-separated array, no interpolation
=> ["apple", "banana", "\#{1}cucumber"]
%W{apple banana #{1}cucumber} # [W]hitespace-separated array with interpolation
=> ["apple", "banana", "1cucumber"]
# [r]egular expression (matches unary non-primes, so !~ below keeps the primes)
%r{^1?$|^(11+?)\1+$}
=> /^1?$|^(11+?)\1+$/
(1..30).to_a.select{ |i| ("1" * i) !~ %r{^1?$|^(11+?)\1+$} }
=> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
%x{ruby --version} # e[x]ecute a shell command
=> "ruby 1.9.1p129 (2009-05-12 revision 23412) [x86_64-linux]\n"
There's also %s (for symbols) and some others.
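A few of those other forms, for reference. %i and %I assume Ruby 2.0 or later:

```ruby
sym  = %s{hello}           # Symbol literal, same as :hello
syms = %i[apple banana]    # array of Symbols, no interpolation
n = 1
isyms = %I[item#{n} other] # array of Symbols, with interpolation

p sym   # => :hello
p syms  # => [:apple, :banana]
p isyms # => [:item1, :other]
```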
Why are there five ways to create string literals?
This isn't terribly unusual. Consider C#, for example, which has several different ways to generate strings: new String(); ""; @""; StringBuilder.ToString(), et cetera.
I'm not a Ruby expert, but have you ever heard the term "syntactic sugar"? Basically, some programming languages offer different syntax to accomplish the same task. Some people find one way easier than another because of their previous programming experience.
The original question is why there are so many slightly different ways of doing things in Ruby.
Sometimes the different things are sensible: quoting is a good case where different behaviour requires different syntax (non-interpolating vs. interpolating, alternate quoting characters, etc.), and historical accident causes synonyms like %x() vs. backticks, much as in Perl.
The synonym issue - [].size [].length [].count - feels like an attempt to be helpful in a world where the language is too random for IDEs to be able to help: monkey-patching and the weird combination of strict but dynamic typing together make runtime errors an inevitable and frustrating part of the coding, so folks try to reduce the issue by supplying synonyms. Unfortunately, they end up confusing programmers who're accustomed to different methods doing different things.
The 'so similar but not quite' issue, for example ...
$ ruby -le 'e=[]; e << (*[:A, :B])'
-e:1: syntax error, unexpected ')', expecting :: or '[' or '.'
$ ruby -le 'e=[]; e << *[:A, :B]'
-e:1: syntax error, unexpected *
$ ruby -le 'e=[]; e.push(*[:A, :B])'
$
... can only really be viewed as a flaw. Every language has them, but they're usually more arcane than this.
And then there's the plain arbitrary 'use fail instead of raise unless you're just rethrowing an exception' nonsense in the RuboCop coding standards.
There are some nice bits in Ruby, but really - I'd far rather be coding in something better-founded.
In most situations, you'll end up using normal string delimiters. The main difference between single and double quotes is that double quotes allow you to interpolate variables.
puts 'this is a string'
# => this is a string
puts "this is a string"
# => this is a string
v = "string"
puts 'this is a #{v}'
# => this is a #{v}
puts "this is a #{v}"
# => this is a string
%q and %Q are useful when the string itself contains quote characters that you would otherwise have to escape.
For example, you might end up writing
html = %Q{this is a <img src="#{img_path}" class="style" /> image tag}
In this case, you can't use double quotes as delimiters unless you escape the attribute quotes inside the string. You also can't use single quotes, because then the img_path variable won't be interpolated.
A lot of Ruby's syntax is derived from Perl's, such as %q, which mirrors Perl's q operator for quoting a string. That's probably the main reason for such a big variety.
One more reason often given is a minor performance boost for non-interpolated strings: with '' Ruby doesn't have to look for interpolation inside the string, so you'll see people prefer single quotes, or symbols for hash keys, for speed. In practice that check happens once, when the code is parsed, so single- and double-quoted literals cost about the same at run time. Symbols do come out ahead, because the same Symbol object is reused instead of a new String being allocated on each iteration. For what it's worth, here's a little benchmark.
require 'benchmark'
Benchmark.bmbm(10) do |x|
  x.report("single-quote") do
    for z in 0..1000000
      zf = 'hello'
    end
  end
  x.report("double-quote") do
    for z in 0..1000000
      zf = "hello"
    end
  end
  x.report("symbol") do
    for z in 0..1000000
      zf = :hello
    end
  end
end
yields:
Rehearsal ------------------------------------------------
single-quote 0.610000 0.000000 0.610000 ( 0.620387)
double-quote 0.630000 0.000000 0.630000 ( 0.627018)
symbol 0.270000 0.000000 0.270000 ( 0.309873)
--------------------------------------- total: 1.580000sec
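The nearly identical single-quote and double-quote timings are what you would expect if both literals compile to the same instructions. The following sketch checks this on CRuby. RubyVM::InstructionSequence is an implementation detail of CRuby and isn't available on JRuby or TruffleRuby:

```ruby
# CRuby-specific: compile two one-line programs and compare their bytecode.
single = RubyVM::InstructionSequence.compile("zf = 'hello'")
double = RubyVM::InstructionSequence.compile('zf = "hello"')

puts single.disasm
# The instruction lists (the last element of #to_a) match for both literals.
p single.to_a.last == double.to_a.last
```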
You would use non-interpolated strings if your string contains a lot of special characters (like backslashes, #{} etc.) and you don't want to escape all of them.
You'd use different delimiters if your string contains a lot of quotes that you'd otherwise have to escape.
You'd use heredocs if your string has a lot of lines, which would make normal string syntax look unwieldy.
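A short sketch of the two most common heredoc flavours. The squiggly form assumes Ruby 2.3 or later, and the table name is only illustrative:

```ruby
table = "users"

# Squiggly heredoc: strips the common leading indentation
# and interpolates like a double-quoted string.
query = <<~SQL
  SELECT *
  FROM #{table}
  WHERE active = 1
SQL

# Quoting the terminator turns off interpolation, like a single-quoted string.
template = <<~'DOC'
  Use #{x} to interpolate x.
DOC

print query, template
```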
Ruby borrows constructs and ideas from lots of languages. The two most apparent influences are Smalltalk and Perl.
Depending on your comfort with Smalltalk or Perl you may well choose different constructs to do the same thing.
Along the lines of John's answer:
In quick hacks, I often end up running a Perl or sed one-liner with grep syntax from within my Ruby script. Being able to use %[ ] type syntax means that I can simply copy-paste my regexp from the terminal.
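For example, with %r a pattern full of slashes, such as a path copied from a shell command, can be pasted without escaping each slash. The path below is only illustrative, and `match?` assumes Ruby 2.4 or later:

```ruby
# With %r the slashes need no escaping, so the pattern reads like the shell version.
escaped = /\/usr\/(local\/)?bin\/ruby/
pasted  = %r{/usr/(local/)?bin/ruby}

path = "/usr/local/bin/ruby"
p escaped.match?(path) # => true
p pasted.match?(path)  # => true
```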
