String interpolation in a large JSON string - ruby

I am attempting to replace a variable in a large search string I will be using to call Elastic Search, but I can't seem to get the variable into the string. Below is what I have attempted, using string interpolation to try and add my_var into the search string at 3 different places.
my_var = "foo"
search_string = '{"query":{"bool":{"must":[{"nested":{"path":"some_status", "query":{"bool":{"must":{"terms":{"status.code":["ACTIVE"]}}}}}}, {"bool":{"should":[{"nested":{"path":"acc", "query":{"bool":{"must":[{"match_phrase":{"acc.name.text":"{#my_var}"}}]}}}}, {"bool":{"must":[{"match_phrase":{"name.stuff":"{#my_var}"}}]}}, {"nested":{"path":"trade_names", "query":{"bool":{"must":[{"match_phrase":{"some_names.some_name.stuff":"{#my_var}"}}]}}}}]}}]}}}'
JSON.parse(search_string)
What am I doing wrong here?

First of all, it's #{my_var}, not {#my_var}.
Besides that typo, string interpolation only works in "..." strings; it is disabled in '...':
foo = 123
str = "foo = #{foo}" # <- turns #{foo} into 123
#=> "foo = 123"
str = 'foo = #{foo}' # <- keeps #{foo} literally
#=> "foo = \#{foo}"
Since your string contains many literal double quote characters, using "..." would add a lot of escape characters.
Apart from quotes, you could use the percent string %Q(...):
str = %Q({"foo":"#{foo}"})
#=> "{\"foo\":\"123\"}"
or a heredoc: (using JSON as the delimiter might enable syntax highlighting in your editor)
str = <<-JSON.chomp
{"foo":"#{foo}"}
JSON
#=> "{\"foo\":\"123\"}"
Another option is to construct a Ruby hash and turn that into JSON:
require 'json'
hash = { 'foo' => foo }
str = hash.to_json
#=> "{\"foo\":123}"
The JSON library also handles escaping for you.
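Applied to the query from the question, the hash approach might look like the sketch below. The helper name build_search and the two lambdas are made up for illustration; the field names are copied from the question, and nothing here is specific to any Elasticsearch client.

```ruby
require 'json'

# Hypothetical helper: builds the search body as nested Ruby data, so the
# value is inserted as data and never spliced into JSON text by hand.
def build_search(name)
  phrase = ->(field) { { 'bool' => { 'must' => [{ 'match_phrase' => { field => name } }] } } }
  nested = ->(path, query) { { 'nested' => { 'path' => path, 'query' => query } } }

  status = nested.call(
    'some_status',
    { 'bool' => { 'must' => { 'terms' => { 'status.code' => ['ACTIVE'] } } } }
  )

  {
    'query' => {
      'bool' => {
        'must' => [
          status,
          { 'bool' => { 'should' => [
            nested.call('acc', phrase.call('acc.name.text')),
            phrase.call('name.stuff'),
            nested.call('trade_names', phrase.call('some_names.some_name.stuff'))
          ] } }
        ]
      }
    }
  }
end

# A value containing double quotes is escaped by to_json automatically.
search_string = build_search('foo "bar"').to_json
```

Because the value never passes through string interpolation, quotes or backslashes in my_var cannot break the JSON.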

Related

Lookahead regex in Ruby returns `nil` on irb

I have input:
s = "<tag1 value = \"HelloWorld\" val = \"1234\">"
I want to fetch 'HelloWorld' and '1234'.
I am using this regex expression
(?<=\")+[a-zA-Z0-9]*+(?=\\)
On rubular, it gives the expected result, but on irb, it returns nil:
s.scan(/(?<=\")+[a-zA-Z0-9]*+(?=\\)/) # => []
Can anybody explain why this is happening? What am I missing?
s = "<tag1 value = \"HelloWorld\" val = \"1234\">"
The actual value of this string is:
<tag1 value = "HelloWorld" val = "1234">
You can check this easily by running, e.g., puts s. The backslashes appear only because the string is declared with double quotes in Ruby, in which case the double quotes inside the string must be escaped with backslashes. Other ways to declare the same string in Ruby are:
s = '<tag1 value = "HelloWorld" val = "1234">'
s = %|<tag1 value = "HelloWorld" val = "1234">|
s = <<STR
<tag1 value = "HelloWorld" val = "1234">
STR
None of these requires escaping the double quotes. If you copied the string to rubular as IRB displayed it, with the escaping backslashes, you matched a different string.
Since there are no backslashes in the original string, nothing was matched in Ruby. There are other glitches in the regexp you used.
Here is the most careful version of the regexp:
s.scan(/(?<=")\w+(?=")/)
#⇒ ["HelloWorld", "1234"]
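A quick way to convince yourself of this is to test the string directly. This is only a sketch; TAG, contains_backslash? and quoted_values are names invented for the example.

```ruby
# The same string as in the question, written with escaped double quotes.
TAG = "<tag1 value = \"HelloWorld\" val = \"1234\">"

# True only if the string really contains a backslash character.
def contains_backslash?(str)
  str.include?("\\")
end

# Collects every run of word characters enclosed in double quotes.
def quoted_values(str)
  str.scan(/(?<=")\w+(?=")/)
end

contains_backslash?(TAG)   # the escapes exist only in the source code
quoted_values(TAG)

# Single quotes keep the backslashes, which gives the string that was
# presumably pasted into rubular; here each value is followed by a backslash.
pasted = '<tag1 value = \"HelloWorld\" val = \"1234\">'
contains_backslash?(pasted)
```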
I've changed the regex slightly so that the lookahead expects the closing \" instead of a backslash. The original fails because \ is only the escape character in the Ruby source; the string itself contains no backslash.
> "<tag1 value = \"HelloWorld\" val = \"1234\">".scan(/(?<=\")+[a-zA-Z0-9]*+(?=\")/)
#=> ["HelloWorld", "1234"]

Regex string with grouping?

I see in the documentation I'm able to do:
/\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ "$3.67" #=> 0
puts dollars #=> prints 3
I was wondering if this would be possible:
string = "\$(\?<dlr>\d+)\.(\?<cts>\d+)"
/#{Regexp.escape(string)}/ =~ "$3.67"
I get:
`<main>': undefined local variable or method `dlr' for main:Object (NameError)
There are a few mistakes in your approach. First of all, let's look at your string:
string = "\$(\?<dlr>\d+)\.(\?<cts>\d+)"
You escape the dollar sign with "\$", but that is the same as just writing "$". Consider:
"\$" == "$"
#=> true
To actually end up with the string "backslash followed by dollar" you would need to write "\\$". The same thing applies to the decimal character classes, you would have to write "\\d" to end up with the correct string.
The question marks on the other hand are actually part of the regex syntax, so you do not want to escape these at all. I recommend using single quotes for your original string, because that makes the input much easier:
string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
#=> "\\$(?<dlr>\\d+)\\.(?<cts>\\d+)"
The next issue is with Regexp.escape. Take a look at what regular expression it produces with the above string:
string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
Regexp.escape(string)
#=> "\\\\\\$\\(\\?<dlr>\\\\d\\+\\)\\\\\\.\\(\\?<cts>\\\\d\\+\\)"
That's one level too much escaping. Regexp.escape can be used when you want to match the literal characters that are contained in the string. For example, the escaped regex above will match the source string itself:
/#{Regexp.escape(string)}/ =~ string
#=> 0 # matches at offset 0
Instead, you can use Regexp.new to treat the source as an actual regular expression.
The last issue is how you access the match result. You are getting a NameError because the match result is not stored in local variables called dlr and cts: Ruby only assigns named captures to local variables when a regex literal without interpolation appears on the left-hand side of =~. You have two options to access the match data:
Use Regexp#match, which returns a MatchData object
Use regexp =~ string and then read the last match from the global variable $~
I prefer the former, because it is easier to read. The full code would then look like this:
string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
regexp = Regexp.new(string)
result = regexp.match("$3.67")
#=> #<MatchData "$3.67" dlr:"3" cts:"67">
result[:dlr]
#=> "3"
result[:cts]
#=> "67"
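To put the variants side by side, here is a small sketch. The method names are made up for the example; the regex source is the one from the question.

```ruby
# Literal regex on the left-hand side of =~: Ruby assigns the named groups
# to local variables of the same name.
def split_with_literal(price)
  if /\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ price
    [dollars, cents]
  end
end

# Regex built at runtime: no locals are created, so read the MatchData.
def split_with_dynamic(source, price)
  match = Regexp.new(source).match(price)
  match && match.named_captures
end

# Same runtime regex with =~, reading the last match from $~.
def cents_via_last_match(source, price)
  Regexp.new(source) =~ price
  $~ && $~[:cts]
end

source = '\$(?<dlr>\d+)\.(?<cts>\d+)'
split_with_literal("$3.67")
split_with_dynamic(source, "$3.67")
cents_via_last_match(source, "$3.67")
```

All three return nil when the input does not match, so callers can check the result before using it.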

Eval a string without string interpolation

AKA How do I find an unescaped character sequence with regex?
Given an environment set up with:
@secret = "OH NO!"
$secret = "OH NO!"
@@secret = "OH NO!"
and given a string read in from a file that looks like this:
some_str = '"\"#{:NOT&&:very}\" bad. \u262E\n#@secret \\#$secret \\\\#@@secret"'
I want to evaluate this as a Ruby string, but without interpolation. Thus, the result should be:
puts safe_eval(some_str)
#=> "#{:NOT&&:very}" bad. ☮
#=> #@secret #$secret \#@@secret
By contrast, the eval-only solution produces
puts eval(some_str)
#=> "very" bad. ☮
#=> OH NO! #$secret \OH NO!
At first I tried:
def safe_eval(str)
  eval str.gsub(/#(?=[{@$])/,'\\#')
end
but this fails in the malicious middle case above, producing:
#=> "#{:NOT&&:very}" bad. ☮
#=> #@secret \OH NO! \#@@secret
You can do this via regex by ensuring that there are an even number of backslashes before the character you want to escape:
def safe_eval(str)
  eval str.gsub( /([^\\](?:\\\\)*)#(?=[{@$])/, '\1\#' )
end
…which says:
Find a character that is not a backslash [^\\]
followed by two backslashes (?:\\\\)
repeated zero or more times *
followed by a literal # character
and ensure that after that you can see either a {, @, or $ character.
and replace that with
the non-backslash-maybe-followed-by-even-number-of-backslashes
and then a backslash and then a #
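The steps above can be tried out without calling eval at all. The sketch below applies only the substitution; escape_interpolation is a name invented here, and the block form of gsub is used so that no replacement-string escapes are involved.

```ruby
# A '#' that would start an interpolation: preceded by a non-backslash
# character and an even number (possibly zero) of backslashes, and
# followed by '{', '@' or '$'.
ESCAPE_INTERPOLATION = /([^\\](?:\\\\)*)#(?=[{@$])/

# Puts a backslash in front of every '#' matched by the pattern above.
def escape_interpolation(str)
  str.gsub(ESCAPE_INTERPOLATION) { "#{$1}\\#" }
end

escape_interpolation('a #{x}')      # unescaped: gets a backslash
escape_interpolation('a \#{x}')     # already escaped: left alone
escape_interpolation('a \\\\#{x}')  # two backslashes, then an unescaped '#'
```

Note that [^\\] has to consume a character, so a # at the very start of the input is not matched. That is harmless for the strings in the question, which always begin with a double quote.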
How about not using eval at all? As per this comment in chat, all that's necessary are escaping quotes, newlines, and unicode characters. Here's my solution:
ESCAPE_TABLE = {
  /\\n/ => "\n",
  /\\"/ => "\"",
}
def expand_escapes(str)
  str = str.dup
  ESCAPE_TABLE.each {|k, v| str.gsub!(k, v)}
  # Deal with Unicode: \u followed by four hex digits
  str.gsub!(/\\u([0-9A-Fa-f]{4})/) {|m| [m[2..5].hex].pack("U") }
  str
end
When called on your string the result is (in your variable environment):
"\"\"\#{:NOT&&:very}\" bad. ☮\n\#@secret \\\#$secret \\\\\#@@secret\""
Although I would have preferred not to have to treat unicode specially, it is the only way to do it without eval.

resolve #{var} in string

I have loaded a string with #{variable} references in it. How would I resolve those variables in the string like puts does?
name="jim"
str="Hi #{name}"
puts str
Instead of puts, I would like to have the result available to pass as a parameter or save into a variable.
You could eval it:
name = "Patrick"
s = 'hello, #{name}'
s # => "hello, \#{name}"
# wrap the string in double quotes, making it a valid interpolatable ruby string
eval "\"#{s}\"" # => "hello, Patrick"
puts doesn't resolve the variables. The Ruby parser does when it creates the string. If you passed str to any other method, it would be the same as passing 'Hi jim', since the interpolation is already done.
String has a format operator, %. It can be used to pass arguments into a predefined string, much like interpolation does.
message = "Hello, %s"
for_patrick = message % "Patrick" #=> "Hello, Patrick"
for_jessie = message % "Jessie" #=> "Hello, Jessie"
messages = "Hello, %s and %s"
for_p_and_j = messages % ["Patrick", "Jessie"] #=> "Hello, Patrick and Jessie"
It may not look "Rubyish" but I believe it is the functionality you are looking for.
So, if you have a string coming in from somewhere that contains these placeholders, you can then pass in values as arguments like so:
method_that_gets_hello_message % "Patrick"
This will also allow you to only accept values you are expecting.
message = "I can count to %d"
message % "eleven" #=> ArgumentError: invalid value for Integer()
There's a list on Wikipedia for possible placeholders for printf() that should also work in Ruby.
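If positional placeholders get hard to follow, Ruby's format also accepts named references, %{name} and %<name>d, filled from a hash. A minimal sketch, with TEMPLATE and greet made up for the example:

```ruby
# %{name} inserts the value as-is; %<count>d formats it as an integer.
TEMPLATE = "Hello, %{name}! You are visitor number %<count>d."

# Fills the template from named values instead of positional ones.
def greet(name, count)
  format(TEMPLATE, name: name, count: count)
end

greet("Patrick", 3)
```

A placeholder with no matching key raises a KeyError, so a template can only pull in the values you explicitly pass.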
eval seems to be the only solution for this particular task. But we can avoid this dirty, unsafe, dishonourable eval if we modify the task a bit: instead of local variables, we can resolve instance variables without eval using instance_variable_get:
@name = "Patrick"
@id = 2 # Test that a number is OK
@a_b = "oooo" # Test that our regex can eat underscores
s = 'hello, #{name} !!#{id} ??#{a_b}'
s.gsub(/#\{(\w+)\}/) { instance_variable_get '@'+$1 }
=> "hello, Patrick !!2 ??oooo"
In this case you even can use any other characters instead of #{} (for example, %name% etc), by only modifying the regex a bit.
But of course, all this smells.
It sounds like you want the basis for a template system, which Ruby does easily if you use String's gsub or sub methods.
replacements = { '%greeting%' => 'Hello', '%name%' => 'Jim' }
pattern = Regexp.union(replacements.keys)
'%greeting% %name%!'.gsub(pattern, replacements)
=> "Hello Jim!"
You could just as easily define the key as:
replacements = { '#{name}' => 'Jim' }
and reuse Ruby's normal string interpolation syntax #{...} as the placeholder, but I'd recommend against that. Instead use something unique.
The advantage to this is the target => replacement map can easily be put into a YAML file, or a database table, and then you can swap them out with other languages, or different user information. The sky is the limit.
Another benefit is that there is no evaluation involved; it's only string substitution. With a bit of creative use you can actually implement macros:
macros = { '%salutation%' => '%greeting% %name%' }
replacements = { '%greeting%' => 'Hello', '%name%' => 'Jim' }
macro_pattern, replacement_pattern = [macros, replacements].map{ |h| Regexp.union(h.keys) }
'%salutation%!'.gsub(macro_pattern, macros).gsub(replacement_pattern, replacements)
=> "Hello Jim!"
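The YAML idea mentioned above could be sketched like this. The YAML text is inlined here as a stand-in for a real file, and REPLACEMENTS_YAML and render are names chosen for the example. The keys are quoted because a plain YAML scalar cannot start with %.

```ruby
require 'yaml'

# Stand-in for the contents of a hypothetical replacements file.
REPLACEMENTS_YAML = <<~'EOS'
  "%greeting%": Hello
  "%name%": Jim
EOS

# Substitutes every known placeholder; anything else is left untouched.
def render(template, replacements)
  pattern = Regexp.union(replacements.keys)
  template.gsub(pattern, replacements)
end

replacements = YAML.safe_load(REPLACEMENTS_YAML)
render('%greeting% %name%!', replacements)
```

YAML.safe_load only builds plain data such as strings, hashes and arrays, which fits the goal of having no evaluation involved.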

Replace single quote with backslash single quote

I have a very large string that needs to escape all the single quotes in it, so I can feed it to JavaScript without upsetting it.
I have no control over the external string, so I can't change the source data.
Example:
Cote d'Ivoir -> Cote d\'Ivoir
(the actual string is very long and contains many single quotes)
I'm trying to do this by using gsub on the string, but can't get this to work:
a = "Cote d'Ivoir"
a.gsub("'", "\\\'")
but this gives me:
=> "Cote dIvoirIvoir"
I also tried:
a.gsub("'", 92.chr + 39.chr)
but got the same result; I know it's something to do with regular expressions, but I never get those.
The %q delimiters come in handy here:
# %q(a string) is equivalent to a single-quoted string
puts "Cote d'Ivoir".gsub("'", %q(\\\')) #=> Cote d\'Ivoir
The problem is that \' in a gsub replacement means "part of the string after the match".
You're probably best off using either the block syntax:
a = "Cote d'Ivoir"
a.gsub(/'/) {|s| "\\'"}
# => "Cote d\\'Ivoir"
or the Hash syntax:
a.gsub(/'/, {"'" => "\\'"})
There's also the hacky workaround:
a.gsub(/'/, '\#').gsub(/#/, "'")
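To make the special meaning of the replacement string visible, here is a short sketch comparing the escapes. The constant names and escape_single_quotes are invented for the example.

```ruby
# Block form: the block's return value is inserted verbatim, so no
# replacement-string escapes apply.
def escape_single_quotes(text)
  text.gsub("'") { "\\'" }
end

# Replacement \' inserts the text after the match ("Ivoir").
POST_MATCH = "Cote d'Ivoir".gsub("'", "\\'")

# Replacement \\' is a literal backslash followed by a quote.
LITERAL = "Cote d'Ivoir".gsub("'", "\\\\'")

# Replacement \& inserts the matched text itself.
WHOLE = "abc".gsub("b", "[\\&]")

puts escape_single_quotes("Cote d'Ivoir")
```

In a double-quoted Ruby literal each backslash has to be doubled once for the Ruby parser and, in the non-block form, once more for gsub, which is why four backslashes are needed to get one.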
# prepare a text file containing [ abcd\'efg ]
require "pathname"
backslashed_text = Pathname("/path/to/the/text/file.txt").readlines.first.strip
# puts backslashed_text => abcd\'efg
unslashed_text = "abcd'efg"
unslashed_text.gsub("'", Regexp.escape(%q|\'|)) == backslashed_text # true
# puts unslashed_text.gsub("'", Regexp.escape(%q|\'|)) => abcd\'efg

Resources