Regexp.escape adds weird escapes to a plain space - ruby

I stumbled over this problem using the following simplified example:
line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }
My understanding was that, for every String stored in searchstring, the gsub! would leave line empty afterwards. Indeed, this is the case for many strings, but not for this one:
searchstring = "D "
line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }
p line
It turns out that line is still printed as "D " afterwards, i.e. no replacement was performed.
This happens to any searchstring containing a space. Indeed, if I do a
p(Regexp.escape(searchstring))
for my example, I see "D\\ " being printed, while I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?
Some background: In my concrete application, where this simplified example is derived from, I just want to do a literal string replacement inside a long string, in the following way:
REPLACEMENTS.each do |from, to|
  line.chomp!
  line.gsub!(Regexp.escape(from)) { to }
end
I'm using Regexp.escape just as a safety measure in case the string being replaced contains some regex metacharacter.
I'm using the Cygwin port of MRI Ruby 2.6.4.

line.gsub!(Regexp.escape(searchstring)) { '' }
My understanding was, that for every String stored in searchstring, the gsub! would cause that line is afterwards empty.
Your understanding is incorrect. The guarantee in the docs is
For any string, Regexp.new(Regexp.escape(str))=~str will be true.
This does hold for your example:
Regexp.new(Regexp.escape("D "))=~"D " # => 0
Therefore, this is what your code should look like:
line.gsub!(Regexp.new(Regexp.escape(searchstring))) { '' }
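To see why the first version silently does nothing, compare what each call actually searches for. A minimal sketch reusing the question's variable names:

```ruby
searchstring = "D "
escaped = Regexp.escape(searchstring)  # "D\\ ": D, a backslash, a space

# A String pattern is matched literally, so gsub! looks for the three
# characters D, \ and space. There is no such text, so nothing changes.
line = searchstring.dup
line.gsub!(escaped) { '' }
p line  # still "D "

# Regexp.new turns "\\ " back into a regex escape for a plain space.
line = searchstring.dup
line.gsub!(Regexp.new(escaped)) { '' }
p line  # ""
```

In other words, the backslash that Regexp.escape adds only disappears again once the result is compiled into a Regexp.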
As for why this is the case, there used to be a bug where Regexp.escape would incorrectly handle space characters:
# in Ruby 1.8.4
Regexp.escape("D ") # => "D\\s"
My guess is they tried to keep the fix as simple as possible by replacing 's' with ' '. Technically this does add an unnecessary escape character but, again, that does not break the intended use of the method.
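A case the answers here do not mention, where the escaped space is not redundant: in extended mode (Regexp::EXTENDED, the /x flag) unescaped whitespace in a pattern is ignored. Whether this was the motivation for escaping spaces is only an assumption, but a minimal check shows the difference:

```ruby
text = "D "

# In extended mode the unescaped space is ignored, so the pattern
# effectively becomes /D/ and the match is just "D".
plain = Regexp.new("D ", Regexp::EXTENDED)
p plain.match(text)[0]    # "D"

# The escaped space survives extended mode and matches a literal space.
escaped = Regexp.new(Regexp.escape("D "), Regexp::EXTENDED)
p escaped.match(text)[0]  # "D "
```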

This happens to any searchstring containing a space. Indeed, if I do a
p(Regexp.escape(searchstring))
for my example, I see "D\\ " being printed, while I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?
This looks to be a bug. In my opinion, whitespace is not a Regexp metacharacter, so there is no need to escape it.
Some background: In my concrete application, where this simplified example is derived from, I just want to do a literal string replacement inside a long string […]
If you want to do literal string replacement, then don't use a Regexp. Just use a literal string:
line.gsub!(from, to)
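One caveat worth checking: a String pattern is matched literally, but a String replacement such as to is still scanned for sequences like \& and \1. A minimal sketch of the question's loop with a made-up REPLACEMENTS hash (hypothetical, not from the question), using the block form so the replacement is inserted verbatim:

```ruby
# Hypothetical replacement table; the second replacement contains "\&".
REPLACEMENTS = {
  "a.b" => "X",    # "." is not treated as a regex metacharacter here
  "c"   => '\&!',  # backslash, ampersand, exclamation mark
}

line = "a.b axb c\n"
line.chomp!
REPLACEMENTS.each do |from, to|
  # String pattern: matched literally, so "axb" is left alone.
  # Block form: `to` is used as-is; line.gsub!(from, to) would instead
  # expand "\&" into the matched text.
  line.gsub!(from) { to }
end
p line  # "X axb \\&!"
```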

Related

Why would a Ruby controller escape the parameters itself?

I am writing a Ruby application for a back-end service. There is a controller that accepts requests from the front end.
Here is the case: there is a GET request with a parameter containing the character "\n".
def register
  begin
    request = {
      id: params[:key]
    }
    .........
  end
end
The "key" parameter is passed from AngularJS as "----BEGIN----- \n abcd \n ----END---- \n", but in the Ruby controller the parameter actually becomes "----BEGIN----- \\n abcd \\n ----END---- \\n".
Does anyone have a good solution for this?
Yes. This is because of the way Ruby represents escape characters: the parameter most likely contains a literal backslash followed by n, which Ruby displays as \\n. You can read the explanation here: Escaping characters in Ruby
I had this issue once, and I just used gsub to change each \\n into \n. What you should do is:
def register
  begin
    request = {
      # gsub (not gsub!) always returns a String; gsub! returns nil when nothing is replaced.
      id: params[:key].gsub("\\n", "\n")
    }
    .........
  end
end
Remember, the replacement "\n" must use double quotes (") rather than single quotes ('). From the link I gave:
The difference between single and double quoted strings in Ruby is the way the string definitions represent escape sequences.
In double quoted strings, you can write escape sequences and Ruby will output their translated meaning. A \n becomes a newline.
In single quoted strings however, escape sequences are escaped and return their literal definition. A \n remains a \n.
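To check what the parameter really contains, compare lengths: a single-quoted '\n' is two characters, a double-quoted "\n" is one. A minimal sketch, assuming the raw value arrives with literal backslash-n pairs (the sample string below is a stand-in for params[:key]):

```ruby
single = '\n'   # backslash + n
double = "\n"   # one newline character
p single.length  # 2
p double.length  # 1

# Stand-in for the raw parameter value, with literal backslash-n pairs.
raw = '----BEGIN----- \n abcd \n ----END---- \n'
p raw  # inspect doubles each backslash: "... \\n abcd \\n ..."

# Non-bang gsub always returns a String, even when there is nothing to replace.
fixed = raw.gsub('\n', "\n")
p fixed.count("\n")  # 3
```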

start_with? not working for backslash in Ruby

I have the following string:
abcdefgh;
lmnopqrst;
After doing string = string.split(";"), I get:
["abcdefgh", "\nlmnopqrst"]
Now when I do:
string[1].start_with?("\\")
the method returns false, whereas if I do
string[0].start_with?("a")
it returns true.
I am new to Ruby and just can't understand this behavior. Can anyone tell me what I am doing wrong?
I don't know, but string[1][0] (the first character of the string) returns "\n", so maybe use this:
string[1].start_with?("\n")
This is because "\n" does not actually start with a backslash. It is the line feed character, which counts as a single character; it is only displayed with the escape character \ in front of it.
So:
string[1].start_with?("\n")
Will return true.
You already tried to search with string[1].start_with?("\\") so you seem to realize you need to escape the backslash character by using \\.
If your input string looked like this:
\abcdefgh;
lmnopqrst;
Then after .split(';') your resulting array would look like this:
["\\abcdefgh", "\nlmnopqrst"]
Now string[0].start_with?("\\") would return true, because the first string really does start with a single backslash, which the console displays with an extra escape character.
You can try:
'\nhello world'.start_with?("\\") # returns true
"\nhello world".start_with?("\\") # returns false
This is because '\n' is two characters (\ and n), but "\n" is one character (the newline character).
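The split result from the question can be checked character by character. A minimal sketch, assuming the input is exactly the two lines shown in the question:

```ruby
string = "abcdefgh;\nlmnopqrst;"
parts = string.split(";")
p parts                       # ["abcdefgh", "\nlmnopqrst"]

p parts[1][0] == "\n"         # true: the first character is a newline
p parts[1].start_with?("\n")  # true
p parts[1].start_with?("\\")  # false: the string contains no backslash

# A string that really does start with a backslash:
other = "\\abcdefgh;\nlmnopqrst;".split(";")
p other[0].length             # 9: one backslash plus "abcdefgh"
p other[0].start_with?("\\")  # true
```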
The first character there is not "\" - it's "\n" in the first example, and "\\" in the second. "\n" and "\\" are effectively single characters in this context, even though they look like two characters.
"\n" != "\\", and so start_with? responds false.

Removing all whitespace from a string in Ruby

How can I remove all newlines and spaces from a string in Ruby?
For example, if we have a string:
"123\n12312313\n\n123 1231 1231 1"
It should become this:
"12312312313123123112311"
That is, all whitespaces should be removed.
You can use something like:
var_name.gsub!(/\s+/, '')
Or, if you want to return the changed string, instead of modifying the variable,
var_name.gsub(/\s+/, '')
This will also let you chain it with other methods (e.g. something_else = var_name.gsub(...).to_i to strip the whitespace and then convert the result to an integer). gsub! edits the string in place, so you'd have to write var_name.gsub!(...); something_else = var_name.to_i. Strictly speaking, as long as at least one change is made, gsub! returns the new version (i.e. the same thing gsub would return), but if the string contains no whitespace, it returns nil and things will break. Because of that, I'd prefer gsub if you're chaining methods.
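The difference between chaining on gsub and on gsub! shows up as soon as the input has no whitespace. A minimal sketch with made-up inputs; note that nil.to_i is 0, so the gsub! chain does not raise here, it silently returns the wrong number:

```ruby
with_space = "12 34"
no_space   = "1234"

# Non-bang gsub always returns a String, so chaining is safe.
p with_space.gsub(/\s+/, '').to_i  # 1234
p no_space.gsub(/\s+/, '').to_i    # 1234

# gsub! returns nil when it makes no change...
p no_space.dup.gsub!(/\s+/, '')    # nil
# ...and nil.to_i is 0, so this chain quietly produces 0 instead of 1234.
p no_space.dup.gsub!(/\s+/, '').to_i  # 0
```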
gsub works by replacing any matches of the first argument with the contents of the second argument. In this case, the regex /\s+/ matches any run of one or more consecutive whitespace characters, and each run is replaced with an empty string. There's also a block form if you want to do some processing on the matched part rather than replacing it directly; see String#gsub for more information about that.
The Ruby docs for the class Regexp are a good starting point to learn more about regular expressions -- I've found that they're useful in a wide variety of situations where a couple of milliseconds here or there don't count and you don't need to match things that can be nested arbitrarily deeply.
As Gene suggested in his comment, you could also use tr:
var_name.tr(" \t\r\n", '')
It works in a similar way, but instead of replacing regex matches, it replaces every instance of the nth character of the first argument in the string it's called on with the nth character of the second argument. If the second argument is shorter, it is padded with its last character; if it is empty, as here, the listed characters are simply deleted. See String#tr for more information.
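A quick sketch of the three String#tr cases described above (empty, same-length and shorter replacement lists), using the question's string and a couple of made-up words; worth confirming against the String#tr documentation for your Ruby version:

```ruby
s = "123\n12312313\n\n123 1231 1231 1"

# Empty replacement: every character listed in the first argument is removed.
p s.tr(" \t\r\n", '')     # "12312312313123123112311"

# Same length: character n of the first list maps to character n of the second.
p "hello".tr("el", "ip")  # "hippo"

# Shorter, non-empty replacement: it is padded with its last character.
p "hello".tr("elo", "x")  # "hxxxx"
```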
You could also use String#delete:
str = "123\n12312313\n\n123 1231 1231 1"
str.delete "\s\n"
#=> "12312312313123123112311"
You could use String#delete! to modify str in place, but note that delete! returns nil if no change is made.
Alternatively you could scan the string for digits /\d+/ and join the result:
string = "123\n\n12312313\n\n123 1231 1231 1\n"
string.scan(/\d+/).join
#=> "12312312313123123112311"
Please note that this would also remove alphabetical characters, dashes, symbols, basically everything that is not a digit.

Ruby string sub without regex back references

I'm trying to do a simple string sub in Ruby.
The second argument to sub() is a long piece of minified JavaScript that contains regular expressions. Back references in the regexes in this string seem to be affecting the result of sub, because the replaced string (i.e., the first argument) is appearing in the output string.
Example:
input = "string <!--tooreplace--> is here"
output = input.sub("<!--tooreplace-->", "\&")
I want the output to be:
"string \& is here"
Not:
"string & is here"
or if escaping the regex
"string <!--tooreplace--> is here"
Basically, I want some way of doing a string sub that has no regex consequences at all - just a simple string replace.
To avoid having to figure out how to escape the replacement string yourself, use Regexp.escape. It's handy when replacements are complicated, or when dealing with them by hand is an unnecessary pain. A little helper on String is nice too.
input.sub("<!--tooreplace-->", Regexp.escape('\&'))
You can also use block notation to make it simpler (as opposed to Regexp.escape):
puts input.sub("<!--tooreplace-->") { '\&' }
# prints: string \& is here
Use single quotes and escape the backslash:
output = input.sub("<!--tooreplace-->", '\\\&') #=> "string \\& is here"
Well, since '\\&' (that is, \ followed by &) is being interpreted as a special replacement sequence (meaning the whole match), it stands to reason that you need to escape the backslash. In fact, this works:
>> puts 'abc'.sub 'b', '\\\\&'
a\&c
Note that \\\\& represents the literal string \\&.
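The answers above boil down to three ways of getting a literal backslash and ampersand into the output. A minimal side-by-side sketch using the question's input (the variable names are illustrative):

```ruby
input  = "string <!--tooreplace--> is here"
target = "<!--tooreplace-->"
wanted = 'string \& is here'  # contains one literal backslash

# In a replacement String, \& means "the whole match", so this puts the
# comment back instead of inserting a backslash and an ampersand.
echoed = input.sub(target, '\&')

# 1. Block form: the block's return value is inserted verbatim.
by_block = input.sub(target) { '\&' }

# 2. Escape the backslash: the replacement holds \\& (three characters),
#    and \\ in a replacement String means one literal backslash.
by_escaping = input.sub(target, '\\\\&')

# 3. Let Regexp.escape do the doubling ('\&' becomes the same \\&).
by_regexp_escape = input.sub(target, Regexp.escape('\&'))

p echoed  # "string <!--tooreplace--> is here"
p by_block == wanted, by_escaping == wanted, by_regexp_escape == wanted
```

The block form is the one that needs no escaping at all, which is why it is the safest choice when the replacement text comes from elsewhere (such as minified JavaScript).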

gsub! on an argument doesn't work

I am making a function that turns the first argument into a PHP var (useless, I know) and sets it equal to the second argument. I'm trying to use gsub! on it to get rid of all the characters that can't be used in a PHP var. Here is what I have:
dvar = "$" + name.gsub!(/.?\/!#\#{}$%^&*()`~/, "") { |match| puts match }
I have the puts match there to make sure some of the characters were removed. name is the variable passed into the method that does this. I am getting this error:
TypeError: can't convert nil into String
cVar at ./Web.rb:31
(root) at C:\Users\Andrew\Documents\NetBeansProjects\Web\lib\main.rb:13
Web.rb is the file this line is in, and main.rb is the file calling this method. How can I fix this?
EDIT: If I remove the ! in gsub!, it goes through, but the characters aren't removed.
Short answer
Use dvar = "$" + name.tr(".?\/!#\#{}$%^&*()`~", '')
Long answer
The problem you are facing is that the gsub! call is returning nil. You can't concatenate (+) a String with a nil.
That's happening because you have a malformed Regexp. You aren't escaping the special regex symbols, like $, * and ., just for a start. Also, the way it is now, gsub will only match if your string contains all those symbols in sequence. You should use the pipe (|) operator to make an OR-like operation.
gsub! will also return nil if no substitutions happened.
See the documentation for gsub and gsub! here: http://ruby-doc.org/core/classes/String.html#M001186
I think you should replace gsub! with gsub. Do you really need name to change?
Example:
name = "m$var.name$$"
dvar = "$" + name.gsub!(/\$|\.|\*/, "") # $ or . or *
# dvar now contains $mvarname and name is mvarname
Your line, corrected:
dvar = "$" + name.gsub(/\.|\?|\/|\!|\#|\\|\#|\{|\}|\$|\%|\^|\&|\*|\(|\)|\`|\~/, "")
# some things shouldn't (or aren't needed to) be escaped, I don't remember them all right now
As J-_-L pointed out, you could also use a character class ([]), which makes it a little clearer, I guess. Well, it's hard to mentally parse anyway.
dvar = "$" + name.gsub(/[\.\?\/\!\#\\\#\{\}\$\%\^\&\*\(\)\`\~]/, "")
But because what you are doing is simple character replacement, the best method is tr (again reminded by J-_-L!):
dvar = "$" + name.tr(".?\/!#\#{}$%^&*()`~", '')
Way easier to read and make modifications.
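Putting the tr suggestion into a method might look like this. A minimal sketch: php_var is a hypothetical name, and the character list is the one from the answers (written in single quotes so that #{} is not treated as interpolation):

```ruby
# Hypothetical helper: turns +name+ into a PHP-style variable name.
def php_var(name)
  # Non-bang tr always returns a new String, so "+" never receives nil,
  # and the caller's string is left unchanged.
  "$" + name.tr('.?/!#{}$%^&*()`~', '')
end

name = "m$var.name$$"
p php_var(name)  # "$mvarname"
p name           # "m$var.name$$" (unchanged)
p php_var("ok")  # "$ok": nothing to strip is not an error
```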
You cannot pass both a second argument and a block to gsub (the block is ignored).
The regex is wrong; you forgot the square brackets: /[.?\/!#\#{}$%^&*()`~]/
Because your regex is wrong, it didn't match anything, and because gsub! returns nil if nothing was replaced, you get this strange nil error (the TypeError in your question).
By the way, you should use gsub, not gsub!, in this case, because you are using the return value (and not name itself), and then the error would not have happened.
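The chain of events described above can be reproduced in a few lines. A minimal sketch; the sample name and the BAD_CHARS constant are hypothetical, and the exact TypeError message differs between Ruby versions:

```ruby
# Hypothetical constant holding the bracketed character class.
BAD_CHARS = /[.?\/!\#{}$%^&*()`~]/

name = "plainname"

# Nothing matches, so gsub! leaves the string alone and returns nil.
result = name.dup.gsub!(BAD_CHARS, "")
p result  # nil

begin
  dvar = "$" + result
rescue TypeError => e
  # String#+ refuses nil, which is the error from the question.
  puts "TypeError: #{e.message}"
end

# gsub always returns a String, so the concatenation is safe.
p "$" + name.gsub(BAD_CHARS, "")  # "$plainname"
```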
I don't see what the block is for.
Just do:
name = 'hello.?\/!##$%^&*()`~hello'
dvar = "$" + name.gsub(/\.|\?|\\|\/|\!|\#|\#|\{|\}|\$|\%|\^|\&|\*|\(|\)|\`|\~/, "")
puts dvar # => "$hellohello"
Or use [] to denote OR:
dvar = "$" + name.gsub(/[\.\?\\\/\!\#\\\#\{\}\$\%\^\&\*\(\)\`\~]/, "")
You have to escape the special characters and then OR them, so that it removes them individually, not just when they are all found together.
Also, there is really no need to use gsub! to modify the string in place; use the non-mutating gsub(), since you assign the result to a new variable.
gsub! returns nil, and String#+ cannot concatenate nil onto a String, which gives you the error mentioned.
It seems as if the name object is nil: you may be calling gsub! on nil, which usually fails with NoMethodError: private method `gsub!' called for nil:NilClass. Since I don't know which version of Ruby you are using, I am not sure the error would be the same, but it's a good place to start looking.
