Ruby: unexplained behaviour of String#sub in the presence of "\\'" - ruby

I can't understand why this happens:
irb(main):015:0> s = "Hello\\'World"
=> "Hello\\'World"
irb(main):016:0> "#X#".sub("X",s)
=> "#Hello#World#"
I would have thought the output would be "#Hello\'World#", and I certainly can't understand where the extra # came from.
I guess I'm unfamiliar with something in the internals of String#sub and with the "\'" sequence.

It's due to the use of backslash in a sub replacement string.
Your replacement string contains \', which sub expands to the same text as the global variable $', otherwise known as POSTMATCH. For a string replacement, that is everything in the original string that follows the matched text. Because the X you replaced is followed by a #, that is what gets inserted.
Compare:
"#X$".sub("X",s)
=> "#Hello$World$"
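Note that the documentation for sub refers to the use of backreferences \0 through \9. \1 through \9 correspond to the global variables $1 to $9, while \0 corresponds to $& (the whole match), not to $0, which holds the program name. Similar backslash forms exist for some of the other match variables: \&, \` and \'.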
For reference, the other global variables set by regular expression matching are:
$~ is equivalent to Regexp.last_match;
$& contains the complete matched text;
$` contains string before match;
$' contains string after match;
$1, $2 and so on contain text matching first, second, etc capture group;
$+ contains last capture group.
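
As a minimal sketch of the expansion described above (the variable names are illustrative, not from the question), the following compares the plain string replacement with two ways of inserting the text literally. The block form and the backslash-doubling step are offered as common workarounds under the assumption that you want the replacement inserted verbatim.

```ruby
s = "Hello\\'World"                 # the characters Hello\'World

# String replacement: \' is expanded to the text after the match ("#").
expanded = "#X#".sub("X", s)

# Block form: the block's return value is inserted as-is, with no expansion.
literal_block = "#X#".sub("X") { s }

# Doubling each backslash first also works: \\ in a replacement string
# is read by sub as one literal backslash.
literal_escaped = "#X#".sub("X", s.gsub("\\") { "\\\\" })

# \` (pre-match), \& (whole match) and \' (post-match) side by side.
all_three = "a-b-c".sub("b", "[\\`|\\&|\\']")

puts expanded         # #Hello#World#
puts literal_block    # #Hello\'World#
puts literal_escaped  # #Hello\'World#
puts all_three        # a-[a-|b|-c]-c
```

The block form is usually the easiest to read when the replacement comes from user input or any other string you do not control.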

Related

Ruby Regex Group Replacement

I am trying to perform regular expression matching and replacement on the same line in Ruby. I have some libraries that manipulate strings in Ruby and add special formatting characters to it. The formatting can be applied in any order. However, if I would like to change the string formatting, I want to keep some of the original formatting. I'm using regex for that. I have the regular expression matching correctly what I need:
mystring.gsub(/[(\e\[([1-9]|[1,2,4,5,6,7,8]{2}m))|(\e\[[3,9][0-8]m)]*Text/, 'New Text')
However, what I really want is the matching from the first grouping found in:
(\e\[([1-9]|[1,2,4,5,6,7,8]{2}m))
to be appended to New Text and replaced as opposed to just New Text. I'm trying to reference the match in the form of
mystring.gsub(/[(\e\[([1-9]|[1,2,4,5,6,7,8]{2}m))|(\e\[[3,9][0-8]m)]*Text/, '\1' + 'New Text')
but my understanding is that \1 only works when using \d or \k. Is there any way to reference that specific capturing group in my replacement string? Additionally, since I am using an asterisk for the [], I know that this grouping could occur more than once. Therefore, I would like to have the last matching occurrence yielded.
My expected input/output with a sample is:
Input: "\e[1mHello there\e[34m\e[40mText\e[0m\e[0m\e[22m"
Output: "\e[1mHello there\e[40mNew Text\e[0m\e[0m\e[22m"
Input: "\e[1mHello there\e[44m\e[34m\e[40mText\e[0m\e[0m\e[22m"
Output: "\e[1mHello there\e[40mNew Text\e[0m\e[0m\e[22m"
So the last grouping is found and appended.
You can use the following regex with back-reference \\1 in the replacement:
reg = /(\\e\[(?:[0-9]{1,2}|[3,9][0-8])m)+Text/
mystring = "\\e[1mHello there\\e[34m\\e[40mText\\e[0m\\e[0m\\e[22m"
puts mystring.gsub(reg, '\\1New Text')
mystring = "\\e[1mHello there\\e[44m\\e[34m\\e[40mText\\e[0m\\e[0m\\e[22m"
puts mystring.gsub(reg, '\\1New Text')
Output of the IDEONE demo:
\e[1mHello there\e[40mNew Text\e[0m\e[0m\e[22m
\e[1mHello there\e[40mNew Text\e[0m\e[0m\e[22m
Mind that your input contains a backslash (\) that needs escaping in a regular string literal. To match it inside the regex we use a double backslash, since we are looking for a literal backslash.
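
A quantified capture group keeps only its last repetition, which is why \1 yields the final escape code. The sketch below illustrates this under a different assumption from the answer above: the input holds real ESC characters ("\e" in a double-quoted string) rather than a literal backslash followed by e. The simplified pattern and the names codes, input and output are illustrative.

```ruby
# One or more ANSI colour codes followed by "Text"; the group is repeated,
# so after a match it holds only the last code it matched.
codes = /(\e\[\d{1,2}m)+Text/

input  = "\e[1mHello there\e[34m\e[40mText\e[0m"
output = input.gsub(codes, '\1New Text')

p output   # "\e[1mHello there\e[40mNew Text\e[0m"
```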

use of the ampersand here means pre_match?

What is the ampersand doing in the code below?
s.reverse.gsub( /\d{3}(?=\d)/, '\&,' ).reverse
One would think, after attempting to look up such things, that it is a special variable meaning post_match or pre_match, but the docs say nothing about ampersands - only dollar signs either followed by or preceded by a tick mark.
\& stands for the whole string matched by the regex. See this simplified example:
s = "p1:1 1:1";
print s.gsub( /[a-z]/, '[\&],' ) ## only p is matched
output: [p],1:1 1:1
Similarly, \1 stands for the first group matched by the regex (and likewise \2, \3 and so on). An example:
s = "p1:1 1:1";
print s.gsub( /(\d:\d)/, '[\1]' )
output: p[1:1] [1:1]
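
Applied to the snippet from the question, a small sketch could look like this. The method names with_commas and with_commas_block are hypothetical, and the block version is shown only as an equivalent way to get the matched text without \&.

```ruby
# Insert a comma after every group of three digits, working from the right.
# '\&' in the replacement stands for the text the regex just matched.
def with_commas(digits)
  digits.reverse.gsub(/\d{3}(?=\d)/, '\&,').reverse
end

# Same result using a block, where the match is passed in as an argument.
def with_commas_block(digits)
  digits.reverse.gsub(/\d{3}(?=\d)/) { |group| "#{group}," }.reverse
end

puts with_commas("1234567")        # 1,234,567
puts with_commas("123")            # 123
puts with_commas_block("1234567")  # 1,234,567
```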

Display characters around regex match

Is it possible to display the characters around a regex match? I have the string below, and I want to substitute every occurrence of "change" while displaying the 3-5 characters before the match.
string = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
What I have so far
while line.match(/change/)
printf "\n\n Substitute the FIRST change below:\n"
printf "#{line}\n"
printf "\n\tSubstitute => "
substitution = gets.chomp
line = line.sub(/change/, "#{substitution}")
end
If you want to get down and dirty Perl style:
before_chars = $`[-3, 3]
This is the last three characters just before your pattern match.
You would likely use gsub! with a block, in the following manner:
line = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
# line.gsub!(/(?<where>.{0,3})change/) {
line.gsub!(/(?<where>\S+)change/) {
printf "\n\n Substitute the change around #{Regexp.last_match[:where]} => \n"
substitution = gets.chomp
"#{Regexp.last_match[:where]}#{substitution}"
}
puts line
Yielding:
Substitute the change around val= =>
111
Substitute the change around anotherval= =>
222
Substitute the change around stringhere: =>
333
val=111 anotherval=222 stringhere:333: foo=bar foofoo=barbar
gsub! substitutes the matches in place, and the pattern \S+, which is more suitable than the commented-out .{0,3}, lets you print a human-readable hint.
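
For experimenting without typing at a prompt, here is a non-interactive sketch of the same idea. The answers array is a hypothetical stand-in for the values read with gets, and seen only records the hints that would have been printed.

```ruby
line = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"

answers = %w[111 222 333]   # hypothetical stand-in for user input
seen = []                   # the context shown before each substitution

result = line.gsub(/(?<where>\S+)change/) do
  where = Regexp.last_match[:where]   # named group for the current match
  seen << where
  "#{where}#{answers.shift}"
end

p seen     # ["val=", "anotherval=", "stringhere:"]
puts result
```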
Alternative: Use $1 Match Variable
tadman's answer uses the special prematch variable ($`). Ruby will also store a capture group in a numbered variable, which is probably just as magical but possibly more intuitive. For example:
string = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
string.sub(/(.{3})?change/, "\\1#{substitution}")
$1
# => "al="
No matter what method you use, though, make sure you explicitly check your match variables for nils in the event that your last attempted match was unsuccessful.
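
One way to do that check is to work with the MatchData object instead of the globals. This is only a sketch: context_before is a hypothetical helper, and it falls back to the whole pre-match when fewer than width characters precede the match, because String#[] returns nil for a start index that is out of range.

```ruby
# Return up to `width` characters that precede the first match of `pattern`,
# or nil when the pattern does not match at all.
def context_before(line, pattern, width = 3)
  m = line.match(pattern)
  return nil unless m               # no match: nothing to report

  pre = m.pre_match                 # same text as $` for this match
  pre[-width, width] || pre         # nil when pre is shorter than width
end

p context_before("val=change anotherval=change", /change/)  # "al="
p context_before("x=change", /change/)                      # "x="
p context_before("nothing to see", /change/)                # nil
```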

Why doesn't this Ruby replace regex work as expected?

Consider the following string which is a C fragment in a file:
strcat(errbuf,errbuftemp);
I want to replace errbuf (but not errbuftemp) with the prefix G-> plus errbuf. To do that successfully, I check the character after and the character before errbuf to see if it's in a list of approved characters and then I perform the replace.
I created the following Ruby file:
line = " strcat(errbuf,errbuftemp);"
item = "errbuf"
puts line.gsub(/([ \t\n\r(),\[\]]{1})#{item}([ \t\n\r(),\[\]]{1})/, "#{$1}G\->#{item}#{$2}")
Expected result:
strcat(G->errbuf,errbuftemp);
Actual result
strcatG->errbuferrbuftemp);
Basically, the matched characters before and after errbuf are not reinserted back with the replace expression.
Anyone can point out what I'm doing wrong?
Because you must use either the syntax gsub(/.../){"...#{$1}...#{$2}..."} or gsub(/.../,'...\1...\2...').
Here was the same problem: werid, same expression yield different value when excuting two times in irb
The problem is that the variable $1 is interpolated into the argument string before gsub is run, meaning that the previous value of $1 is what the symbol gets replaced with. You can replace the second argument with '\1 ?' to get the intended effect. (Chuck)
I think part of the problem is the use of gsub() instead of sub().
Here are two alternatives:
str = 'strcat(errbuf,errbuftemp);'
str.sub(/\w+,/) { |s| 'G->' + s } # => "strcat(G->errbuf,errbuftemp);"
str.sub(/\((\w+)\b/, '(G->\1') # => "strcat(G->errbuf,errbuftemp);"
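
To stay closer to the original pattern, both fixes can be sketched as follows. The delim variable and the use of Regexp.escape are additions for readability and safety, not part of the question's code.

```ruby
line = " strcat(errbuf,errbuftemp);"
item = "errbuf"

delim   = /[ \t\n\r(),\[\]]/
pattern = /(#{delim})#{Regexp.escape(item)}(#{delim})/

# Backreferences in the replacement string are resolved by gsub per match.
with_backrefs = line.gsub(pattern, "\\1G->#{item}\\2")

# A block runs after each match, so $1 and $2 refer to the current match.
with_block = line.gsub(pattern) { "#{$1}G->#{item}#{$2}" }

puts with_backrefs   # " strcat(G->errbuf,errbuftemp);"
puts with_block      # same result
```

In the failing version, "#{$1}" is interpolated once, when the argument string is built, before gsub has matched anything.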

what does this backtick ruby code mean?

while line = gets
next if line =~ /^\s*#/ # skip comments
break if line =~ /^END/ # stop at end
#substitute stuff in backticks and try again
redo if line.gsub!(/`(.*?)`/) { eval($1) }
end
What I don't understand is this line:
line.gsub!(/`(.*?)`/) { eval($1) }
What does the gsub! exactly do?
the meaning of regex (.*?)
the meaning of the block {eval($1)}
It will substitute within the matched part of line, the result of the block.
It will match 0 or more of the previous subexpression (which was '.', match any one char). The ? modifies the .* RE so that it matches no more than is necessary to continue matching subsequent RE elements. This is called "non-greedy". Without the ?, the .* might also match the second backtick, depending on the rest of the line, and then the expression as a whole might fail.
The block returns the result of eval ("evaluate a Ruby expression") on the backreference, which is the part of the string between the back tick characters. This is specified by $1, which refers to the first paren-enclosed section ("backreference") of the RE.
In the big picture, the result of all this is that lines containing backtick-bracketed expressions have the part within the backticks (and the backticks) replaced with the result value of executing the contained Ruby expression. And since the outer block is subject to a redo, the loop will immediately repeat without rerunning the while condition. This means that the resulting expression is also subject to a backtick evaluation.
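
The redo works as a loop condition because gsub! returns nil when it makes no substitution. The sketch below imitates that with a plain while loop. The input line is contrived (96.chr is a backtick) so that the first evaluation produces new backticks, and eval should only ever be used on input you trust.

```ruby
# The first pass evaluates to the text `1+2`, which a second pass turns into 3.
line = "x = `96.chr + '1+2' + 96.chr`"
passes = 0

# gsub! returns the string when something was replaced and nil otherwise,
# so the loop stops once no backtick pairs are left.
while line.gsub!(/`(.*?)`/) { eval($1).to_s }
  passes += 1
end

puts line     # x = 3
puts passes   # 2
```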
Replaces everything between backticks in line with the result of evaluating the ruby code contained therein.
>> line = "one plus two equals `1+2`"
>> line.gsub!(/`(.*?)`/) { eval($1) }
>> p line
=> "one plus two equals 3"
.* matches zero or more characters, ? makes it non-greedy (i.e., it will take the shortest match rather than the longest).
$1 is the string which matched the stuff between the (). In the above example, $1 would have been set to "1+2". eval evaluates the string as ruby code.
line.gsub!(/`(.*?)`/) { eval($1) }
gsub! modifies line in place (instead of using line = line.gsub).
.*? is non-greedy, so it matches only up to the next `; otherwise a single match would swallow everything between the first and the last backtick on the line.
The block evaluates whatever was captured (so, for example, if the backticks in line contain 1+1, eval replaces it with 2).
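
The difference between the greedy and non-greedy forms is easiest to see on a line with two backtick pairs. This is a small sketch with illustrative variable names; as above, eval is only appropriate for trusted input.

```ruby
text = "`1+2` and `3*4`"

# Non-greedy: each pair of backticks is captured separately.
lazy = text.scan(/`(.*?)`/).flatten

# Greedy: one capture runs from the first backtick to the last.
greedy = text.scan(/`(.*)`/).flatten

replaced = text.gsub(/`(.*?)`/) { eval($1).to_s }

p lazy         # ["1+2", "3*4"]
p greedy       # ["1+2` and `3*4"]
puts replaced  # 3 and 12
```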
