Display characters around regex match - ruby

Is it possible to display the characters around a regex match? I have the string below, and I want to substitute every occurrence of "change" while displaying the 3-5 characters before the match.
string = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
What I have so far:
line = string.dup
while line.match(/change/)
  printf "\n\n Substitute the FIRST change below:\n"
  puts line
  printf "\n\tSubstitute => "
  substitution = gets.chomp
  line = line.sub(/change/) { substitution } # block form inserts the typed text literally
end

If you want to get down and dirty Perl style:
before_chars = $`[-3, 3]
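This gives the last three characters just before your pattern match ($` holds the text preceding the most recent successful match). If fewer than three characters precede the match, [-3, 3] returns nil, so check for that.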
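$` only describes the most recent match. To show the context for every occurrence, here is a minimal sketch that relies on the block form of String#scan, which sets $~ to the current match on each iteration. The helper name contexts_before and its width parameter are made up for illustration.

```ruby
# Hypothetical helper: up to `width` characters preceding each match of `pattern`.
def contexts_before(str, pattern, width = 3)
  contexts = []
  str.scan(pattern) do
    pre = $~.pre_match                       # same text $` would hold for this match
    contexts << (pre[-width, width] || pre)  # [-w, w] is nil when pre is shorter than w
  end
  contexts
end

string = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
p contexts_before(string, /change/) # => ["al=", "al=", "re:"]
```

Falling back to the whole prematch keeps short prefixes (or an empty one, for a match at the start of the string) instead of returning nil.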

You would likely use gsub! with a block, in the following manner:
line = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
# line.gsub!(/(?<where>.{0,3})change/) {
line.gsub!(/(?<where>\S+)change/) {
printf "\n\n Substitute the change around #{Regexp.last_match[:where]} => \n"
substitution = gets.chomp
"#{Regexp.last_match[:where]}#{substitution}"
}
puts line
Yielding:
Substitute the change around val= =>
111
Substitute the change around anotherval= =>
222
Substitute the change around stringhere: =>
333
val=111 anotherval=222 stringhere:333: foo=bar foofoo=barbar
gsub! substitutes the matches in place, and the pattern \S+ (instead of the commented-out .{0,3}) lets you print a more human-readable hint: the whole token before "change" rather than an arbitrary three characters.
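The same gsub! technique can be checked without typing at a prompt. This sketch assumes a hypothetical answers array standing in for the gets.chomp calls and collects the hints that would have been printed.

```ruby
line = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
answers = %w[111 222 333] # hypothetical stand-in for what the user would type
hints = []

line.gsub!(/(?<where>\S+)change/) do
  where = Regexp.last_match[:where]
  hints << where               # the text that would be shown as the hint
  "#{where}#{answers.shift}"   # keep the prefix, replace only "change"
end

puts line # val=111 anotherval=222 stringhere:333: foo=bar foofoo=barbar
```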

Alternative: Use $1 Match Variable
tadman's answer uses the special prematch variable ($`). Ruby will also store a capture group in a numbered variable, which is probably just as magical but possibly more intuitive. For example:
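string = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
substitution = "new" # whatever replacement text you want
string.sub(/(.{3})?change/, "\\1#{substitution}")
# => "val=new anotherval=change stringhere:change: foo=bar foofoo=barbar"
$1
# => "al="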
No matter what method you use, though, make sure you explicitly check your match variables for nils in the event that your last attempted match was unsuccessful.
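With an optional group there are two different nils to watch for: the whole match can fail, or the match can succeed while the optional group does not take part (fewer than three characters before "change"). A minimal sketch using MatchData instead of $1; the helper name and the returned symbols are made up for illustration.

```ruby
# Hypothetical helper: the three characters before the first "change".
def context_before(str)
  m = str.match(/(.{3})?change/)
  return :no_match if m.nil?     # the whole match failed
  m[1] || :too_close_to_start    # matched, but the optional group did not participate
end

p context_before("val=change")    # => "al="
p context_before("xchange")       # => :too_close_to_start
p context_before("no match here") # => :no_match
```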

Related

Ruby: unexplained behaviour of String#sub in the presence of "\\'"

I can't understand why this happens:
irb(main):015:0> s = "Hello\\'World"
=> "Hello\\'World"
irb(main):016:0> "#X#".sub("X",s)
=> "#Hello#World#"
I would have thought the output would be "#Hello\'World#", and I certainly can't understand where the extra # came from.
I guess I'm missing something about the internals of String#sub and how it treats the "\'" sequence.
It's due to the use of backslash in a sub replacement string.
Your replacement string contains \', which sub expands to the text following the match, the same value as the global variable $' (also known as POSTMATCH). Because the X you replaced is followed by a #, that # is what gets inserted.
Compare:
"#X$".sub("X",s)
=> "#Hello$World$"
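Note that the documentation for sub describes the backreferences \0 through \9 that can appear in a replacement string. \1 to \9 correspond to the globals $1 to $9, but \0 (like \&) corresponds to the whole match $&, not to $0, which holds the program name. The same scheme covers the other match globals: \` for $`, \' for $', and \+ for $+.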
For reference, the other global variables set by regular expression matching are:
$~ is equivalent to Regexp.last_match;
$& contains the complete matched text;
$` contains string before match;
$' contains string after match;
$1, $2 and so on contain the text matched by the first, second, etc. capture group;
$+ contains the last matched capture group.
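A short sketch of the escapes on a tiny input, plus the usual way to insert a string containing backslashes literally: pass it from a block, whose return value is not scanned for escapes.

```ruby
s = "Hello\\'World"                # a literal backslash followed by a quote
as_string = "#X#".sub("X", s)      # \' is expanded to the text after the match
as_block  = "#X#".sub("X") { s }   # the block's return value is inserted literally

whole = "abXcd".sub("X", "[\\0]")     # \0 (or \&) is the whole match
pre   = "abXcd".sub("X", "[\\`]")     # \` is the text before the match
post  = "abXcd".sub("X", "[\\']")     # \' is the text after the match
group = "abXcd".sub(/(b)X/, "<\\1>")  # \1 is the first capture group

p as_string # => "#Hello#World#"
p as_block  # => "#Hello\\'World#"
p whole, pre, post, group # "ab[X]cd", "ab[ab]cd", "ab[cd]cd", "a<b>cd"
```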

Ruby Regex gsub - everything after string

I have a string something like:
test:awesome my search term with spaces
And I'd like to extract the string immediately after test: into one variable and everything else into another, so I'd end up with awesome in one variable and my search term with spaces in another.
Logically, what I'd do is move everything matching test:* into another variable, and then remove everything before the first :, leaving me with what I wanted.
At the moment I'm using /test:(.*)([\s]+)/ to match the first part, but I can't seem to get the second part correctly.
The first capture in your regular expression is greedy and matches spaces too, because you used . (which matches any character). Instead try:
matches = string.match(/test:(\S*) (.*)/)
# index 0 is the whole pattern that was matched
first = matches[1] # this is the first () group
second = matches[2] # and the second () group
Use the following:
/^test:(.*?) (.*)$/
That is, match "test:", then a series of characters (non-greedily), up to a single space, and another series of characters to the end of the line.
I am guessing you also want to strip the leading spaces before the second part, hence the \s+ in the expression. If you want to keep them, remove the \s+ (the second capture will then start with the space):
m = /^test:(\w+)\s+(.*)/.match("test:awesome my search term with spaces")
a = m[1]
b = m[2]
http://codepad.org/JzuNQxBN
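For comparison, a sketch of two variations that are not from the answers above: one uses named groups with \A/\z anchors and an explicit nil check, the other avoids regular expressions by stripping the prefix and splitting once. It assumes Ruby 2.5 or later for String#delete_prefix.

```ruby
input = "test:awesome my search term with spaces"

# Regex version: named groups, anchored to the whole string, nil-checked.
if (m = input.match(/\Atest:(?<tag>\S+)\s+(?<rest>.*)\z/))
  tag, rest = m[:tag], m[:rest]
end

# Regex-free version: split(" ", 2) splits on the first run of whitespace only.
tag2, rest2 = input.delete_prefix("test:").split(" ", 2)

p [tag, rest]   # => ["awesome", "my search term with spaces"]
p [tag2, rest2] # => ["awesome", "my search term with spaces"]
```

Note that delete_prefix leaves the string unchanged when it does not start with test:, so the second version does not tell you whether the prefix was present.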

Ruby regular expression

Apparently I still don't understand exactly how it works ...
Here is my problem: I'm trying to match numbers in strings such as:
910 -6.258000 6.290
That string should give me an array like this:
[910, -6.258000, 6.290]
while the string
blabla9999 some more text 1.1
should not be matched.
The regex I'm trying to use is
/([-]?\d+[.]?\d+)/
but it doesn't do exactly that. Could someone help me?
It would be great if the answer could clarify the use of the parentheses in the matching.
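Here's a pattern that matches strings shaped like the second example (text, digits, more text, then a number). Note that it does not match the first string, which starts with a digit, so it detects lines like the second one rather than accepting lines like the first: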
/^[^\d]+?\d+[^\d]+?\d+[\.]?\d+$/
Note that [^\d]+ means at least one non digit character.
On second thought, here's a more generic solution that extracts the numbers with one simple gsub and a split instead of a carefully crafted pattern:
str.gsub(/[^\d.-]+/, " ").split.collect{|d| d.to_f}
Example:
str = "blabla9999 some more text -1.1"
Parsed:
[9999.0, -1.1]
The different kinds of brackets have different meanings.
[] defines a character class: it matches exactly one character that belongs to the class.
() defines a capturing group: the text matched by the part inside the parentheses is stored in a numbered variable ($1, $2, ...).
You did not define any anchors, so your pattern still matches parts of your second string:
blabla9999 some more text 1.1
      ^^^^ here           ^^^ and here
Maybe this is closer to what you wanted:
^(\s*-?\d+(?:\.\d+)?\s*)+$
See it here on Regexr
^ anchors the pattern to the start of a line and $ to the end of a line. (In Ruby these are line anchors; use \A and \z to anchor to the whole string.)
It allows whitespace (\s) before and after each number and an optional fractional part (?:\.\d+)?, and the trailing + means the group must match at least once.
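A minimal sketch combining validation and extraction, assuming you want to accept only strings made entirely of whitespace-separated numbers. The constant and method names are made up. Requiring \s+ between numbers (rather than wrapping \s*-?\d+\s* in a repeated group) means a run of digits can only be read one way, which should keep backtracking modest on long non-matching input.

```ruby
NUMBER = /-?\d+(?:\.\d+)?/
# Whole string: numbers separated by whitespace, anchored with \A and \z.
NUMBERS_ONLY = /\A\s*#{NUMBER}(?:\s+#{NUMBER})*\s*\z/

def parse_numbers(str)
  return nil unless str.match?(NUMBERS_ONLY) # reject lines containing any other text
  str.scan(NUMBER).map { |s| s.include?(".") ? s.to_f : s.to_i }
end

p parse_numbers("910 -6.258000 6.290")           # => [910, -6.258, 6.29]
p parse_numbers("blabla9999 some more text 1.1") # => nil
```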
Maybe /(-?\d+(\.\d+)?)+/, used with scan:
irb(main):010:0> "910 -6.258000 6.290".scan(/(\-?\d+(\.\d+)?)+/).map{|x| x[0]}
=> ["910", "-6.258000", "6.290"]
str = " 910 -6.258000 6.290"
# (?:\.\d+)? makes the fraction optional, so one-digit numbers such as 5 also match
str.scan(/-?\d+(?:\.\d+)?/).map(&:to_f)
# => [910.0, -6.258, 6.29]
If you don't want integers to be converted to floats, try this:
str = " 910 -6.258000 6.290"
str.scan(/-?\d+(?:\.\d+)?/).map do |ns|
ns[/\./] ? ns.to_f : ns.to_i
end
# => [910, -6.258, 6.29]

Why doesn't this Ruby replace regex work as expected?

Consider the following string which is a C fragment in a file:
strcat(errbuf,errbuftemp);
I want to replace errbuf (but not errbuftemp) with the prefix G-> plus errbuf. To do that successfully, I check the character after and the character before errbuf to see if it's in a list of approved characters and then I perform the replace.
I created the following Ruby file:
line = " strcat(errbuf,errbuftemp);"
item = "errbuf"
puts line.gsub(/([ \t\n\r(),\[\]]{1})#{item}([ \t\n\r(),\[\]]{1})/, "#{$1}G\->#{item}#{$2}")
Expected result:
strcat(G->errbuf,errbuftemp);
Actual result
strcatG->errbuferrbuftemp);
Basically, the matched characters before and after errbuf are not reinserted back with the replace expression.
Can anyone point out what I'm doing wrong?
You need to use either the block form, gsub(/.../) { "...#{$1}...#{$2}..." }, or backreferences in a single-quoted replacement string, gsub(/.../, '...\1...\2...').
Here was the same problem: werid, same expression yield different value when excuting two times in irb
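The problem is that $1 is interpolated into the argument string before gsub runs, so the replacement contains whatever $1 held from an earlier match (in a fresh script, nil, which interpolates as an empty string) rather than the text captured by this match. Replace the second argument with a single-quoted string that uses the \1 and \2 backreferences, such as '\1G->' + item + '\2', to get the intended effect. (Chuck)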
I think part of the problem is the use of gsub() instead of sub().
Here are two alternatives:
str = 'strcat(errbuf,errbuftemp);'
str.sub(/\w+,/) { |s| 'G->' + s } # => "strcat(G->errbuf,errbuftemp);"
str.sub(/\((\w+)\b/, '(G->\1') # => "strcat(G->errbuf,errbuftemp);"
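Putting the answers together, a sketch of the question's own approach fixed in the two suggested ways, plus a third variant with word boundaries that is not from the answers above. Regexp.escape is added as a precaution in case item ever contains regex metacharacters.

```ruby
line = " strcat(errbuf,errbuftemp);"
item = "errbuf"
pattern = /([ \t\n\r(),\[\]])#{Regexp.escape(item)}([ \t\n\r(),\[\]])/

# Block form: the block runs after each match, so $1 and $2 are current.
with_block = line.gsub(pattern) { "#{$1}G->#{item}#{$2}" }

# Single-quoted replacement: gsub itself expands \1 and \2.
with_refs = line.gsub(pattern, '\1G->' + item + '\2')

# Word boundaries: nothing around the name is consumed.
with_bounds = line.gsub(/\b#{Regexp.escape(item)}\b/, "G->#{item}")

puts with_block, with_refs, with_bounds # each: " strcat(G->errbuf,errbuftemp);"
```

The first two consume the delimiter characters, so in f(errbuf,errbuf) the comma is used up by the first match and the second errbuf is skipped; the \b version replaces both.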

what does this backtick ruby code mean?

while line = gets
next if line =~ /^\s*#/ # skip comments
break if line =~ /^END/ # stop at end
#substitute stuff in backticks and try again
redo if line.gsub!(/`(.*?)`/) { eval($1) }
end
What I don't understand is this line:
line.gsub!(/`(.*?)`/) { eval($1) }
What exactly does gsub! do?
the meaning of regex (.*?)
the meaning of the block {eval($1)}
It replaces each part of line that the regex matches with the result of the block, modifying line in place.
It will match 0 or more of the previous subexpression (which was '.', match any one char). The ? modifies the .* RE so that it matches no more than is necessary to continue matching subsequent RE elements. This is called "non-greedy". Without the ?, the .* would run to the last backtick on the line, so a line containing two backticked expressions such as `1+2` and `3+4` would be captured as one span (1+2` and `3+4), which is not valid Ruby for eval.
The block returns the result of eval ("evaluate a Ruby expression") on the backreference, which is the part of the string between the backtick characters. This is specified by $1, which refers to the first paren-enclosed section ("backreference") of the RE.
In the big picture, the result of all this is that lines containing backtick-bracketed expressions have the part within the backticks (and the backticks) replaced with the result value of executing the contained Ruby expression. gsub! returns nil when it makes no substitution, so the redo fires only when at least one backticked expression was replaced. Because redo restarts the loop body immediately without rerunning the while condition (and without reading a new line), the resulting line is also subject to backtick evaluation.
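A small sketch of the two behaviours described above, greedy versus non-greedy capture and the nil return that controls the redo. It uses String#upcase as a harmless stand-in for eval, so no code from the input is executed.

```ruby
line = "a `x` and `y` end"

greedy = line[/`(.*)`/, 1]           # runs to the LAST backtick on the line
lazy   = line.scan(/`(.*?)`/).flatten # stops at the next backtick each time
shown  = line.gsub(/`(.*?)`/) { $1.upcase } # stand-in for eval($1)

p greedy # => "x` and `y"
p lazy   # => ["x", "y"]
p shown  # => "a X and Y end"

# gsub! returns nil when nothing was replaced, which is what stops the redo.
plain = "no backticks here"
result = plain.gsub!(/`(.*?)`/) { |m| m }
p result # => nil
```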
Replaces everything between backticks in line with the result of evaluating the ruby code contained therein.
>> line = "one plus two equals `1+2`"
>> line.gsub!(/`(.*?)`/) { eval($1) }
>> p line
=> "one plus two equals 3"
.* matches zero or more characters, ? makes it non-greedy (i.e., it will take the shortest match rather than the longest).
$1 is the string which matched the stuff between the (). In the above example, $1 would have been set to "1+2". eval evaluates the string as ruby code.
line.gsub!(/`(.*?)`/) { eval($1) }
gsub! modifies line in place (instead of using line = line.gsub(...)).
.*? makes the match stop at the next `; otherwise .* would run to the last ` on the line and swallow several backticked expressions into one match.
The block evaluates whatever was captured (so, for example, if line contains `1+1`, eval replaces it with 2).

Resources