Maintaining a "variable" in regular expressions? - ruby

Is there any way using regular expressions to match and replace with a "variable string" like...
foo_1_a => bar_1_b
foo_2_a => bar_2_b
foo_3_a => bar_3_b
...
Using some expression with a variable "var" for example
"replace foo_var => [0-9]_a with bar_var_b "
Specifically I'm trying to take in one regex/replacement from command line using Ruby and executing all these replacements. Thanks.

If I understand you correctly, you are looking for a back reference in the replacement string. This is usually written \1 or $1, where the number is the position of the capture group in the pattern.
So match foo_2_a with foo_(\d+)_a. The parentheses create a group, and it is the first group, so replace it with bar_\1_b. Here \1 will contain 2.
More about Back Reference.

Here we go.
# \d+ captures the number; the block builds the replacement from $1
result = "foo_1_a".sub(/foo_(\d+)_a/) { "bar_#{$1}_b" }
puts result # => bar_1_b
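For the command-line case in the question, a minimal sketch might look like the following. The names `pattern_source`, `replacement`, and `inputs` are hypothetical stand-ins; a real script would presumably read the first two from `ARGV`. It assumes the replacement is supplied with `\1`-style back references.

```ruby
# Hypothetical inputs; in a script these might be ARGV[0] and ARGV[1].
pattern_source = 'foo_(\d+)_a'
replacement    = 'bar_\1_b'

# Build the regexp from its string form, then apply it to every input.
pattern = Regexp.new(pattern_source)
inputs  = %w[foo_1_a foo_2_a foo_3_a]
results = inputs.map { |s| s.gsub(pattern, replacement) }

puts results
```

Because the replacement arrives as a plain string, gsub expands `\1` itself; no string interpolation is involved.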

Related

Extracting a substring from a string using `Regexp.new`

I have a string like this:
var = "Renewal Quote RQ00041233 (Payment Pending) Policy R38A014294-1"
I have to extract "Payment Pending" from that string using only the information included in another single string.
The following:
var[/\((.*)\)/, 1]
will extract what I want. I can include the string representation of the regex in the given string and construct the regular expression from it using Regexp.new, but I have no way to supply the 1 that is passed as the second argument of [].
Without the second argument 1,
regex_string = '\((.*)\)'
var[Regexp.new(regex_string)]
fetches the string "(Payment Pending)" instead of the expected "Payment Pending".
Can someone help me?
Not sure what you are trying to do, but you can get rid of capturing groups using a different regex:
var[/(?<=\().*(?=\))/]
# => "Payment Pending"
or
var[Regexp.new('(?<=\().*(?=\))')]
# => "Payment Pending"
/\((.*)\)/ is just shorthand for Regexp.new('\((.*)\)').
String#[] takes a regex and a capture group as two separate arguments. var[/\((.*)\)/, 1] is var[Regex, 1].
The important thing to realize is 1 is passed to var[], not the regex.
re = Regexp.new('\((.*)\)')
match = var[re, 1]
Note: you might want to require a named capture group rather than a numbered one. It's very easy to accidentally include an extra capture group in a regex.
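As a sketch of the named-group suggestion: String#[] also accepts a capture name as its second argument. The group name `status` below is an assumption chosen for illustration, not something from the question.

```ruby
var = "Renewal Quote RQ00041233 (Payment Pending) Policy R38A014294-1"

# The group name "status" is hypothetical; any valid name works.
regex_string = '\((?<status>.*)\)'
re = Regexp.new(regex_string)

by_name  = var[re, "status"] # look the capture up by name
by_index = var[re, 1]        # the same group, looked up by position

puts by_name
```

With a fixed, agreed-upon group name, the caller only has to supply the regex string, and adding an extra unnamed group by accident cannot shift the result.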
Assuming there are no nested parentheses in the string, one way to do that without using a regular expression is as follows.
instance_eval "var[(i=var.index('(')+1)..var.index(')',i)-1]"
#=> "Payment Pending"
See String#index, particularly the reference to the optional second argument, "offset".

Method gsub does not work as expected

I want to change "#" to "\40" in a string, but I am not able to do so.
a = "srikanth#in.com"
a.gsub("#", "\40")
# => "srikanth in.com"
It replaces "#" with a space instead of \40. Any idea how to implement this?
Another solution:
puts a.gsub("#") {"\\40"}
# => srikanth\40in.com
\\40 doesn't work because it refers to a capture group. From the docs:
If replacement is a String it will be substituted for the matched
text. It may contain back-references to the pattern’s capture groups
of the form \\d, where d is a group number ...
You can use gsub's hash syntax instead:
If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.
Example:
a.gsub('#', '#' => '\\40')
#=> "srikanth\\40in.com"
Backslashes have a special meaning in the second parameter of gsub: they refer to capture groups that the regex may have matched. I tried escaping, but couldn't get it to work. It works this way, though:
s = "srikanth#in.com"
s['#'] = '\\40'
s # => "srikanth\\40in.com"
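The variants discussed above can be compared side by side in a short sketch. The variable names are illustrative only. The last line shows one way the escaping can be made to work: the replacement string has to contain two backslashes, which gsub then collapses into one.

```ruby
a = "srikanth#in.com"

# Single quotes keep \4 as a literal backslash followed by "4".
wanted = 'srikanth\40in.com'

with_space   = a.gsub("#", "\40")           # "\40" is an octal escape for a space
with_block   = a.gsub("#") { '\40' }        # block results are not scanned for back references
with_hash    = a.gsub("#", "#" => '\40')    # hash values are inserted as-is
with_escaped = a.gsub("#", '\\\\40')        # two backslashes in the string become one in the output

puts with_block
```

The block form is probably the easiest to read, since its return value is inserted without any further interpretation.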

String gsub - Replace characters between two elements, but leave surrounding elements

Suppose I have the following string:
mystring = "start/abc123/end"
How can you splice out the abc123 and put something else in its place, while leaving the "start/" and "/end" elements intact?
I had the following to match for the pattern, but it replaces the entire string. I was hoping to just have it replace the abc123 with 123abc.
mystring.gsub(/start\/(.*)\/end/,"123abc") #=> "123abc"
Edit: The characters between the start & end elements can be any combination of alphanumeric characters, I changed my example to reflect this.
You can do it using this character class : [^\/] (all that is not a slash) and lookarounds
mystring.gsub(/(?<=start\/)[^\/]+(?=\/end)/,"7")
For your example, you could perhaps use:
mystring.gsub(/\/(.*?)\//,"/7/")
This matches the two slashes around the string you're replacing and puts them back in the substitution.
Alternatively, you could capture the pieces of the string you want to keep and interpolate them around your replacement. This turns out to be much more readable than lookaheads/lookbehinds:
irb(main):010:0> mystring.gsub(/(start)\/.*\/(end)/, "\\1/7/\\2")
=> "start/7/end"
\\1 and \\2 here refer to the numbered captures inside of your regular expression.
The problem is that you're replacing the entire matched string, "start/8/end", with "7". You need to include the matched characters you want to persist:
mystring.gsub(/start\/(.*)\/end/, "start/7/end")
Alternatively, just match the digits:
mystring.gsub(/\d+/, "7")
You can do this by grouping the start and end elements in the regular expression and then referring to these groups in the substitution string. Named groups are referenced with \k<name> in the replacement:
mystring.gsub(/(?<start>start\/).*(?<end>\/end)/, "\\k<start>7\\k<end>")
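For comparison, here is a sketch of three of the approaches above applied to the question's own example, each replacing abc123 with 123abc. The group names `head` and `tail` are hypothetical.

```ruby
mystring = "start/abc123/end"

# 1. Lookarounds: only the middle part is matched, so only it is replaced.
with_lookaround = mystring.gsub(%r{(?<=start/)[^/]+(?=/end)}, "123abc")

# 2. Numbered captures with a block: $1 and $2 are read after each match.
with_block = mystring.gsub(%r{(start/).*(/end)}) { "#{$1}123abc#{$2}" }

# 3. Named captures referenced with \k<name> in the replacement string.
with_named = mystring.gsub(%r{(?<head>start/).*(?<tail>/end)}, '\k<head>123abc\k<tail>')

puts with_lookaround
```

The block and named forms avoid an ambiguity with numbered back references: in a replacement such as `\1123abc`, it is easy to misread where the group number ends and the literal digits begin.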

Why doesn't this Ruby replace regex work as expected?

Consider the following string which is a C fragment in a file:
strcat(errbuf,errbuftemp);
I want to replace errbuf (but not errbuftemp) with the prefix G-> plus errbuf. To do that successfully, I check the character after and the character before errbuf to see if it's in a list of approved characters and then I perform the replace.
I created the following Ruby file:
line = " strcat(errbuf,errbuftemp);"
item = "errbuf"
puts line.gsub(/([ \t\n\r(),\[\]]{1})#{item}([ \t\n\r(),\[\]]{1})/, "#{$1}G\->#{item}#{$2}")
Expected result:
strcat(G->errbuf,errbuftemp);
Actual result
strcatG->errbuferrbuftemp);
Basically, the matched characters before and after errbuf are not reinserted back with the replace expression.
Can anyone point out what I'm doing wrong?
Because you must use syntax gsub(/.../){"...#{$1}...#{$2}..."} or gsub(/.../,'...\1...\2...').
Here was the same problem: werid, same expression yield different value when excuting two times in irb
The problem is that the variable $1 is interpolated into the argument string before gsub is run, meaning that the previous value of $1 is what the symbol gets replaced with. You can replace the second argument with '\1 ?' to get the intended effect. (Chuck)
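Applied to the question's own line, both corrected forms can be sketched as follows. The `boundary` and `pattern` variables are introduced here for readability and are not part of the original code; `Regexp.escape` is added as a precaution in case `item` ever contains regex metacharacters.

```ruby
line = " strcat(errbuf,errbuftemp);"
item = "errbuf"

# Hypothetical helper: the approved characters from the question, as a character class.
boundary = '[ \t\n\r(),\[\]]'
pattern  = /(#{boundary})#{Regexp.escape(item)}(#{boundary})/

# Back references in the replacement string are expanded by gsub itself.
with_backrefs = line.gsub(pattern, "\\1G->#{item}\\2")

# In the block form, $1 and $2 are evaluated after each match, not before the call.
with_block = line.gsub(pattern) { "#{$1}G->#{item}#{$2}" }

puts with_backrefs
```

Only `item` is interpolated when the argument string is built; the `\\1` and `\\2` parts reach gsub untouched and are filled in per match.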
I think part of the problem is the use of gsub() instead of sub().
Here are two alternatives:
str = 'strcat(errbuf,errbuftemp);'
str.sub(/\w+,/) { |s| 'G->' + s } # => "strcat(G->errbuf,errbuftemp);"
str.sub(/\((\w+)\b/, '(G->\1') # => "strcat(G->errbuf,errbuftemp);"

How to get the particular part of string matching regexp in Ruby?

I've got a string Unnecessary:12357927251data and I need to select everything after the colon and the digits. I want to do it using Regexp.
string.scan(/:\d+.+$/)
This will give me :12357927251data, but can I select only needed information .+ (data)?
Anything in parentheses in a regexp will be captured as a group, which you can access in $1, $2, etc. or by using [] on a match object:
string.match(/:\d+(.+)$/)[1]
If you use scan with capturing groups, you will get an array of arrays of the groups:
"Unnecessary:123data\nUnnecessary:5791next".scan(/:\d+(.+)$/)
=> [["data"], ["next"]]
Use parentheses in your regular expression and the result will be broken out into an array. For example:
x='Unnecessary:12357927251data'
x.scan(/(:\d+)(.+)$/)
=> [[":12357927251", "data"]]
x.scan(/:\d+(.+$)/).flatten
=> ["data"]
Assuming that you are trying to get the string 'data' from your string, then you can use:
string.match(/.*:\d*(.*)/)[1]
String#match returns a MatchData object. You can then index into that MatchData object to find the part of the string that you want.
(The first element of MatchData, index 0, is the entire matched text; index 1 is the part of the string captured by the first pair of parentheses.)
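A small sketch of indexing into the MatchData object, using the string from the question. The group names `digits` and `rest` are assumptions made for illustration.

```ruby
string = "Unnecessary:12357927251data"

# Hypothetical group names: digits for the number, rest for what follows it.
m = string.match(/:(?<digits>\d+)(?<rest>.+)$/)

whole  = m[0]       # the entire matched text, including the colon
digits = m[1]       # first capture group, by position
rest   = m[:rest]   # second capture group, by name

# String#[] with a capture index gives the same result without a MatchData object.
direct = string[/:\d+(.+)$/, 1]

puts rest
```

Note that `match` returns nil when nothing matches, so real code would likely check `m` before indexing into it.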
Try this: /(?<=\:)\d+.+$/
It changes the colon to a positive look-behind so that it does not appear in the output. Note that escaping the colon with a backslash is harmless but not required, since a bare colon is not a metacharacter in Ruby regular expressions.
Using IRB
irb(main):004:0> "Unnecessary:12357927251data".scan(/:\d+(.+)$/)
=> [["data"]]

Resources