Why doesn't this Ruby replace regex work as expected? - ruby

Consider the following string which is a C fragment in a file:
strcat(errbuf,errbuftemp);
I want to replace errbuf (but not errbuftemp) with the prefix G-> plus errbuf. To do that successfully, I check the character after and the character before errbuf to see if it's in a list of approved characters and then I perform the replace.
I created the following Ruby file:
line = " strcat(errbuf,errbuftemp);"
item = "errbuf"
puts line.gsub(/([ \t\n\r(),\[\]]{1})#{item}([ \t\n\r(),\[\]]{1})/, "#{$1}G\->#{item}#{$2}")
Expected result:
strcat(G->errbuf,errbuftemp);
Actual result:
strcatG->errbuferrbuftemp);
Basically, the matched characters before and after errbuf are not reinserted back with the replace expression.
Can anyone point out what I'm doing wrong?

Because you must use either the block syntax gsub(/.../) { "...#{$1}...#{$2}..." } or the backreference syntax gsub(/.../, '...\1...\2...').
Here was the same problem: werid, same expression yield different value when excuting two times in irb
The problem is that the variable $1 is interpolated into the argument string before gsub is run, meaning that the previous value of $1 is what the symbol gets replaced with. You can replace the second argument with '\1 ?' to get the intended effect. (Chuck)
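As a minimal sketch of both fixes applied to the line from the question: the block form reads $1 and $2 after each match, and the backreference form leaves \1 and \2 for gsub to fill in. The Regexp.escape call and the variable names pattern, block_result and string_result are additions for illustration, not part of the original code.

```ruby
line = " strcat(errbuf,errbuftemp);"
item = "errbuf"

# Same delimiter class as in the question; Regexp.escape is an extra
# precaution in case item ever contains regex metacharacters.
pattern = /([ \t\n\r(),\[\]])#{Regexp.escape(item)}([ \t\n\r(),\[\]])/

# Block form: the block runs after each match, so $1 and $2 are current.
block_result = line.gsub(pattern) { "#{$1}G->#{item}#{$2}" }

# Backreference form: "\\1" and "\\2" reach gsub as \1 and \2.
string_result = line.gsub(pattern, "\\1G->#{item}\\2")

puts block_result
puts string_result
```

Both calls should leave errbuftemp untouched, because the character after that occurrence of errbuf is not in the delimiter class.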

I think part of the problem is the use of gsub() instead of sub().
Here are two alternatives:
str = 'strcat(errbuf,errbuftemp);'
str.sub(/\w+,/) { |s| 'G->' + s } # => "strcat(G->errbuf,errbuftemp);"
str.sub(/\((\w+)\b/, '(G->\1') # => "strcat(G->errbuf,errbuftemp);"

Related

Ruby Regex gsub - everything after string

I have a string something like:
test:awesome my search term with spaces
And I'd like to extract the string immediately after test: into one variable and everything else into another, so I'd end up with awesome in one variable and my search term with spaces in another.
Logically, what I'd do is move everything matching test:* into another variable, and then remove everything before the first :, leaving me with what I wanted.
At the moment I'm using /test:(.*)([\s]+)/ to match the first part, but I can't seem to get the second part correctly.
The first capture in your regular expression is greedy, and matches spaces because you used . (which matches any character). Instead, try:
matches = string.match(/test:(\S*) (.*)/)
# index 0 is the whole pattern that was matched
first = matches[1] # this is the first () group
second = matches[2] # and the second () group
Use the following:
/^test:(.*?) (.*)$/
That is, match "test:", then a series of characters (non-greedily), up to a single space, and another series of characters to the end of the line.
I am guessing you want to remove all the leading spaces before the second match too, hence I have \s+ in the expression. Otherwise, remove the \s+ from the expression, and you'll have what you want:
m = /^test:(\w+)\s+(.*)/.match("test:awesome my search term with spaces")
a = m[1]
b = m[2]
http://codepad.org/JzuNQxBN
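Putting the answers above together, here is a small sketch that splits the sample string into its two parts. The anchors \A and \z and the variable names input, keyword and rest are choices made for this illustration; the nil check covers input that does not start with test:.

```ruby
input = "test:awesome my search term with spaces"

# \S+ stops at the first whitespace, so the first group cannot swallow spaces.
m = input.match(/\Atest:(\S+)\s+(.*)\z/)

keyword = nil
rest = nil
# match returns nil when the pattern does not apply, so guard before reading.
keyword, rest = m.captures if m

puts keyword
puts rest
```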

Remove email address from string in Ruby

I have the following code, which is supposed to remove a particular email address from a string if it exists. The problem is that I get the error "invalid range "y-d" in string transliteration (ArgumentError)", which I assume is because it's treating my input as a regex. I will need to do this delete with a variable in the actual code, not a string literal, but this is a simplified version of the problem.
So how do I properly perform this operation?
myvar = "test1#my-domain.com test2#my-domain.com"
myvar = myvar.delete("test1#my-domain.com")
Try
myvar = "test1#my-domain.com test2#my-domain.com"
myvar = myvar.gsub("test1#my-domain.com", '').strip
String#delete(str) does not delete the literal string str but builds a set out of the individual characters of str and deletes all occurrences of those characters. Try this:
"sets".delete("test")
=> ""
"sets".delete("est")
=> ""
The hyphen has a special meaning, it defines a range of characters. String#delete("a-d") will delete all occurrences of a,b,c and d characters. Range boundary characters should be given in ascending order: you should write "a-d" but not "d-a".
In your original example, ruby tries to build a character range from y-d substring and fails.
Use String#gsub method instead.
You can do it like this
myvar = "test1#my-domain.com test2#my-domain.com"
remove = "test1#my-domain.com"
myvar.gsub!(remove, "")
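The difference between the two methods can be sketched as follows. The sample string "hello world" and the variable names set_deleted, cleaned and cleaned_via_regex are made up for this illustration; the Regexp.escape variant is only needed if the value has to be embedded in a regular expression.

```ruby
myvar = "test1#my-domain.com test2#my-domain.com"
remove = "test1#my-domain.com"

# String#delete treats its argument as a character set: "l-o" means l, m, n, o.
set_deleted = "hello world".delete("l-o")

# A String pattern passed to gsub is matched literally, character for character.
cleaned = myvar.gsub(remove, "").strip

# When the value must go inside a Regexp, escape it first.
cleaned_via_regex = myvar.gsub(/#{Regexp.escape(remove)}/, "").strip

puts set_deleted
puts cleaned
puts cleaned_via_regex
```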

gsub! On an argument doesn't work

I am making a function that turns the first argument into a PHP var (useless, I know), and set it equal to the second argument. I'm trying to gsub! it to get rid of all the characters that can't be used in a PHP var. Here is what I have:
dvar = "$" + name.gsub!(/.?\/!#\#{}$%^&*()`~/, "") { |match| puts match }
I have the puts match there to make sure some of the characters were removed. name is a variable passed into a method in which this is its purpose. I am getting this error:
TypeError: can't convert nil into String
cVar at ./Web.rb:31
(root) at C:\Users\Andrew\Documents\NetBeansProjects\Web\lib\main.rb:13
Web.rb is the file this line is in, and main.rb is the file calling this method. How can I fix this?
EDIT: If I remove the ! in gsub!, it goes through, but the characters aren't removed.
Short answer
Use dvar = "$" + name.tr(".?\/!#\#{}$%^&*()`~", '')
Long answer
The problem you are facing is that the gsub! call is returning nil. You can't concatenate (+) a String with a nil.
That's happening because you have a malformed Regexp. You aren't escaping the special regex symbols, like $, * and ., just for a start. Also, the way it is now, gsub will only match if your string contains all those symbols in sequence. You should use the pipe (|) operator to make an OR-like operation.
gsub! will also return nil if no substitutions happened.
See the documentation for gsub and gsub! here: http://ruby-doc.org/core/classes/String.html#M001186
I think you should replace gsub! with gsub. Do you really need name to change?
Example:
name = "m$var.name$$"
dvar = "$" + name.gsub!(/\$|\.|\*/, "") # $ or . or *
# dvar now contains $mvarname and name is mvarname
Your line, corrected:
dvar = "$" + name.gsub(/\.|\?|\/|\!|\#|\\|\#|\{|\}|\$|\%|\^|\&|\*|\(|\)|\`|\~/, "")
# some things shouldn't (or aren't needed to) be escaped, I don't remember them all right now
As J-_-L pointed out, you could also use a character class ([]), which makes it a little clearer, I guess. Well, it's hard to mentally parse anyway.
dvar = "$" + name.gsub(/[\.\?\/\!\#\\\#\{\}\$\%\^\&\*\(\)\`\~]/, "")
But because what you are doing is simple character replacement, the best method is tr (again reminded by J-_-L!):
dvar = "$" + name.tr(".?\/!#\#{}$%^&*()`~", '')
Way easier to read and make modifications.
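The nil behaviour behind the TypeError can be reproduced in a few lines. This is only a sketch: the character class is a small illustrative subset of the characters in the question, and clean_name, bang_result and safe_result are hypothetical names.

```ruby
pattern = /[$.*]/   # illustrative subset of the characters to strip

clean_name = "plainname"
bang_result = clean_name.gsub!(pattern, "")   # nil, because nothing was replaced
safe_result = clean_name.gsub(pattern, "")    # always returns a String

name = 'm$var.name$$'
dvar = "$" + name.gsub(pattern, "")           # safe to concatenate

p bang_result
p safe_result
p dvar
```

Because gsub always returns a String, the concatenation works whether or not any character was actually removed.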
You cannot pass both a second parameter and a block to gsub (the block is ignored).
The regex is wrong; you forgot the square brackets: /[.?\/!#\#{}$%^&*()`~]/
Because your regex is wrong, it didn't match anything, and because gsub! returns nil if nothing was replaced, you get this strange nil error.
By the way, you should use gsub rather than gsub! in this case, because you are using the return value (and not name itself), and then the error would not have happened.
I don't see what the block is for.
Just do:
name = 'hello.?\/!##$%^&*()`~hello'
dvar = "$" + name.gsub(/\.|\?|\\|\/|\!|\#|\#|\{|\}|\$|\%|\^|\&|\*|\(|\)|\`|\~/, "")
puts dvar # => "$hellohello"
or use [] to denote OR
dvar = "$" + name.gsub(/[\.\?\\\/\!\#\\\#\{\}\$\%\^\&\*\(\)\`\~]/, "")
You have to escape the special characters and then OR them, so that they are removed individually and not only when they all appear together.
Also, there is really no need to use gsub! to modify the string in place; use the non-mutating gsub, since you assign the result to a new variable.
gsub! returns nil when nothing is replaced, and the + operator cannot concatenate nil onto a String, which gives you the error mentioned.
It seems as if the name object may be nil; calling gsub! on nil usually complains with NoMethodError: private method `gsub!' called for nil:NilClass. Since I don't know the version of Ruby you are using, I am not sure the error would be the same, but it's a good place to start looking.

Ruby MatchData class is repeating captures, instead of including additional captures as it "should"

Ruby 1.9.1, OSX 10.5.8
I'm trying to write a simple app that parses through a bunch of Java-based HTML template files to replace a period (.) with an underscore if it's contained within a specific tag. I use Ruby all the time for these types of utility apps, and thought it would be no problem to whip up something using Ruby's regex support. So, I create a Regexp.new... object, open a file, read it in line by line, then match each line against the pattern. If I get a match, I create a new string using replaceString = currentMatch.gsub(/\./, '_'), then create another replacement for the whole string with newReplaceRegex = Regexp.escape(currentMatch), and finally replace back into the current line with line.gsub(newReplaceRegex, replaceString). Code below, of course, but first...
The problem I'm having is that when accessing the indexes within the returned MatchData object, I'm getting the first result twice, and it's missing the second sub string it should otherwise be finding. More strange, is that when testing this same pattern and same test text using rubular.com, it works as expected. See results here
My pattern:
(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+.)+(?:[a-zA-Z0-9]+)(?:>))
Test text:
<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText
Here's the relevant code:
tagRegex = Regexp.new('(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>))+')
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each{|htmlLine|
  lineCount += 1
  puts ("Current line: #{htmlLine} at line num: #{lineCount}")
  tagMatch = tagRegex.match(htmlLine)
  if(tagMatch)
    matchesArray = tagMatch.to_a
    firstMatch = matchesArray[0]
    secondMatch = matchesArray[1]
    puts "First match: #{firstMatch} and second match #{secondMatch}"
    tagMatch.captures.each {|lineMatchCapture|
      puts "Current capture for tagMatches: #{lineMatchCapture} of total match count #{matchesArray.size}"
      #create a new regex using the match results; make sure to use auto escape method
      originalPatternString = Regexp.escape(lineMatchCapture)
      replacementRegex = Regexp.new(originalPatternString)
      #replace any periods with underscores in a copy of lineMatchCapture
      periodToUnderscoreCorrection = lineMatchCapture.gsub(/\./, '_')
      #replace original match with underscore replaced copy within line
      htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
      puts "The modified htmlLine is now: #{htmlLine}"
    }
  end
}
I would think that I should get the first tag in matchData[0], then the second tag in matchData[1]; or, since I don't know how many matches I'll get within any given line, what I'm really doing is matchData.to_a.each. In this case, matchData has two captures, but they're both the first tag match
which is: <WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>
So, what the heck am I doing wrong, why does rubular test give me the expected results?
You want to use String#scan instead of Regexp#match:
tag_regex = /<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>)/
lines = "<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText\
<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText"
lines.scan(tag_regex)
# => ["<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>", "<WEBOBJECT NAME=admin.SecondLineMatch>"]
A few recommendations for future Ruby questions:
Newlines and spaces are your friends; you don't lose points for using more lines in your code ;-)
Use do-end on blocks instead of {}; it improves readability a lot.
Declare variables in snake case (hello_world) instead of camel case (helloWorld).
Hope this helps
I ended up using the String.scan approach, the only tricky point there was figuring out that this returns an array of arrays, not a MatchData object, so there was some initial confusion on my part, mostly due to my ruby green-ness, but it's working as expected now. Also, I trimmed the regex per Trevoke's suggestion. But snake case? Never...;-) Anyway, here goes:
tagRegex = /(<(?:webobject) (?:name)=(?:\w+\.)+(?:\w+)(?:>))/i
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each do |htmlLine|
  lineCount += 1
  puts ("Current line: #{htmlLine} at line num: #{lineCount}")
  oldMatches = htmlLine.scan(tagRegex) #oldMatches thusly named due to not explicitly using Regexp or MatchData, as in "the old way..."
  if(oldMatches.size > 0)
    oldMatches.each_index do |index|
      arrayMatch = oldMatches[index]
      aMatch = arrayMatch[0]
      #create a new regex using the match results; make sure to use auto escape method
      replacementRegex = Regexp.new(Regexp.escape(aMatch))
      #replace any periods with underscores in a copy of lineMatchCapture
      periodToUnderscoreCorrection = aMatch.gsub(/\./, '_')
      #replace original match with underscore replaced copy within line, matching against the new escaped literal regex
      htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
      puts "The modified htmlLine is now: #{htmlLine}"
    end # I kind of still prefer the brackets...;-)
  end
end
Now, why does MatchData work the way it does? Its behavior seems like a bug, really, and it's certainly not very useful in general if you can't get it to provide a simple means of accessing all the matches. Just my $.02
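For what it's worth, the behaviour is by design rather than a bug: a MatchData object describes a single match, where index 0 is the whole matched text and index 1 onward are the capture groups. When the entire pattern is wrapped in one group, both entries hold the same text. The sketch below uses a shortened, made-up test line and hypothetical names (tag_regex, all_tags, fixed) to show this, and also shows gsub with a block as a possible shortcut for the period-to-underscore replacement.

```ruby
tag_regex = /(<webobject name=(?:\w+\.)+\w+>)/i
line = "<WEBOBJECT NAME=admin.normalMode>text<WEBOBJECT NAME=admin.SecondLineMatch>more"

# match stops at the first match in the string.
m = tag_regex.match(line)
whole = m[0]   # the entire matched text
group = m[1]   # capture group 1, which wraps the entire pattern here

# scan keeps going and collects every match (one inner array per match).
all_tags = line.scan(tag_regex).flatten

# gsub with a block rewrites each matched tag in a single pass.
fixed = line.gsub(tag_regex) { |tag| tag.gsub(".", "_") }

puts whole == group
puts all_tags.size
puts fixed
```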
Small bits:
This regexp helps you get "normalMode" .. But not "secondLineMatch":
<webobject name=\w+\.((?:\w+)).+> (with option 'i', for "case insensitive")
This regexp helps you get "secondLineMatch" ... But not "normalMode":
<webobject name=\w+\.((?:\w+))> (with option 'i', for "case insensitive").
I'm not really good at regexps but I'll keep toiling at it.. :)
And I don't know if this helps you at all, but here's a way to get both:
<webobject name=admin.(\w+) (with option 'i').

what does this backtick ruby code mean?

while line = gets
  next if line =~ /^\s*#/ # skip comments
  break if line =~ /^END/ # stop at end
  # substitute stuff in backticks and try again
  redo if line.gsub!(/`(.*?)`/) { eval($1) }
end
What I don't understand is this line:
line.gsub!(/`(.*?)`/) { eval($1) }
What does the gsub! exactly do?
the meaning of regex (.*?)
the meaning of the block {eval($1)}
It will substitute within the matched part of line, the result of the block.
It will match 0 or more of the previous subexpression (which was '.', match any one char). The ? modifies the .* RE so that it matches no more than is necessary to continue matching subsequent RE elements. This is called "non-greedy". Without the ?, the .* might also match the second backtick, depending on the rest of the line, and then the expression as a whole might fail.
The block returns the result of eval ("evaluate a Ruby expression") on the backreference, which is the part of the string between the back tick characters. This is specified by $1, which refers to the first paren-enclosed section ("backreference") of the RE.
In the big picture, the result of all this is that lines containing backtick-bracketed expressions have the part within the backticks (and the backticks) replaced with the result value of executing the contained Ruby expression. And since the outer block is subject to a redo, the loop will immediately repeat without rerunning the while condition. This means that the resulting expression is also subject to a backtick evaluation.
Replaces everything between backticks in line with the result of evaluating the ruby code contained therein.
>> line = "one plus two equals `1+2`"
>> line.gsub!(/`(.*?)`/) { eval($1) }
>> p line
=> "one plus two equals 3"
.* matches zero or more characters, ? makes it non-greedy (i.e., it will take the shortest match rather than the longest).
$1 is the string which matched the stuff between the (). In the above example, $1 would have been set to "1+2". eval evaluates the string as ruby code.
line.gsub!(/`(.*?)`/) { eval($1) }
gsub! modifies line in place (instead of using line = line.gsub).
.*? is non-greedy, so it matches only up to the next `; otherwise a single match could span several backtick pairs.
The block evaluates whatever was captured (so, for example, if line contains `1+1`, eval would replace it with 2).
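A short sketch contrasting the greedy and non-greedy forms, using a made-up line with two backtick pairs; the names greedy, lazy, evaluated and no_match are hypothetical. Keep in mind that eval runs arbitrary Ruby, so this pattern only makes sense for trusted input.

```ruby
line = "one plus two equals `1+2` and `2*3`"

# Greedy: .* runs to the last backtick, so one match spans both pairs.
greedy = line.gsub(/`(.*)`/) { "<#{$1}>" }

# Non-greedy: .*? stops at the next backtick, giving one match per pair.
lazy = line.gsub(/`(.*?)`/) { "<#{$1}>" }

# Each captured expression is evaluated and its result substituted.
evaluated = line.gsub(/`(.*?)`/) { eval($1).to_s }

# gsub! returns nil when nothing matched, which is what ends the redo loop.
no_match = "no backticks here".gsub!(/`(.*?)`/) { eval($1).to_s }

puts greedy
puts lazy
puts evaluated
p no_match
```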

Resources