Ruby Regexp group matching, assign variables on 1 line - ruby

I'm currently trying to rexp a string into multiple variables. Example string:
ryan_string = "RyanOnRails: This is a test"
I've matched it with this regexp, with 3 groups:
ryan_group = ryan_string.scan(/(^.*)(:)(.*)/i)
Now to access each group I have to do something like this:
ryan_group[0][0] (first group) RyanOnRails
ryan_group[0][1] (second group) :
ryan_group[0][2] (third group) This is a test
This seems pretty ridiculous and it feels like I'm doing something wrong. I would be expect to be able to do something like this:
g1, g2, g3 = ryan_string.scan(/(^.*)(:)(.*)/i)
Is this possible? Or is there a better way than how I'm doing it?

You don't want scan for this, as it makes little sense. You can use String#match which will return a MatchData object, you can then call #captures to return an Array of captures. Something like this:
#!/usr/bin/env ruby
string = "RyanOnRails: This is a test"
one, two, three = string.match(/(^.*)(:)(.*)/i).captures
p one #=> "RyanOnRails"
p two #=> ":"
p three #=> " This is a test"
Be aware that if no match is found, String#match will return nil, so something like this might work better:
if match = string.match(/(^.*)(:)(.*)/i)
one, two, three = match.captures
end
Although scan does make little sense for this. It does still do the job, you just need to flatten the returned Array first. one, two, three = string.scan(/(^.*)(:)(.*)/i).flatten

You could use Match or =~ instead which would give you a single match and you could either access the match data the same way or just use the special match variables $1, $2, $3
Something like:
if ryan_string =~ /(^.*)(:)(.*)/i
first = $1
third = $3
end

You can name your captured matches
string = "RyanOnRails: This is a test"
/(?<one>^.*)(?<two>:)(?<three>.*)/i =~ string
puts one, two, three
It doesn't work if you reverse the order of string and the regex.

You have to decide whether it is a good idea, but ruby regexp can (automagically) define local variables for you!
I am not yet sure whether this feature is awesome or just totally crazy, but your regex can define local variables.
ryan_string = "RyanOnRails: This is a test"
/^(?<webframework>.*)(?<colon>:)(?<rest>)/ =~ ryan_string
# This defined three variables for you. Crazy, but true.
webframework # => "RyanOnRails"
puts "W: #{webframework} , C: #{colon}, R: #{rest}"
(Take a look at http://ruby-doc.org/core-2.1.1/Regexp.html , search for "local variable").
Note:
As pointed out in a comment, I see that there is a similar and earlier answer to this question by #toonsend (https://stackoverflow.com/a/21412455). I do not think I was "stealing", but if you want to be fair with praises and honor the first answer, feel free :) I hope no animals were harmed.

scan() will find all non-overlapping matches of the regex in your string, so instead of returning an array of your groups like you seem to be expecting, it is returning an array of arrays.
You are probably better off using match(), and then getting the array of captures using MatchData#captures:
g1, g2, g3 = ryan_string.match(/(^.*)(:)(.*)/i).captures
However you could also do this with scan() if you wanted to:
g1, g2, g3 = ryan_string.scan(/(^.*)(:)(.*)/i)[0]

Related

How to use gsubstitution with more letters

I've printed the code, wit ruby
string = "hahahah"
pring string.gsub("a","b")
How do I add more letter replacements into gsub?
string.gsub("a","b")("h","l") and string.gsub("a","b";"h","l")
didnt work...
*update I have tried this too but without any success .
letters = {
"a" => "l"
"b" => "n"
...
"z" => "f"
}
string = "hahahah"
print string.gsub(\/w\,letters)
You're overcomplicating. As with most method calls in Ruby, you can simply chain #gsub calls together, one after the other:
str = 'adfh'
print str.gsub("a","b").gsub("h","l") #=> 'bdfl'
What you're doing here is applying the second #gsub to the result of the first one.
Of course, that gets a bit long-winded if you do too many of them. So, when you find yourself stringing too many together, you'll want to look for a regex solution. Rubular is a great place to tinker with them.
The way to use your hash trick with #gsub and a regex expression is to provide a hash for all possible matches. This has the same result as the two #gsub calls:
print str.gsub(/[ah]/, {'a'=>'b', 'h'=>'l'}) #=> 'bdfl'
The regex matches either a or h (/[ah]/), and the hash is saying what to substitute for each of them.
All that said, str.tr('ah', 'bl') is the simplest way to solve your problem as specified, as some commenters have mentioned, so long as you are working with single letters. If you need to work with two or more characters per substitution, you'll need to use #gsub.

Use ruby to remove a part of a string on each entry in an array where it exists

I have a list of file paths, for example
[
'Useful',
'../Some.Root.Directory/Path/Interesting',
'../Some.Root.Directory/Path/Also/Interesting'
]
(I mention that they're file paths in case there is something that makes this task easier because they're files but they can be considered simply a set of strings some of which may start with a particular string)
and I need to make this into a set of pairs so that I have the original list but also
[
'Useful',
'Interesting',
'Also/Interesting'
]
I expected I'd be able to do this
'../Some.Root.Directory/Path/Interesting'.gsub!('../Some.Root.Directory/Path/', '')
or
'../Some.Root.Directory/Path/Interesting'.gsub!('\.\.\/Some\.Root\.Directory\/Path\/', '')
but neither of those replaces the provided string/pattern with an empty string...
So in irb
puts '../Some.Root.Directory/Path/Interesting'.gsub('\.\.\/Some\.Root\.Directory\/Path\/', '')
outputs
../Some.Root.Directory/Path/Interesting
and the desired output is
Interesting
How can I do this?
NB the path will be passed in so really I have
file_path.gsub!(removal_path, '')
If you are positive that strings start with removal_path you can do:
string[removal_path.size..-1]
to get the remaining part.
If you want to get pairs of the original paths and the shortened ones, you can use sub in combination with map:
a = [
'../Some.Root.Directory/Path/Interesting',
'../Some.Root.Directory/Path/Also/Interesting'
]
b = a.map do |v|
[v, v.sub('../Some.Root.Directory/Path', '')]
end
puts b
This will return an Array of arrays - each sub-array contains the original path plus the shortened one. As noted by #sawa - you can simply use sub instead of gsub, since you want to replace only a single occurrence.

Regex to capture string into ruby method params

I Looking for an Regex to capture this examples of strings:
first_paramenter, first_hash_key: 'class1 class2', second_hash_key: true
first_argument, single_hash_key: 'class1 class2'
first_argument_without_second_argument
The pattern rules are:
The string must start some word (the first parameter) /^(\w+)/
The second parameter is optional
If second parameter provided, must have one comma after fisrt parameter
The second argument is an hash, with keys and values. Values can be true, false or an string enclosed by quotes
The hash keys must start with letter
I'm using this regex, but it matches with the only second example:
^(\w+),(\s[a-z]{1}[a-z_]+:\s'?[\w\s]+'?,?)$
I'd go with something like:
^(\w+)(?:, ([a-z]\w+): ('[^']*')(?:, ([a-z]\w+): (\w+))?)?
Here's a Rubular example of it.
(?:...) create non-capturing groups which we can easily test for existence using ?. That makes it easy to test for optional chunks.
([a-z]\w+) is an easy way to say "it must start with a letter" while allowing normal alpha, digits and "_".
As far as testing for "Values can be true, false or an string enclosed by quotes", I'd do that in code after capturing. It's way too easy to create a complex pattern, and then be unable to maintain it later. It's better to use simple ones, then look to see whether you got what you expected, than to try to enforce it inside the regex.
in the third example, your regex return 5 matches. It would be better if return only one. It's possible?
I'm not sure what you're asking. This will return a single capture for each, but why you'd want that makes no sense to me if you're capturing parameters to send to a method:
/^(\w+(?:, [a-z]\w+: '[^']*'(?:, [a-z]\w+: \w+)?)?)/
http://rubular.com/r/GLVuSOieI6
There is frequently a choice to be made between attacking an entire string with a single regex or breaking the string up with one or more String methods, and then going after each piece separately. The latter approach often makes debugging and testing easier, and may also make the code intelligible to mere mortals. It's always a judgement call, of course, but I think this problem lends itself well to the divide and conquer approach. This is how I'd do it.
Code
def match?(str)
a = str.split(',')
return false unless a.shift.strip =~ /^\w+$/
a.each do |s|
return false unless ((key_val = s.split(':')).size == 2) &&
key_val.first.strip =~ /^[a-z]\w*$/ &&
key_val.last.strip =~ /^(\'.*?\'|true|false)$/
end
true
end
Examples
match?("first_paramenter, first_hash_key: 'class1 class2',
second_hash_key: true")
#=>true
match?("first_argument, single_hash_key: 'class1 class2'")
#=>true
match?("first_argument_without_second_argument")
#=>true
match?("first_parameter, first_hash_key: 7")
#=>false
match?("dogs and cats, first_hash_key: 'class1 class2'")
#=>false
match?("first_paramenter, first_hash_key: 'class1 class2',
second_hash_key: :true")
#=>false
You've got the basic idea, you have a bunch of small mistakes in there
/^(\w+)(,\s[a-z][a-z_]+:\s('[^']*'|true|false))*$/
explained:
/^(\w+) # starts with a word
(
,\s # the comma goes _inside_ the parens since its optional
[a-z][a-z_]+:\s # {1} is completely redundant
( # use | in a capture group to allow different possible keys
'[^']*' | # note that '? doesn't make sure that the quotes always match
true |
false
)
)*$/x # can have 0 or more hash keys after the first word

Regex to leave desired string remaining and others removed

In Ruby, what regex will strip out all but a desired string if present in the containing string? I know about /[^abc]/ for characters, but what about strings?
Say I have the string "group=4&type_ids[]=2&type_ids[]=7&saved=1" and want to retain the pattern group=\d, if it is present in the string using only a regex?
Currently, I am splitting on & and then doing a select with matching condition =~ /group=\d/ on the resulting enumerable collection. It works fine, but I'd like to know the regex to do this more directly.
Simply:
part = str[/group=\d+/]
If you want only the numbers, then:
group_str = str[/group=(\d+)/,1]
If you want only the numbers as an integer, then:
group_num = str[/group=(\d+)/,1].to_i
Warning: String#[] will return nil if no match occurs, and blindly calling nil.to_i always returns 0.
You can try:
$str =~ s/.*(group=\d+).*/\1/;
Typically I wouldn't really worry too much about a complex regex. Simply break the string down into smaller parts and it becomes easier:
asdf = "group=4&type_ids[]=2&type_ids[]=7&saved=1"
asdf.split('&').select{ |q| q['group'] } # => ["group=4"]
Otherwise, you can use regex a bunch of different ways. Here's two ways I tend to use:
asdf.scan(/group=\d+/) # => ["group=4"]
asdf[/(group=\d+)/, 1] # => "group=4"
Try:
str.match(/group=\d+/)[0]

Ruby MatchData class is repeating captures, instead of including additional captures as it "should"

Ruby 1.9.1, OSX 10.5.8
I'm trying to write a simple app that parses through of bunch of java based html template files to replace a period (.) with an underscore if it's contained within a specific tag. I use ruby all the time for these types of utility apps, and thought it would be no problem to whip up something using ruby's regex support. So, I create a Regexp.new... object, open a file, read it in line by line, then match each line against the pattern, if I get a match, I create a new string using replaceString = currentMatch.gsub(/./, '_'), then create another replacement as whole string by newReplaceRegex = Regexp.escape(currentMatch) and finally replace back into the current line with line.gsub(newReplaceRegex, replaceString) Code below, of course, but first...
The problem I'm having is that when accessing the indexes within the returned MatchData object, I'm getting the first result twice, and it's missing the second sub string it should otherwise be finding. More strange, is that when testing this same pattern and same test text using rubular.com, it works as expected. See results here
My pattern:
(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+.)+(?:[a-zA-Z0-9]+)(?:>))
Text text:
<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText
Here's the relevant code:
tagRegex = Regexp.new('(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>))+')
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each{|htmlLine|
lineCount += 1
puts ("Current line: #{htmlLine} at line num: #{lineCount}")
tagMatch = tagRegex.match(htmlLine)
if(tagMatch)
matchesArray = tagMatch.to_a
firstMatch = matchesArray[0]
secondMatch = matchesArray[1]
puts "First match: #{firstMatch} and second match #{secondMatch}"
tagMatch.captures.each {|lineMatchCapture|
puts "Current capture for tagMatches: #{lineMatchCapture} of total match count #{matchesArray.size}"
#create a new regex using the match results; make sure to use auto escape method
originalPatternString = Regexp.escape(lineMatchCapture)
replacementRegex = Regexp.new(originalPatternString)
#replace any periods with underscores in a copy of lineMatchCapture
periodToUnderscoreCorrection = lineMatchCapture.gsub(/\./, '_')
#replace original match with underscore replaced copy within line
htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
puts "The modified htmlLine is now: #{htmlLine}"
}
end
}
I would think that I should get the first tag in matchData[0] then the second tag in matchData1, or, what I'm really doing because I don't know how many matches I'll get within any given line is matchData.to_a.each. And in this case, matchData has two captures, but they're both the first tag match
which is: <WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>
So, what the heck am I doing wrong, why does rubular test give me the expected results?
You want to use the on String#scan instead of the Regexp#match:
tag_regex = /<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>)/
lines = "<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText\
<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText"
lines.scan(tag_regex)
# => ["<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>", "<WEBOBJECT NAME=admin.SecondLineMatch>"]
A few recommendations for next ruby questions:
newlines and spaces are your friends, you don't loose points for using more lines on your code ;-)
use do-end on blocks instead of {}, improves readability a lot
declare variables in snake case (hello_world) instead of camel case (helloWorld)
Hope this helps
I ended up using the String.scan approach, the only tricky point there was figuring out that this returns an array of arrays, not a MatchData object, so there was some initial confusion on my part, mostly due to my ruby green-ness, but it's working as expected now. Also, I trimmed the regex per Trevoke's suggestion. But snake case? Never...;-) Anyway, here goes:
tagRegex = /(<(?:webobject) (?:name)=(?:\w+\.)+(?:\w+)(?:>))/i
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each do |htmlLine|
lineCount += 1
puts ("Current line: #{htmlLine} at line num: #{lineCount}")
oldMatches = htmlLine.scan(tagRegex) #oldMatches thusly named due to not explicitly using Regexp or MatchData, as in "the old way..."
if(oldMatches.size > 0)
oldMatches.each_index do |index|
arrayMatch = oldMatches[index]
aMatch = arrayMatch[0]
#create a new regex using the match results; make sure to use auto escape method
replacementRegex = Regexp.new(Regexp.escape(aMatch))
#replace any periods with underscores in a copy of lineMatchCapture
periodToUnderscoreCorrection = aMatch.gsub(/\./, '_')
#replace original match with underscore replaced copy within line, matching against the new escaped literal regex
htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
puts "The modified htmlLine is now: #{htmlLine}"
end # I kind of still prefer the brackets...;-)
end
end
Now, why does MatchData work the way it does? It seems like it's behavior is a bug really, and certainly not very useful in general if you can't get it provide a simple means of accessing all the matches. Just my $.02
Small bits:
This regexp helps you get "normalMode" .. But not "secondLineMatch":
<webobject name=\w+\.((?:\w+)).+> (with option 'i', for "case insensitive")
This regexp helps you get "secondLineMatch" ... But not "normalMode":
<webobject name=\w+\.((?:\w+))> (with option 'i', for "case insensitive").
I'm not really good at regexpt but I'll keep toiling at it.. :)
And I don't know if this helps you at all, but here's a way to get both:
<webobject name=admin.(\w+) (with option 'i').

Resources