Reformat string with `Regexp`, named captures, and `String#%` - ruby

Does anyone know a way to directly use a MatchData object containing named captures as the input to a String template formatting operation (%)? When I attempt to do so, I get a "positional args mixed with named args" error.
s = "One-Two-Three"
re = /(?<first>.*?)-(?<second>.*?)-(?<third>.*)/
puts "%{second}" % s.match(re)
I found other ways to achieve the functional objective (i.e., by creating an array of the captures in the desired order and using positional templating), but the code is comparatively clunky.

Try this:
s = "One-Two-Three"
re = /(?<first>.*?)-(?<second>.*?)-(?<third>.*)/
match = s.match(re)
[match.names.map(&:to_sym), match.captures].transpose.to_h
# => {:first=>"One", :second=>"Two", :third=>"Three"}
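A short sketch of how that hash can then be fed to String#%, which only accepts a Hash with Symbol keys for %{name} references. The variable names captures and formatted are made up for illustration. On Ruby 2.5 or later, match.named_captures.transform_keys(&:to_sym) should build the same hash.

```ruby
s = "One-Two-Three"
re = /(?<first>.*?)-(?<second>.*?)-(?<third>.*)/
match = s.match(re)

# Build a Symbol-keyed hash from the named captures.
captures = match.names.map(&:to_sym).zip(match.captures).to_h

# String#% accepts the hash for %{name} references.
formatted = "%{second} then %{first}" % captures
puts formatted
```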

What about using string interpolation directly:
puts "#{s.match(re)['second']}"

For Ruby < 2.1 (which lacks Array#to_h) you want to use Hash[]:
m = s.match re
Hash[m.names.map(&:to_sym).zip m.captures]
#=> {:first=>"One", :second=>"Two", :third=>"Three"}

Related

Read string as variable RUBY

I am pulling the following string from a CSV file, from cell A1, and storing it as a variable:
#{collector_id}
So, cell A1 reads #{collector_id}, and my code essentially does this:
test = @excel_cell_A1
However, if I then do this:
puts test
I get this:
#{collector_id}
I need #{collector_id} to read as the actual variable collector_id, not the code that I am using to call the variable. Is that possible?
Thanks for the help. I am using ruby 1.9.3.
You can use sub or gsub to replace expected input values:
collector_id = "foo"
test = '#{collector_id}'
test.sub("\#{collector_id}", "#{collector_id}") #=> "foo"
I would avoid the use of eval (or at least sanity check what you are running) to reduce the risk of running arbitrary code you receive from the CSV file.
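One way to avoid eval altogether is to substitute placeholders from an explicit whitelist. This is only a sketch: the values hash and the sample string are assumptions for illustration, and unknown placeholders are deliberately left untouched.

```ruby
# Hypothetical whitelist of names the CSV text is allowed to reference.
values = { "collector_id" => "foo" }

# Single quotes keep the placeholders literal, as they would be when read from a CSV cell.
test = '#{collector_id} and #{unknown}'

# Replace each #{name} with its whitelisted value; leave anything else as-is.
result = test.gsub(/#\{(\w+)\}/) { |placeholder| values.fetch($1, placeholder) }
puts result
```

Nothing from the file is ever executed here; the text is only used as a lookup key.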
Try this:
test_to_s = eval("\"#{ test }\"")
puts test_to_s
The argument "\"#{ test }\"" (equivalently %Q["#{ test }"]) builds the string "#{collector_id}", where the double quotes are part of the string (so its length is 17); eval then evaluates that string as Ruby code.

replacing lines in ruby string

I'm trying to loop through a Ruby string containing many lines using the each_line method, but I also want to change them. I'm using the following code, but it doesn't seem to work:
string.each_line{|line| line=change_line(line)}
I suppose that Ruby is sending a copy of my line and not the line itself, but unfortunately there is no each_line! method. I also tried the gsub! method, using /^.*$/ to detect each line, but it seems that it calls the change_line method only once and replaces all lines with the result. Any ideas how to do that?
Thanks in advance :)
@azlisum: You are not storing the result of your concatenation. Use:
output = string.lines.map{|line|change_line(line)}.join
Comparing four ways to process by line in a string:
# Inject method (proposed by @steenslag)
output = string.each_line.inject(""){|s, line| s << change_line(line)}
# Join method (proposed by @Lars Haugseth)
output = string.lines.map{|line|change_line(line)}.join
# REGEX method (proposed by @olistik)
output = string.gsub!(/^(.*)$/) {|line| change_line(line)}
# String concatenation += method (proposed by @Erik Hinton)
output = ""
string.each_line{|line| output += change_line(line)}
The timing with Benchmark:
                  user     system      total        real
Inject Time:  7.920000   0.010000   7.930000 (  7.920128)
Join Time:    7.150000   0.010000   7.160000 (  7.155957)
REGEX Time:  11.660000   0.010000  11.670000 ( 11.661059)
+= Time:      7.080000   0.010000   7.090000 (  7.076423)
As @steenslag pointed out, 's += a' generates a new string for each concatenation and is therefore not usually the best choice.
So given that, and given the times, your best bet is:
output = string.lines.map{|line|change_line(line)}.join
Also, this is the cleaner looking choice IMHO.
Notes:
Using Benchmark
Ruby-Doc: Benchmark
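A minimal sketch of how timings like these could be collected with Ruby's standard Benchmark module. The change_line body and the sample string are assumptions made up for illustration (the post does not show them), so absolute numbers will differ from the table above.

```ruby
require 'benchmark'

# Hypothetical stand-in for the asker's change_line method.
def change_line(line)
  line.upcase
end

# Hypothetical sample input: 1,000 identical lines.
string = "some line of text\n" * 1_000

Benchmark.bm(8) do |bm|
  bm.report("inject:") { string.each_line.inject(+"") { |s, line| s << change_line(line) } }
  bm.report("join:")   { string.lines.map { |line| change_line(line) }.join }
  # Non-destructive gsub so the input stays unchanged for the other reports.
  bm.report("gsub:")   { string.gsub(/^(.*)$/) { |line| change_line(line) } }
  bm.report("+=:")     { out = ""; string.each_line { |line| out += change_line(line) }; out }
end
```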
You could also start with a blank string, iterate over the original with each_line, and append each result to the blank string.
output = ""
string.each_line{|line| output += change_line(line)}
In your original example, you are correct. Your changes are occurring but they are not being saved anywhere. Each in Ruby does not alter anything by default.
You could use gsub! passing a block to it:
string.gsub!(/^(.*)$/) {|line| change_line(line)}
source: String#gsub!
String#each_line is meant for reading lines in a string, not writing them. You can use this to get the result you want like so:
changed_string = ""
string.each_line{ |line| changed_string += change_line(line) }
If you don't give each_line a block, you'll get an enumerator, which has the inject method.
str = <<HERE
smestring dsfg
line 2
HERE
res = str.each_line.inject(""){|m,line|m << line.upcase}

Regex to leave desired string remaining and others removed

In Ruby, what regex will strip out all but a desired string if present in the containing string? I know about /[^abc]/ for characters, but what about strings?
Say I have the string "group=4&type_ids[]=2&type_ids[]=7&saved=1" and want to retain the pattern group=\d, if it is present in the string using only a regex?
Currently, I am splitting on & and then doing a select with matching condition =~ /group=\d/ on the resulting enumerable collection. It works fine, but I'd like to know the regex to do this more directly.
Simply:
part = str[/group=\d+/]
If you want only the numbers, then:
group_str = str[/group=(\d+)/,1]
If you want only the numbers as an integer, then:
group_num = str[/group=(\d+)/,1].to_i
Warning: String#[] will return nil if no match occurs, and blindly calling nil.to_i always returns 0.
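To make the warning concrete, here is a small sketch that keeps "no match" distinguishable from "matched 0". The variable names raw, group_num and missing are made up for illustration.

```ruby
str = "group=4&type_ids[]=2&type_ids[]=7&saved=1"

# String#[] with a capture index returns the captured text, or nil on no match.
raw = str[/group=(\d+)/, 1]

# Only convert when something was captured, so a missing group stays nil.
group_num = raw && Integer(raw)

# A string without the parameter yields nil rather than 0.
missing = "saved=1"[/group=(\d+)/, 1]

puts group_num.inspect
puts missing.inspect
```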
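You can try the following (note that this is Perl syntax; the Ruby equivalent would be str.sub(/.*(group=\d+).*/, '\1')):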
$str =~ s/.*(group=\d+).*/\1/;
Typically I wouldn't really worry too much about a complex regex. Simply break the string down into smaller parts and it becomes easier:
asdf = "group=4&type_ids[]=2&type_ids[]=7&saved=1"
asdf.split('&').select{ |q| q['group'] } # => ["group=4"]
Otherwise, you can use regex a bunch of different ways. Here are two ways I tend to use:
asdf.scan(/group=\d+/) # => ["group=4"]
asdf[/(group=\d+)/, 1] # => "group=4"
Try:
str.match(/group=\d+/)[0]

Ruby MatchData class is repeating captures, instead of including additional captures as it "should"

Ruby 1.9.1, OSX 10.5.8
I'm trying to write a simple app that parses through a bunch of Java-based HTML template files to replace a period (.) with an underscore if it's contained within a specific tag. I use Ruby all the time for these types of utility apps, and thought it would be no problem to whip up something using Ruby's regex support. So, I create a Regexp.new... object, open a file, read it in line by line, then match each line against the pattern. If I get a match, I create a new string using replaceString = currentMatch.gsub(/\./, '_'), then build a regex for the whole matched string with newReplaceRegex = Regexp.escape(currentMatch), and finally replace back into the current line with line.gsub(newReplaceRegex, replaceString). Code below, of course, but first...
The problem I'm having is that when accessing the indexes within the returned MatchData object, I'm getting the first result twice, and it's missing the second substring it should otherwise be finding. Stranger still, when testing this same pattern and same test text using rubular.com, it works as expected.
My pattern:
(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+.)+(?:[a-zA-Z0-9]+)(?:>))
Test text:
<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText
Here's the relevant code:
tagRegex = Regexp.new('(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>))+')
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each{|htmlLine|
  lineCount += 1
  puts ("Current line: #{htmlLine} at line num: #{lineCount}")
  tagMatch = tagRegex.match(htmlLine)
  if(tagMatch)
    matchesArray = tagMatch.to_a
    firstMatch = matchesArray[0]
    secondMatch = matchesArray[1]
    puts "First match: #{firstMatch} and second match #{secondMatch}"
    tagMatch.captures.each {|lineMatchCapture|
      puts "Current capture for tagMatches: #{lineMatchCapture} of total match count #{matchesArray.size}"
      #create a new regex using the match results; make sure to use auto escape method
      originalPatternString = Regexp.escape(lineMatchCapture)
      replacementRegex = Regexp.new(originalPatternString)
      #replace any periods with underscores in a copy of lineMatchCapture
      periodToUnderscoreCorrection = lineMatchCapture.gsub(/\./, '_')
      #replace original match with underscore replaced copy within line
      htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
      puts "The modified htmlLine is now: #{htmlLine}"
    }
  end
}
I would think that I should get the first tag in matchData[0] then the second tag in matchData[1], or, what I'm really doing because I don't know how many matches I'll get within any given line, is matchData.to_a.each. And in this case, matchData has two captures, but they're both the first tag match
which is: <WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>
So, what the heck am I doing wrong, why does rubular test give me the expected results?
You want to use String#scan instead of Regexp#match:
tag_regex = /<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>)/
lines = "<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText\
<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText"
lines.scan(tag_regex)
# => ["<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>", "<WEBOBJECT NAME=admin.SecondLineMatch>"]
A few recommendations for next ruby questions:
newlines and spaces are your friends, you don't lose points for using more lines in your code ;-)
use do-end on blocks instead of {}, improves readability a lot
declare variables in snake case (hello_world) instead of camel case (helloWorld)
Hope this helps
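If the goal is only to swap periods for underscores inside each tag, a block passed to String#gsub can do the replacement in one pass, without building a second escaped regex. This is a sketch under that assumption; line and tag_regex here are illustrative values, not the asker's file contents.

```ruby
# Hypothetical sample line with two tags and a period outside any tag.
line = "<WEBOBJECT NAME=admin.normalMode.more>text<WEBOBJECT NAME=admin.SecondLineMatch>more.text"
tag_regex = /<webobject name=(?:\w+\.)+\w+>/i

# The block receives each matched tag; tr swaps the periods inside it only.
fixed = line.gsub(tag_regex) { |tag| tag.tr('.', '_') }
puts fixed
```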
I ended up using the String.scan approach, the only tricky point there was figuring out that this returns an array of arrays, not a MatchData object, so there was some initial confusion on my part, mostly due to my ruby green-ness, but it's working as expected now. Also, I trimmed the regex per Trevoke's suggestion. But snake case? Never...;-) Anyway, here goes:
tagRegex = /(<(?:webobject) (?:name)=(?:\w+\.)+(?:\w+)(?:>))/i
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each do |htmlLine|
  lineCount += 1
  puts ("Current line: #{htmlLine} at line num: #{lineCount}")
  oldMatches = htmlLine.scan(tagRegex) #oldMatches thusly named due to not explicitly using Regexp or MatchData, as in "the old way..."
  if(oldMatches.size > 0)
    oldMatches.each_index do |index|
      arrayMatch = oldMatches[index]
      aMatch = arrayMatch[0]
      #create a new regex using the match results; make sure to use auto escape method
      replacementRegex = Regexp.new(Regexp.escape(aMatch))
      #replace any periods with underscores in a copy of lineMatchCapture
      periodToUnderscoreCorrection = aMatch.gsub(/\./, '_')
      #replace original match with underscore replaced copy within line, matching against the new escaped literal regex
      htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
      puts "The modified htmlLine is now: #{htmlLine}"
    end # I kind of still prefer the brackets...;-)
  end
end
Now, why does MatchData work the way it does? It seems like its behavior is a bug really, and certainly not very useful in general if you can't get it to provide a simple means of accessing all the matches. Just my $.02
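For what it's worth, the repeated value is consistent with how MatchData is indexed rather than with a bug: Regexp#match returns only the first match, index 0 is the entire match, and index 1 onward are that match's capture groups. When one group wraps the whole pattern, [0] and [1] hold the same text. A small sketch, using a shortened sample line made up for illustration:

```ruby
line = "<WEBOBJECT NAME=admin.normalMode.more>text<WEBOBJECT NAME=admin.SecondLineMatch>more text"
tag_regex = /(<webobject name=(?:\w+\.)+\w+>)/i

m = tag_regex.match(line)
whole_match = m[0]   # the entire first match
first_group = m[1]   # capture group 1, which here wraps the whole pattern

# scan walks the string and collects every match; with one group it
# returns one-element arrays, hence the flatten.
all_tags = line.scan(tag_regex).flatten

puts whole_match == first_group
puts all_tags.inspect
```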
Small bits:
This regexp helps you get "normalMode" .. But not "secondLineMatch":
<webobject name=\w+\.((?:\w+)).+> (with option 'i', for "case insensitive")
This regexp helps you get "secondLineMatch" ... But not "normalMode":
<webobject name=\w+\.((?:\w+))> (with option 'i', for "case insensitive").
I'm not really good at regexps but I'll keep toiling at it.. :)
And I don't know if this helps you at all, but here's a way to get both:
<webobject name=admin.(\w+) (with option 'i').

How do I convert a Ruby string with brackets to an array?

I would like to convert the following string into an array/nested array:
str = "[[this, is],[a, nested],[array]]"
newarray = # this is what I need help with!
newarray.inspect # => [['this','is'],['a','nested'],['array']]
You'll get what you want with YAML.
But there is a little problem with your string: YAML expects a space after each comma. So we need this:
str = "[[this, is], [a, nested], [array]]"
Code:
require 'yaml'
str = "[[this, is],[a, nested],[array]]"
### transform your string in a valid YAML-String
str.gsub!(/(\,)(\S)/, "\\1 \\2")
YAML::load(str)
# => [["this", "is"], ["a", "nested"], ["array"]]
You could also treat it as almost-JSON. If the strings really are only letters, like in your example, then this will work:
require 'json'
JSON.parse(yourarray.gsub(/([a-z]+)/,'"\1"'))
If they could have arbitrary characters (other than [ ] , ), you'd need a little more:
JSON.parse("[[this, is],[a, nested],[array]]".gsub(/, /,",").gsub(/([^\[\]\,]+)/,'"\1"'))
For a laugh:
ary = eval("[[this, is],[a, nested],[array]]".gsub(/(\w+?)/, "'\\1'") )
=> [["this", "is"], ["a", "nested"], ["array"]]
Disclaimer: You definitely shouldn't do this as eval is a terrible idea, but it is fast and has the useful side effect of throwing an exception if your nested arrays aren't valid
Looks like a basic parsing task. Generally the approach you are going to want to take is to create a recursive function with the following general algorithm
base case (input doesn't begin with '[') return the input
recursive case:
split the input on ',' (you will need to find commas only at this level)
for each sub string call this method again with the sub string
return array containing the results from this recursive method
The only slightly tricky part here is splitting the input on a single ','. You could write a separate function for this that would scan through the string and keep a count of the open brackets minus the closed brackets seen so far. Then only split on commas when the count is equal to zero.
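The algorithm above could be sketched roughly as follows. The method names split_top_level and parse_nested are made up for illustration, and the sketch assumes well-formed input; it also strips the outer brackets before splitting, a step the description leaves implicit.

```ruby
# Split +input+ on commas that are not nested inside brackets.
def split_top_level(input)
  parts = []
  depth = 0          # open brackets minus closed brackets seen so far
  current = +""
  input.each_char do |ch|
    depth += 1 if ch == "["
    depth -= 1 if ch == "]"
    if ch == "," && depth.zero?
      parts << current
      current = +""
    else
      current << ch
    end
  end
  parts << current unless current.empty?
  parts
end

# Recursively turn a bracketed string into nested arrays of strings.
def parse_nested(input)
  input = input.strip
  return input unless input.start_with?("[")   # base case: a bare word
  inner = input[1...-1]                        # drop the outer brackets
  split_top_level(inner).map { |part| parse_nested(part) }
end

newarray = parse_nested("[[this, is],[a, nested],[array]]")
puts newarray.inspect
```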
Make a recursive function that takes the string and an integer offset, and "reads" out an array. That is, have it return an array or string (that it has read) and an integer offset pointing after the array. For example:
s = "[[this, is],[a, nested],[array]]"
yourFunc(s, 1) # returns ['this', 'is'] and 11.
yourFunc(s, 2) # returns 'this' and 6.
Then you can call it with another function that provides an offset of 0, and makes sure that the finishing offset is the length of the string.
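A possible sketch of that offset-based reader. The names read_value and parse_all are hypothetical (the answer only calls it yourFunc), and error handling is kept to the bare minimum.

```ruby
# Reads one value (array or bare word) starting at +offset+.
# Returns [value, offset_just_after_value].
def read_value(str, offset)
  if str[offset] == "["
    items = []
    offset += 1                                   # skip "["
    until str[offset] == "]"
      raise ArgumentError, "unterminated array" if offset >= str.length
      item, offset = read_value(str, offset)
      items << item
      # Skip commas and whitespace between elements.
      offset += 1 while str[offset].to_s.match?(/[,\s]/)
    end
    [items, offset + 1]                           # skip "]"
  else
    # A bare word runs until the next bracket or comma.
    word = str[offset..-1][/\A[^\[\],]*/]
    [word.strip, offset + word.length]
  end
end

# Wrapper: start at offset 0 and require that the whole string was consumed.
def parse_all(str)
  value, finish = read_value(str, 0)
  raise ArgumentError, "trailing input" unless finish == str.length
  value
end

s = "[[this, is],[a, nested],[array]]"
puts read_value(s, 1).inspect
puts read_value(s, 2).inspect
puts parse_all(s).inspect
```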

Resources