Ruby: Concise way to conditionally join strings with newlines? - ruby

I need to add a description string to an existing comments variable, which contains either a string or nil. I want to separate the new description from any existing comments with a newline, but only if there are existing comments. I've come up with a couple ways that are kind of concise,
old_comments = comments + "\n" rescue ""
new_comments = old_comments + description
or
new_comments = [comments, description].compact.join("\n")
but I'm surprised there's not a less "tricky" way to squeeze this into a one-liner. Or is there?

[*comments, description].join($/)

Related

Combine Regexp and set of values (hash/array) to compare if a string matches in ruby

I have the following pattern to check:
"MODEL_NAME"-"ID"."FORMAT_TYPE"
where, for example:
MODEL_NAME = [:product, :brand]
FORMAT_TYPE = [:jpg, :png]
First I wanted to check if the regexp is something like:
/^\w+-\d+.\w+$/
and I have also to check if the part of my string is part of my arrays. I want something more flexible than:
/^(product|brand)-\d+.(jpg|png)$/
which I could manage through my arrays. What is a good solution to do it?
/^(#{MODEL_NAME.join '|'})-\d+\.(#{FORMAT_TYPE.join '|'})$/
# => /^(product|brand)-\d+\.(jpg|png)$/

How to bring this two line variable assignment and comparision down to one line

I have two lines of ruby code to strip URLs (string) of certain objects from their specific extension (string) and test and possibly reassign if they match a specific string ('index').
page_path = page_object.url_string.chomp(page_object.extension_string)
page_path = '' if page_path == 'index'
My actual variable names are different (shorter) of course, the ones above are just for better illustration.
Is it suitable and possible to get this done in one, elegant line of ruby code?
a possible solution
page_path = page_object.url_string.
chomp(page_object.extension_string).
sub(/\Aindex\Z/,'')

Ruby regex: split string with match beginning with either a newline or the start of the string?

Here's my regular expression that I have for this. I'm in Ruby, which — if I'm not mistaken — uses POSIX regular expressions.
regex = /(?:\n^)(\*[\w+ ?]+\*)\n/
Here's my goal: I want to split a string with a regex that is *delimited by asterisks*, including those asterisks. However: I only want to split by the match if it is prefaced with a newline character (\n), or it's the start of the whole string. This is the string I'm working with.
"*Friday*\nDo not *break here*\n*But break here*\nBut again, not this"
My regular expression is not splitting properly at the *Friday* match, but it is splitting at the *But break here* match (it's also throwing in a here split). My issue is somewhere in the first group, I think: (?:\n^) — I know it's wrong, and I'm not entirely sure of the correct way to write it. Can someone shed some light? Here's my complete code.
regex = /(?:\n^)(\*[\w+ ?]+\*)\n/
str = "*Friday*\nDo not *break here*\n*But break here*\nBut again, not this"
str.split(regex)
Which results in this:
>>> ["*Friday*\nDo not *break here*", "*But break here*", "But again, not this"]
I want it to be this:
>>> ["*Friday*", "Do not *break here*", "*But break here*", "But again, not this"]
Edit #1: I've updated my regex and result. (2011/10/18 16:26 CST)
Edit #2: I've updated both again. (16:32 CST)
What if you just add a '\n' to the front of each string. That simplifies the processing quite a bit:
regex = /(?:\n)(\*[\w+ ?]+\*)\n/
str = "*Friday*\nDo not *break here*\n*But break here*\nBut again, not this"
res = ("\n"+str).split(regex)
res.shift if res[0] == ""
res
=> [ "*Friday*", "Do not *break here*",
"*But break here*", "But again, not this"]
We have to watch for the initial extra match but it's not too bad. I suspect someone can shorten this a bit.
Groups 1 & 2 of the regex below :
(?:\A|\\n)(\*.*?\*)|(?:\A|\\n)(.*?)(?=\\n|\Z)
Will give you your desired output. I am no ruby expert so you will have to create the list yourself :)
Why not just split at newlines? From your example, it looks that's what you're really trying to do.
str.split("\n")

A good way to insert a string before a regex match in Ruby

What's a good way to do this? Seems like I could use a combination of a few different methods to achieve what I want, but there's probably a simpler method I'm overlooking. For example, the PHP function preg_replace will do this. Anything similar in Ruby?
simple example of what I'm planning to do:
orig_string = "all dogs go to heaven"
string_to_insert = "nice "
regex = /dogs/
end_result = "all nice dogs go to heaven"
It can be done using Ruby's "gsub", as per:
http://railsforphp.com/2008/01/17/regular-expressions-in-ruby/#preg_replace
orig_string = "all dogs go to heaven"
end_result = orig_string.gsub(/dogs/, 'nice \0')
result = subject.gsub(/(?=\bdogs\b)/, 'nice ')
The regex checks for each position in the string whether the entire word dogs can be matched there, and then it inserts the string nice there.
The word boundary anchors \b ensure that we don't accidentally match hotdogs etc.

Ruby MatchData class is repeating captures, instead of including additional captures as it "should"

Ruby 1.9.1, OSX 10.5.8
I'm trying to write a simple app that parses through of bunch of java based html template files to replace a period (.) with an underscore if it's contained within a specific tag. I use ruby all the time for these types of utility apps, and thought it would be no problem to whip up something using ruby's regex support. So, I create a Regexp.new... object, open a file, read it in line by line, then match each line against the pattern, if I get a match, I create a new string using replaceString = currentMatch.gsub(/./, '_'), then create another replacement as whole string by newReplaceRegex = Regexp.escape(currentMatch) and finally replace back into the current line with line.gsub(newReplaceRegex, replaceString) Code below, of course, but first...
The problem I'm having is that when accessing the indexes within the returned MatchData object, I'm getting the first result twice, and it's missing the second sub string it should otherwise be finding. More strange, is that when testing this same pattern and same test text using rubular.com, it works as expected. See results here
My pattern:
(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+.)+(?:[a-zA-Z0-9]+)(?:>))
Text text:
<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText
Here's the relevant code:
tagRegex = Regexp.new('(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>))+')
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each{|htmlLine|
lineCount += 1
puts ("Current line: #{htmlLine} at line num: #{lineCount}")
tagMatch = tagRegex.match(htmlLine)
if(tagMatch)
matchesArray = tagMatch.to_a
firstMatch = matchesArray[0]
secondMatch = matchesArray[1]
puts "First match: #{firstMatch} and second match #{secondMatch}"
tagMatch.captures.each {|lineMatchCapture|
puts "Current capture for tagMatches: #{lineMatchCapture} of total match count #{matchesArray.size}"
#create a new regex using the match results; make sure to use auto escape method
originalPatternString = Regexp.escape(lineMatchCapture)
replacementRegex = Regexp.new(originalPatternString)
#replace any periods with underscores in a copy of lineMatchCapture
periodToUnderscoreCorrection = lineMatchCapture.gsub(/\./, '_')
#replace original match with underscore replaced copy within line
htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
puts "The modified htmlLine is now: #{htmlLine}"
}
end
}
I would think that I should get the first tag in matchData[0] then the second tag in matchData1, or, what I'm really doing because I don't know how many matches I'll get within any given line is matchData.to_a.each. And in this case, matchData has two captures, but they're both the first tag match
which is: <WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>
So, what the heck am I doing wrong, why does rubular test give me the expected results?
You want to use the on String#scan instead of the Regexp#match:
tag_regex = /<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>)/
lines = "<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText\
<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText"
lines.scan(tag_regex)
# => ["<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>", "<WEBOBJECT NAME=admin.SecondLineMatch>"]
A few recommendations for next ruby questions:
newlines and spaces are your friends, you don't loose points for using more lines on your code ;-)
use do-end on blocks instead of {}, improves readability a lot
declare variables in snake case (hello_world) instead of camel case (helloWorld)
Hope this helps
I ended up using the String.scan approach, the only tricky point there was figuring out that this returns an array of arrays, not a MatchData object, so there was some initial confusion on my part, mostly due to my ruby green-ness, but it's working as expected now. Also, I trimmed the regex per Trevoke's suggestion. But snake case? Never...;-) Anyway, here goes:
tagRegex = /(<(?:webobject) (?:name)=(?:\w+\.)+(?:\w+)(?:>))/i
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each do |htmlLine|
lineCount += 1
puts ("Current line: #{htmlLine} at line num: #{lineCount}")
oldMatches = htmlLine.scan(tagRegex) #oldMatches thusly named due to not explicitly using Regexp or MatchData, as in "the old way..."
if(oldMatches.size > 0)
oldMatches.each_index do |index|
arrayMatch = oldMatches[index]
aMatch = arrayMatch[0]
#create a new regex using the match results; make sure to use auto escape method
replacementRegex = Regexp.new(Regexp.escape(aMatch))
#replace any periods with underscores in a copy of lineMatchCapture
periodToUnderscoreCorrection = aMatch.gsub(/\./, '_')
#replace original match with underscore replaced copy within line, matching against the new escaped literal regex
htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
puts "The modified htmlLine is now: #{htmlLine}"
end # I kind of still prefer the brackets...;-)
end
end
Now, why does MatchData work the way it does? It seems like it's behavior is a bug really, and certainly not very useful in general if you can't get it provide a simple means of accessing all the matches. Just my $.02
Small bits:
This regexp helps you get "normalMode" .. But not "secondLineMatch":
<webobject name=\w+\.((?:\w+)).+> (with option 'i', for "case insensitive")
This regexp helps you get "secondLineMatch" ... But not "normalMode":
<webobject name=\w+\.((?:\w+))> (with option 'i', for "case insensitive").
I'm not really good at regexpt but I'll keep toiling at it.. :)
And I don't know if this helps you at all, but here's a way to get both:
<webobject name=admin.(\w+) (with option 'i').

Resources