Need better regex solution in Ruby - ruby

I have the following code:
date_time = Time.now.strftime('%Y%m%d%H%M%S')
name = "builder-#{date_time}" # builder-20150923125450
if some_condition
name.sub!("#{date_time}", "one-#{date_time}") # builder-one-20150923125450
end
The above code works fine.
But I think it could be better, as I feel I am repeating #{date_time} twice here.
I have heard of regex capture and replace. Can we use it here? If yes, how?

To use the capturing mechanism, put parentheses around the subpattern you want to refer to, then use a backreference to it in the replacement string.
Here is an example:
date_time = Time.now.strftime('%Y%m%d%H%M%S')
name = "builder-#{date_time}"
puts name.sub(/^([^-]*-)/, "\\1one-")
The pattern ^([^-]*-) matches, from the beginning of the string (^), any run of characters other than - followed by a hyphen, and captures it as group 1; we then refer to that captured text with \\1 in the replacement string.
Refer to Use Parentheses for Grouping and Capturing at Regular-Expressions.info for more details.
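As a minimal sketch of the backreference idea above (the fixed timestamp and the variable names are illustrative assumptions, not part of the original code), the same replacement can be written with either quoting style:

```ruby
# Fixed timestamp so the result is reproducible (illustrative value).
date_time = '20150923125450'
name = "builder-#{date_time}"

# Group 1 captures everything up to and including the first hyphen.
# In a double-quoted replacement the backslash has to be doubled...
double_quoted = name.sub(/^([^-]*-)/, "\\1one-")
# ...while a single-quoted replacement can use \1 directly.
single_quoted = name.sub(/^([^-]*-)/, '\1one-')

puts double_quoted # builder-one-20150923125450
puts single_quoted # builder-one-20150923125450
```

Neither call mentions date_time in the pattern or the replacement, which is the repetition the question wanted to avoid.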
A simpler way is to use a ternary operator when initializing the name variable:
some_condition = true # example value
date_time = Time.now.strftime('%Y%m%d%H%M%S')
name = "builder-" + (some_condition ? "one-" : "") + date_time

Strategy one - precalculate the prefix:
date_time = Time.now.strftime('%Y%m%d%H%M%S')
prefix = some_condition ? 'builder-one-' : 'builder-'
name = "#{prefix}#{date_time}"
The string 'builder-' now appears twice. Obviously, you could DRY it up even more, but that would be overkill IMHO.
Strategy two - use a lookahead:
date_time = Time.now.strftime('%Y%m%d%H%M%S')
name = "builder-#{date_time}"
name.sub!(/(?=#{date_time})/, "one-") if some_condition
Now date_time appears only twice. I wouldn't say it's a great improvement. I wouldn't say there is much of a problem to begin with.
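To see why the lookahead version works: a lookahead matches an empty position, so sub inserts the replacement there without consuming any text. A small sketch with a fixed timestamp (an illustrative value); the \d variant is my own assumption that the prefix never contains a digit, not something the original answer relies on:

```ruby
# Fixed timestamp so the result is reproducible (illustrative value).
date_time = '20150923125450'
name = "builder-#{date_time}"

# (?=...) matches the empty position right before the timestamp,
# so "one-" is inserted there and nothing is removed.
with_interpolation = name.sub(/(?=#{date_time})/, 'one-')

# Assumption: the prefix contains no digits, so the first digit
# marks the start of the timestamp.
with_digit_lookahead = name.sub(/(?=\d)/, 'one-')

puts with_interpolation   # builder-one-20150923125450
puts with_digit_lookahead # builder-one-20150923125450
```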

"builder-" + ("one-" if some_condition).to_s + date_time
date_time = "right now"
some_condition = true
"builder-" + ("one-" if some_condition).to_s + date_time
#=> "builder-one-right now"
some_condition = false
"builder-" + ("one-" if some_condition).to_s + date_time
#=> "builder-right now"
Note that:
("one-" if false).to_s #=> nil.to_s => ""
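A related way to exploit the same nil behaviour, offered only as a sketch: collect the parts in an array, drop the nil with Array#compact and join with a hyphen. The method name builder_name is hypothetical.

```ruby
# Hypothetical helper: the optional part evaluates to nil when the
# condition is false, and compact removes it before joining.
def builder_name(date_time, some_condition)
  ['builder', ('one' if some_condition), date_time].compact.join('-')
end

puts builder_name('right now', true)  # builder-one-right now
puts builder_name('right now', false) # builder-right now
```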

Related

Adding backreferenced value to its replacement

I am trying to add a number from a backreference to another number, but I seem to get only concatenation:
textStr = "<testsuite errors=\"0\" tests=\"4\" time=\"4.867\" failures=\"0\" name=\"TestRateUs\">"
new_str = textStr.gsub(/(testsuite errors=\"0\" tests=\")(\d+)(\" time)/, '\1\2+4\3')
# => "<testsuite errors=\"0\" tests=\"4+4\" time=\"4.867\" failures=\"0\" name=\"TestRateUs\">"
I tried also using to_i on the backreferenced value, but I can't get the extracted value to add. Do I need to do something to the value to make it addable?
If you are manipulating XML, I'd suggest using some specific library for that. In this answer, I just want to show how to perform operations on the submatches.
You can sum up the values inside a block:
textStr="<testsuite errors=\"0\" tests=\"4\" time=\"4.867\" failures=\"0\" name=\"TestRateUs\">"
new_str = textStr.gsub(/(testsuite errors=\"0\" tests=\")(\d+)(\" time)/) do
Regexp.last_match[1] + (Regexp.last_match[2].to_i + 4).to_s + Regexp.last_match[3]
end
puts new_str
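If we use {|m|...}, m holds only the whole match as a String (the same text as Regexp.last_match[0]), so the captured groups are not available through m; they still have to be read from Regexp.last_match (or $1, $2, ...).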
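Two shorter variants of the block approach, as a sketch (the variable names and the shortened pattern are my own, not the asker's): the first reads the groups through the $1..$3 globals, the second uses a lookbehind so that the whole match is just the number and the block parameter is enough.

```ruby
text = '<testsuite errors="0" tests="4" time="4.867" failures="0" name="TestRateUs">'

# Inside the block, $1..$3 refer to the groups of the current match.
with_globals = text.gsub(/(tests=")(\d+)(")/) { "#{$1}#{$2.to_i + 4}#{$3}" }

# With a lookbehind the match itself is only the digits, so the
# block parameter m (a String) is all that is needed.
with_lookbehind = text.gsub(/(?<=tests=")\d+/) { |m| (m.to_i + 4).to_s }

puts with_globals
puts with_lookbehind
```

Both print the tag with tests="8".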

Create regular expression from Array of search terms ruby

Is there a way (or a gem) to create regular expressions from some basic search parameters?
e.g.
Search = ["\"German Shepherd\"","Collie","poodle", "Miniature Schnauzer"]
Such that the regexp will search (case insensitively) for:
"German Shepherd" - exactly
OR
"Collie"
OR
"poodle"
OR
"Miniature" AND "Schnauzer"
So in this case something like:
/German\ Shepherd|Collie|poodle|(?=.*Miniature)(?=.*Schnauzer).+/i
(Open to suggestions of better ways of doing the last bit...)
If I understood the question properly, here you go:
regexps = ["\"German Shepherd\"","Collie","poodle", "Miniature Schnauzer"]
# those in quotes
greedy = regexps.select { |re| re =~ /\A['"].*['"]\z/ } # c'"mon, parser
# the rest unquoted
non_greedy = (regexps - greedy).map(&:split).flatten
# concatenating... ⇓⇓⇓ get rid of quotes
all = Regexp.union(non_greedy + greedy.map { |re| re[1...-1] })
#⇒ /Collie|poodle|Miniature|Schnauzer|German\ Shepherd/
UPD
I finally understood what has to be done with Miniature Schnauzer (please see the comment below for further explanation): the words are to be permuted and joined with a non-greedy .*?:
non_greedy = (regexps - greedy).map(&:split).map do |re|
# single word? YES : NO, permute and join
re.length < 2 ? re : re.permutation.map { |p| Regexp.new p.join('.*?') }
end.flatten
all = Regexp.union(non_greedy + greedy.map { |re| re[1...-1] })
#=> /Collie|poodle|(?-mix:Miniature.*?Schnauzer)|(?-mix:Schnauzer.*?Miniature)|German\ Shepherd/
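Note that Regexp.union does not make the result case-insensitive, and the embedded (?-mix:...) groups explicitly switch the i flag off. Below is a rough sketch of one way to get the case-insensitive behaviour the question asks for, building the source from escaped strings and using the lookahead idea from the question for multi-word terms. The helper name build_search_regex is hypothetical.

```ruby
# Hypothetical helper: quoted terms match exactly, single words match
# as they are, and multi-word terms require every word to be present
# (in any order) via lookaheads.
def build_search_regex(terms)
  alternatives = terms.map do |term|
    quoted = term.match(/\A"(.*)"\z/)
    if quoted
      Regexp.escape(quoted[1])
    else
      words = term.split
      if words.length < 2
        Regexp.escape(term)
      else
        words.map { |w| "(?=.*#{Regexp.escape(w)})" }.join
      end
    end
  end
  Regexp.new(alternatives.join('|'), Regexp::IGNORECASE)
end

search = ["\"German Shepherd\"", "Collie", "poodle", "Miniature Schnauzer"]
regex = build_search_regex(search)

puts regex.match?('a german shepherd puppy') # true
puts regex.match?('Schnauzer (miniature)')   # true
puts regex.match?('miniature horse')         # false
```

Since . does not match a newline by default, the lookaheads only see words on the same line as the position being tested.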

Can we use the relational operator in gsub?

I need to replace each . character with . \n in a string of the following format. The constraint is that the . characters in one particular part of the string (shown below) must not be replaced.
"test was done and was negative. Urine dipstick: ph = 6\\n \\342\\200\\242 spec. Grav. = 1.015"
I need the following output:
"test was done and was negative. \n Urine dipstick: ph = 6\\n \\342\\200\\242 spec. Grav. = 1.015"
The constraint is => "spec. Grav. = 1.015".
str = "test was done and was negative. Urine dipstick: ph = 6\\n \\342\\200\\242 spec. Grav. = 1.015"
puts str.sub('. ', ".\n")
#=> test was done and was negative.
#=> Urine dipstick: ph = 6\n \342\200\242 spec. Grav. = 1.015
String#sub only substitutes the first match.
str.gsub(/\.(?! (Grav| =))/, ".\n")
should do the job.
Brief explanation
\. matches a literal .
(?!...) denotes a negative look-ahead: the match fails if the text that follows matches the pattern inside the parentheses.
(Grav| =) hence a dot followed by either Grav or = won't be matched.
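One thing to watch with a negative look-ahead alone is the dot inside a number such as 1.015, which is not followed by any of the excluded texts either. A variant sketched under the assumption that only dots followed by a space should be replaced; the extra (?= ) look-ahead is my addition, not part of the answer above:

```ruby
str = "test was done and was negative. Urine dipstick: ph = 6\\n \\342\\200\\242 spec. Grav. = 1.015"

# (?= )              the dot must be followed by a space
# (?! (?:Grav|=))    but not by " Grav" or " ="
result = str.gsub(/\.(?= )(?! (?:Grav|=))/, ".\n")

puts result
```

Only the dot after "negative" is replaced; "spec.", "Grav." and "1.015" are left untouched.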
You want this?
str.gsub(/\.(?!\n)/, "\.\n")

Issue dealing with white space with Ruby regular expressions

I'm trying to write a simple script expression that allows me to identify the java files in a directory that have a private constructor. I have had some luck but I want my script to acknowledge there is white space between the access modifier and the constructor name but not care if it is a space or n spaces or a tab or n tabs etc.
I am trying to use...
"private\s+"+object_name
but the + (1 or more) is not finding a constructor with 2 spaces between the modifier and the constructor name.
I know I am missing something. Any help would be greatly appreciated.
Thanks.
Here is the full code if it helps...
#!/usr/bin/ruby
path = ARGV[0]
if path.nil?
puts "missing path argument"
exit
end
entries = Dir.entries( path )
entries.each do |file_name|
file_name = file_name.rstrip
if ( file_name.end_with? "java" )
text = File.read( path+file_name )
object_name = file_name.chomp( ".java" )
search_str = "private\s+"+object_name
matches = text.match( Regexp.escape( search_str ) )
if ( !matches.nil? && matches.length > 0 )
puts matches
end
end
end
I think you want to escape the \ in your Ruby string and also Regexp.escape your object name and not the whole regex including the whitespace matcher, e.g.,
[...]
search_regex = Regexp.new("private\\s+" + Regexp.escape(object_name))
matches = text.match(search_regex)
As @LBg also points out, if you want to use + concatenation, it is better to use single quotes, which don't require escaping the \. Or use double quotes with interpolation, as in:
search_regex = Regexp.new("private\\s+#{Regexp.escape(object_name)}")
A double-quoted string reads "\s" as " ", which is harmless here, but prefer a single-quoted string in this case. Regexp.escape strips the special meaning from the regex metacharacters in the string: "private\s+" (where "\s" is already " ") is converted to private\ \+, so match looks for the literal text private +object_name, which is not what you want. Remove the Regexp.escape and it should work.
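A small sketch that shows both effects side by side. The class name Foo and the sample lines are made up for illustration.

```ruby
object_name = 'Foo' # hypothetical class name

# In double quotes "\s" is already a space, and escaping the whole
# string turns the space and the + into literals.
escaped_whole = Regexp.escape("private\s+" + object_name)
puts escaped_whole # private\ \+Foo

# Single quotes keep \s intact; only the class name is escaped.
search_regex = Regexp.new('private\s+' + Regexp.escape(object_name))

samples = ["private Foo()", "private  Foo()", "private\t\tFoo()", "privateFoo()"]
results = samples.map { |line| search_regex.match?(line) }
p results # [true, true, true, false]
```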

Way to partially match a Ruby string using Regexp

I'm working on 2 cases:
Assume I have these variables:
a = "hello"
b = "hello-SP"
c = "not_hello"
Any partial matches
I want to accept any string that has the variable a inside, so b and c would match.
Patterned match
I want to match a string that has a inside, followed by '-', so b would match, c does not.
I am having a problem because I have always used the /expression/ syntax to define a Regexp. How do I define a Regexp dynamically in Ruby?
You can use the same syntax to use variables in a regex, so:
reg1 = /#{a}/
would match on anything that contains the value of the a variable (at the time the expression is created!) and
reg2 = /#{a}-/
would do the same, plus a hyphen, so hello- in your example.
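Applied to the variables from the question, a quick sketch could look like this. The Regexp.escape call is a precaution I have added for values that contain regex metacharacters; the ".com" and "telecom" strings are illustrative.

```ruby
a = 'hello'
b = 'hello-SP'
c = 'not_hello'

# Case 1: any string containing a.
contains_a = /#{Regexp.escape(a)}/
# Case 2: a followed by a hyphen.
a_then_hyphen = /#{Regexp.escape(a)}-/

p contains_a.match?(b)    # true
p contains_a.match?(c)    # true
p a_then_hyphen.match?(b) # true
p a_then_hyphen.match?(c) # false

# Why escaping matters: an unescaped dot matches any character.
dot = '.com'
unescaped = /#{dot}/
escaped = /#{Regexp.escape(dot)}/
p unescaped.match?('telecom') # true
p escaped.match?('telecom')   # false
```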
Edit: As Wayne Conrad points out, if a contains "any characters that would have special meaning in a regular expression," you need to escape them. Example:
a = ".com"
b = Regexp.new(Regexp.escape(a))
"blah.com" =~ b
Late to comment, but I wasn't able to find what I was looking for, and the answers above didn't help me. I hope this helps someone new to Ruby who just wants a quick fix.
Ruby Code:
st = "BJ's Restaurant & Brewery"
#take the string you want to match into a variable
m = (/BJ\'s/i).match(st) # (/"your regular expression"/.match(string))
# m has the match #<MatchData "BJ's">
m.to_s
# this will display the match
=> "BJ's"
