Regex strings in Ruby


Input strings:
str1 = "$13.90 Price as Shown"
str2 = "$590.50 $490.00 Price as Selected"
str3 = "$9.90 or 5/$27.50 Price as Selected"
Output strings:
str1 = "13.90"
str2 = "490.00"
str3 = "9.90"
My code to get output:
str = str.strip.gsub(/\s\w{2}\s\d\/\W\d+.\d+/, "") # remove or 5/$27.50 from string
str = /\W\d+.\d+\s\w+/.match(str).to_s.gsub("$", "").gsub(" Price", "")
This code works for all three kinds of strings, but how can I improve it? Are there any better solutions?
Also, can anyone recommend a good regex guide or book?

A regex I suggested first is just a sum total of your regexps:
(?<=(?<!\/)\$)\d+.\d+(?=\s\w+)
Since it is next to impossible to compare numbers with regex, I suggest:
1. Extracting all float numbers
2. Parsing them as float values
3. Getting the minimum one
Here is a working snippet:
def getLowestNumberFromString(input)
  # Extract every dollar amount not preceded by "/", convert each to a float,
  # and return the smallest (nil if the string contains no amount).
  input.scan(/(?<=(?<!\/)\$)\d+(?:\.\d+)?/).map(&:to_f).min
end
puts getLowestNumberFromString("$13.90 Price as Shown")
puts getLowestNumberFromString("$590.50 $490.00 Price as Selected")
puts getLowestNumberFromString("$9.90 or 5/$27.50 Price as Selected")
The regex breakdown:
(?<=(?<!\/)\$) - assert that there is a $ symbol not preceded with / right before...
\d+ - 1 or more digits
(?:\.\d+)? - optionally followed with a . followed by 1 or more digits
Note that if you only need to match floats with decimal part, remove the ? and non-capturing group from the last subpattern (/(?<=(?<!\/)\$)\d+\.\d+/ or even /(?<=(?<!\/)\$)\d*\.?\d+/).
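To make the difference between the optional and the required decimal part concrete, here is a small sketch. The constant names and the sample string are made up for illustration; the patterns are the ones discussed above.

```ruby
# Hypothetical constant names; the patterns come from the breakdown above.
WITH_OPTIONAL_FRACTION = /(?<=(?<!\/)\$)\d+(?:\.\d+)?/
WITH_REQUIRED_FRACTION = /(?<=(?<!\/)\$)\d+\.\d+/

# Made-up sample: one plain price, one price after "/", one whole-dollar price.
sample = "$9.90 or 5/$27.50, was $12"

optional = sample.scan(WITH_OPTIONAL_FRACTION)
required = sample.scan(WITH_REQUIRED_FRACTION)

p optional # => ["9.90", "12"]
p required # => ["9.90"]
```

The amount after "/" is skipped by both patterns, and only the first pattern accepts the whole-dollar amount.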

Supposing input can be relied upon to look like one of your three examples, how about this?
expr = /\$(\d+\.\d\d)\s+(?:or\s+\d+\/\$\d+\.\d\d\s+)?Price/
str = "$9.90 or 5/$27.50 Price as Selected"
str[expr, 1] # => "9.90"
Here it is on Rubular: http://rubular.com/r/CakoUt5Lo3
Explained:
expr = %r{
\$ # literal dollar sign
(\d+\.\d\d) # capture a price with two decimal places (assume no thousands separator)
\s+ # whitespace
(?: # non-capturing group
or\s+ # literal "or" followed by whitespace
\d+\/ # one or more digits followed by literal "/"
\$\d+\.\d\d # dollar sign and price
\s+ # whitespace
)? # preceding group is optional
Price # the literal word "Price"
}x
You might use it like this:
MATCH_PRICE_EXPR = /\$(\d+\.\d\d)\s+(?:or\s+\d+\/\$\d+\.\d\d\s+)?Price/
def match_price(input)
return unless input =~ MATCH_PRICE_EXPR
$1.to_f
end
puts match_price("$13.90 Price as Shown")
# => 13.9
puts match_price("$590.50 $490.00 Price as Selected")
# => 490.0
puts match_price("$9.90 or 5/$27.50 Price as Selected")
# => 9.9

My code works fine for all 3 types of strings. Just wondering how can
I improve that code
str = str.gsub(/ or \d\/[\$\d.]+/i, '')
str = /(\$[\d.]+) P/.match(str)
Ruby Live Demo
http://ideone.com/18XMjr

A better regex is probably: /\B\$(\d+\.\d{2})\b/
str = "$590.50 $490.00 Price as Selected"
str.scan(/\B\$(\d+\.\d{2})\b/).flatten.min_by(&:to_f)
#=> "490.00"

Assuming you simply want the smallest dollar value in each line:
r = /
\$ # match a dollar sign
\d+ # match one or more digits
\. # match a decimal point
\d{2} # match two digits
/x # extended mode
[str1, str2, str3].map { |s| s.scan(r).min_by { |s| s[1..-1].to_f } }
#=> ["$13.90", "$490.00", "$9.90"]
Actually, you don't have to use a regex. You could do it like this:
def smallest(str)
val = str.each_char.with_index(1).
select { |c,_| c == ?$ }.
map { |_,i| str[i..-1].to_f }.
min
"$%.2f" % val
end
smallest(str1) #=> "$13.90"
smallest(str2) #=> "$490.00"
smallest(str3) #=> "$9.90"

Related

Ruby, looping through a string deleting groups of characters until a desired output is achieved

I have a coding problem I solved and want to refactor. I know there has to be a cleaner way of doing what I did.
The goal is to write a method that takes a string of "!" and "?" and reduces the string by eliminating all odd groupings of each symbol.
Example - a string "????!!!" would have an odd grouping of "!!!" because there are three in a row. These would be deleted from the string.
If there is only one "!" or "?", it's left alone because it is not in a group.
Ex -
remove("!????!!!?") answer == "!"
# => ("!????!!!?" --> "!?????" --> "!")
In the first string, the only odd grouping is "!!!", once removed, it leaves a new string with an odd grouping "?????". You remove the next odd grouping so you're left with "!". This fits the desired output.
Another example
remove("!???!!") == ""
# => ("!???!!" --> "!!!" --> "")
Current code:
def remove(s)
arr = [s]
i = 0
until i == arr[0].length
s = s.chars.chunk{|c|c}.map{ |n,a| a.join }.select{|x| x if x.length.even? || x.length <= 1}.join
arr << s
i += 1
end
return arr[-1]
end
My code solves this problem and all test cases. I have a suspicion that my until loop can be removed/refactored so that I could solve this problem in one line and have spent hours trying to figure it out with no luck.

Suppose
str = "???!!!???!"
If we first remove the two groups "???" we are left with "!!!!", which cannot be reduced further.
If we first remove the group "!!!" we are left with "??????!", which cannot be reduced further.
If we are permitted to remove all odd groups of either character without reference to the effect that either has on the other, we obtain !, which cannot be reduced further.
It's not clear what rule is to be used. Here are three possibilities and code to implement each.
I will use the following two regular expressions, and in the first two cases a helper method.
Rq = /
(?<!\?) # do not match a question mark, negative lookbehind
\? # match a question mark
(\?{2})+ # match two question marks one or more times
(?!\?) # do not match a question mark, negative lookahead
/x # free-spacing regex definition mode
which is commonly written /(?<!\?)\?(\?{2})+(?!\?)/.
Similarly,
Rx = /(?<!!)!(!{2})+(?!!)/
def sequential(str, first_regex, second_regex)
s = str.dup
loop do
size = s.size
s = s.gsub(first_regex,'').gsub(second_regex,'')
return s if s.size == size
end
end
I apply each of the three methods below to two example strings:
str1 = "???!!!???!"
str2 = 50.times.map { ['?', '!'].sample }.join
#=> "?!!!?!!!?!??????!!!?!!??!!???!?!????!?!!!?!?!???!?"
Replace all odd groups of "?" then odd groups of "!" then repeat until no further removals are possible
def question_before_exclamation(str)
sequential(str, Rq, Rx)
end
question_before_exclamation str1 #=> "!!!!"
question_before_exclamation str2 #=> "??!??!?!!?!?!!?"
Replace all odd groups of "!" then odd groups of "?" then repeat until no further removals are possible
def exclamation_before_question(str)
sequential(str, Rx, Rq)
end
exclamation_before_question str1 #=> "??????!"
exclamation_before_question str2 #=> "??!????!!?!?!!?!?!!?"
Replace all odd groups of both "?" and "!" then repeat until no further removals are possible
Rqx = /#{Rq}|#{Rx}/
#=> /(?-mix:(?<!\?)\?(\?{2})+(?!\?))|(?-mix:(?<!!)!(!{2})+(?!!))/
def question_and_exclamation(str)
s = str.dup
loop do
size = s.size
s = s.gsub(Rqx,'')
return s if s.size == size
end
end
question_and_exclamation str1 #=> "!"
question_and_exclamation str2 #=> "??!?!!?!?!!?!?!!?"

I don't know the exact Ruby syntax for this, but you could simplify your solution by using regular expressions:
1. Gather all matches of consecutive characters.
2. If all matches are of even length or of length 1, exit.
3. Test whether each match has an odd length: if so, replace it with the empty string; otherwise do nothing.
4. Go to step 1.
A solution in Perl would be:
#!perl
use strict;
use warnings;
use feature qw(say);
my $string = '!????!!!?';
sub reduce {
my ($s) = @_;
while ( my @matches = $s =~ m/((.)\2+)/g ) {
last if ! grep { length($_) > 1 && length($_) % 2 == 1 } @matches;
foreach my $match ( @matches ) {
$s =~ s/\Q$match// if length($match) > 1 && length($match) % 2 == 1;
}
}
return $s;
}
say reduce($string);
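For readers who want the same steps in Ruby, here is a rough sketch. The method name reduce_odd_groups is hypothetical, and unlike the Perl version it deletes all odd-length runs of a pass in a single gsub rather than one substitution per match, so treat it as one possible reading of the steps rather than a line-for-line translation.

```ruby
# Hypothetical helper name; follows the numbered steps described above.
def reduce_odd_groups(input)
  s = input.dup
  loop do
    # Gather every run of two or more identical characters.
    runs = s.scan(/((.)\2+)/).map(&:first)
    # Stop when no run has an odd length (runs here are at least 2 long).
    break unless runs.any? { |run| run.length.odd? }
    # Delete the odd-length runs and keep the even-length ones.
    s = s.gsub(/(.)\1+/) { |run| run.length.odd? ? "" : run }
  end
  s
end

p reduce_odd_groups("!????!!!?") # => "!"
p reduce_odd_groups("!???!!")    # => ""
```

The loop terminates because every pass that continues removes at least three characters.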

I could be wrong (this is ruby, after all) but I don't think you'll find a one-liner for this because ruby's utility functions generally aren't recursive. But you can use regex to simplify your logic, at the very least:
def remove(s)
while s =~ /(?<!\!)\!([\!]{2})+(?!\!)/ || s =~ /(?<!\?)\?([\?]{2})+(?!\?)/
s.gsub! /(?<!\!)\!([\!]{2})+(?!\!)/, "" # remove odd !
s.gsub! /(?<!\?)\?([\?]{2})+(?!\?)/, "" # remove odd ?
end
return s
end
To make the regex less mind-boggling, it helps to look at them with 'a' instead of '?' and '!':
/(?<!a)a([a]{2})+(?!a)/ #regex for 'a'
(?<!a) #negative lookbehind: the match cannot start with an 'a'
a([a]{2})+ #the match should be an 'a' followed by 1 or more pairs
(?!a) #negative lookahead: the match cannot end with an 'a'
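A quick way to convince yourself of what the 'a' version matches is to run it over runs of different lengths. This is only a sketch; the constant name and the helper strip_odd_runs are made up for the example.

```ruby
# Hypothetical names; the pattern is the 'a' regex shown above.
ODD_RUN_OF_A = /(?<!a)a([a]{2})+(?!a)/

def strip_odd_runs(sample)
  sample.gsub(ODD_RUN_OF_A, "")
end

%w[a aa aaa aaaa aaaaa baaab].each do |sample|
  puts "#{sample.inspect} -> #{strip_odd_runs(sample).inspect}"
end
# "a" and "aa" are too short to match, "aaaa" is even, so those stay;
# "aaa" and "aaaaa" are removed entirely, and "baaab" becomes "bb".
```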

It should be simple enough with a regular expression replacement
def remove(string)
begin
original = string.dup # copy; otherwise original and string are the same object and the loop stops after one pass
string.gsub!(/(\!{3,})|(\?{3,})/) { |s| s.length.even? ? s : '' }
end until original == string
string
end
puts remove("!????!!!?").inspect # answer == "!"
puts remove("!???!!").inspect # answer == ""
puts remove("!????!!").inspect # answer == "!????!!"

Replacing hyphens in words with the next letter capitalized

I have a symbol like the following. Whenever the symbol contains the "-" hyphen mark, I want to remove it and upcase the subsequent letter.
I am able to do it like so:
sym = :'new-york'
str = sym.to_s.capitalize
/-(.)/.match(str)
str = str.gsub(/-(.)/,$1.capitalize)
=> "NewYork"
This required four lines. Is there a more elegant way to create CamelCase (upper CamelCase e.g. NewYork, NewJersey, BucksCounty) from hyphened words in Ruby?

Here's one way:
sym.to_s.split('-').map(&:capitalize).join #=> "NewYork"

sym.to_s.gsub(/(-|\A)./) { $&[-1].upcase }
or
sym.to_s.gsub(/(-|\A)./) { |m| m[-1].upcase }

r = /
([[:alpha:]]+) # match 1 or more letters in capture group 1
- # match a hyphen
([[:alpha:]]+) # match 1 or more letters in capture group 2
/x # free-spacing regex definition mode
sym = :'new-york'
sym.to_s.sub(r) { $1.capitalize + $2.capitalize }
#=> "NewYork"
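Because sub replaces only the first match, the version above covers a single hyphen. If symbols with several hyphens can occur (an assumption, since the question only shows two-word names), a gsub-based sketch might look like this; the method name camelize_hyphenated is hypothetical.

```ruby
# Hypothetical helper: upcase the first letter and every letter that follows
# a hyphen, dropping the hyphens themselves.
def camelize_hyphenated(sym)
  sym.to_s.gsub(/(?:\A|-)([[:alpha:]])/) { $1.upcase }
end

p camelize_hyphenated(:'new-york')       # => "NewYork"
p camelize_hyphenated(:'bucks-county')   # => "BucksCounty"
p camelize_hyphenated(:'salt-lake-city') # => "SaltLakeCity"
```

Unlike capitalize, this leaves the remaining letters of each word untouched.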

Detect specific format of version number using regex

I'm looking to extract elements of an array containing a version number, where a version number is either at the start or end of a string or padded by spaces, and is a series of digits and periods but does not start or end with a period. For example "10.10 Thingy" and "Thingy 10.10.5" is valid, but "Whatever 4" is not.
haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]
haystack.select{ |i| i[/(?<=^| )(\d+)(\.\d+)*(?=$| )/] }
=> ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4"]
I'm not sure how to modify the regex to require at least one period so that "Whatever 4" is not in the results.

This is only a slight variant of Archonic's answer.
r = /
(?<=\A|\s) # match the beginning of the string or a space in a positive lookbehind
(?:\d+\.)+ # match >= 1 digits followed by a period in a non-capture group, >= 1 times
\d+ # match >= 1 digits
(?=\s|\z) # match a space or the end of the string in a positive lookahead
/x # free-spacing regex definition mode
haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]
haystack.select { |str| str =~ r }
#=> ["10.10 Thingy", "Thingy 10.10.5"]
The question was not to return the version information, but to return the strings that contain correct version information. As a result there is no need for the lookarounds:
r = /
(?:\A|\s) # match the beginning of the string or a space
(?:\d+\.)+ # match >= 1 digits followed by a period in a non-capture group, >= 1 times
\d+ # match >= 1 digits
(?:\s|\z) # match a space or the end of the string
/x # free-spacing regex definition mode
haystack.select { |str| str =~ r }
#=> ["10.10 Thingy", "Thingy 10.10.5"]
Suppose one wanted to obtain both the strings that contain valid versions and the versions contained in those strings. One could write the following:
r = /
(?<=\A|\s) # match the beginning of string or a space in a pos lookbehind
(?:\d+\.)+ # match >= 1 digits then a period in non-capture group, >= 1 times
\d+ # match >= 1 digits
(?=\s|\z) # match a space or end of string in a pos lookahead
/x # free-spacing regex definition mode
haystack.each_with_object({}) do |str,h|
version = str[r]
h[str] = version if version
end
# => {"10.10 Thingy"=>"10.10", "Thingy 10.10.5"=>"10.10.5"}

Ah hah! I knew I was close.
haystack.select{ |i| i[/(?<=^| )(\d+)(\.\d+)+(?=$| )/] }
The asterisk at the end of (\.\d+)* was allowing that pattern to repeat any number of times, including zero times. You can limit that with (\.\d+){x,y} where x and y are the min and max times. You can also only identify a minimum with (\.\d+){x,}. In my case I wanted a minimum of once, which would be (\.\d+){1,}, however that's synonymous with (\.\d+)+. That only took half the day to figure out...
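The effect of the different quantifiers can be checked side by side. This is a sketch only: the patterns hash and the {2,3} case are added for illustration, while the haystack and the base regex come from the question.

```ruby
haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]

# Same regex each time; only the quantifier on (\.\d+) changes.
patterns = {
  "*"     => /(?<=^| )(\d+)(\.\d+)*(?=$| )/,
  "+"     => /(?<=^| )(\d+)(\.\d+)+(?=$| )/,
  "{1,}"  => /(?<=^| )(\d+)(\.\d+){1,}(?=$| )/,
  "{2,3}" => /(?<=^| )(\d+)(\.\d+){2,3}(?=$| )/
}

results = patterns.transform_values do |regex|
  haystack.select { |item| item =~ regex }
end

results.each { |quantifier, matches| puts "#{quantifier}: #{matches.inspect}" }
```

With these inputs, "+" and "{1,}" select the same two strings, "*" additionally lets "Whatever 4" through, and "{2,3}" keeps only "Thingy 10.10.5".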

Regex matching except when pattern is after another pattern

I am looking to find method names for python functions. I only want to find method names if they aren't after "def ". E.g.:
"def method_name(a, b):" # (should not match)
"y = method_name(1,2)" # (should find `method_name`)
My current regex is /\W(.*?)\(/.

str = "def no_match(a, b):\ny = match(1,2)"
str.scan(/(?<!def)\s+\w+(?=\()/).map(&:strip)
#⇒ ["match"]
The regex comments:
negative lookbehind for def,
followed by spaces (will be stripped later),
followed by one or more word symbols \w,
followed by positive lookahead for parenthesis.
Sidenote: one should never use regexps to parse long strings for any purpose.

I have assumed that lines that do not contain "def" are of the form "[something]=[zero or more spaces][method name]".
R1 = /
\bdef\b # match 'def' surrounded by word breaks
/x # free-spacing regex definition mode
R2 = /
[^=]+ # match any characters other than '='
= # match '='
\s* # match >= 0 whitespace chars
\K # forget everything matched so far
[a-z_] # match a lowercase letter or underscore
[a-z0-9_]* # match >= 0 lowercase letters, digits or underscores
[!?]? # possibly match '!' or '?'
/x
def match?(str)
(str !~ R1) && str[R2]
end
match?("def method_name1(a, b):") #=> false
match?("y = method_name2(1,2)") #=> "method_name2"
match?("y = method_name") #=> "method_name"
match?("y = method_name?") #=> "method_name?"
match?("y = def method_name") #=> false
match?("y << method_name") #=> nil
I chose to use two regexes to be able to deal with both my first and penultimate examples. Note that the method returns either a method name or a falsy value, but the latter may be either false or nil.

Count capitalized of each sentence in a paragraph Ruby

I answered my own question. Forgot to initialize count = 0
I have a bunch of sentences in a paragraph.
a = "Hello there. this is the best class. but does not offer anything." as an example.
To figure out whether the first letter is capitalized, my thought is to split the string into sentences with a_sentence = a.split(".")
I know I can call "hello world".capitalize!, and if it returns nil, that tells me the string was already capitalized.
EDIT
Now I can use an array method to go through each value and call .capitalize! on it.
And I know I can check whether a sentence is already capitalized with .strip.capitalize!.nil?
But I can't seem to output how many were capitalized.
EDIT
count = 0 # must be initialized before the loop
a_sentence.each do |sentence|
if (sentence.strip.capitalize!.nil?)
count += 1
puts "#{count} capitalized"
end
end
It outputs:
1 capitalized
Thanks for all your help. I'll stick with the above code I can understand within the framework I only know in Ruby. :)

Try this:
b = []
a.split(".").each do |sentence|
b << sentence.strip.capitalize
end
b = b.join(". ") + "."
# => "Hello there. This is the best class. But does not offer anything."

Your post's title is misleading because from your code, it seems that you want to get the count of capitalized letters at the beginning of a sentence.
Assuming that every sentence ends with a period (a full stop) followed by a space, the following should work for you:
split_str = ". "
regex = /^[A-Z]/
paragraph_text.split(split_str).count do |sentence|
regex.match(sentence)
end
And if you want to simply ensure that each starting letter is capitalized, you could try the following:
paragraph_text.split(split_str).map(&:capitalize).join(split_str) + split_str

There's no need to split the string into sentences:
str = "It was the best of times. sound familiar? Out, damn spot! oh, my."
str.scan(/(?:^|[.!?]\s)\s*\K[A-Z]/).length
#=> 2
The regex could be written with documentation by adding x after the closing /:
r = /
(?: # start a non-capture group
^|[.!?]\s # match ^ or (|) any of ([]) ., ! or ?, then one whitespace char
) # end non-capture group
\s* # match any number of whitespace chars
\K # forget the preceding match
[A-Z] # match one capital letter
/x
a = str.scan(r)
#=> ["I", "O"]
a.length
#=> 2
Instead of Array#length, you could use its alias, size, or Array#count.

You can count how many were capitalized, like this:
a = "Hello there. this is the best class. but does not offer anything."
a_sentence = a.split(".")
a_sentence.inject(0) { |sum, s| s.strip!; s.capitalize!.nil? ? sum += 1 : sum }
# => 1
a_sentence
# => ["Hello there", "This is the best class", "But does not offer anything"]
And then put it back together, like this:
"#{a_sentence.join('. ')}."
# => "Hello there. This is the best class. But does not offer anything."
EDIT
As @Humza suggested, you could use count:
a_sentence.count { |s| s.strip!; s.capitalize!.nil? }
# => 1
