Ruby regex matching cent ¢

I am having difficulty matching string "79¢ /lb" with this regex: (\$|¢)\d+(.\d{1,2})?
It works fine when the cent symbol appears in the beginning, but I don't know what needs to be added near the end of the string.
Basically, I'm planning to extract a float value from this price tag (0.79 in this case). I'm using Ruby. Thanks in advance.

Well, that regex requires the $ or ¢ to come before the digits. To match 79¢ /lb, you'll need something like:
(\d+)¢
where the ¢ comes after the digits.
A single regex to match the many varied formats that you're likely to see will be a little more complex. I would suggest either doing it as multiple regexes (for simplicity), or asking another question here specifying the full range of strings you want to capture the prices from.
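As a rough illustration of the multiple-regex approach, here is a minimal sketch that turns a price tag into a dollar amount. The helper name price_to_dollars and the set of formats it accepts are assumptions made for illustration, not part of the question. The cent sign is written as \u00A2 so the source stays ASCII.

```ruby
# Hypothetical helper (the name and the accepted formats are assumptions):
# converts a price tag such as "79 cents /lb" or "$1.25" to a Float in dollars.
# Returns nil when no price is recognized.
def price_to_dollars(tag)
  case tag
  when /\$\s*(\d+(?:\.\d{1,2})?)/          # dollars, symbol before the digits
    $1.to_f
  when /(\d+)\s*\u00A2/, /\u00A2\s*(\d+)/  # cents, symbol after or before
    $1.to_i / 100.0
  end
end

puts price_to_dollars("79\u00A2 /lb")  # prints 0.79
```

Each case is a separate, simple regex, so adding a new format only means adding another when branch.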

It's easiest to figure out the right regex when you consider each case separately. If I understand your question correctly, there are 4 cases:
cents, with the ¢ symbol before the price
cents, with the ¢ symbol after the price
dollars (and optional cents), with the $ symbol before the price
dollars (and optional cents), with the $ symbol after the price
First, write a regex for each case separately:
¢(\d{1,2})\b
\b(\d{1,2})¢
\$(\d+(?:\.\d{2})?)\b
\b(\d+(?:\.\d{2})?)\$
Then, combine them into a single regex:
regex = %r{
¢(\d{1,2})\b | # case 1
\b(\d{1,2})¢ | # case 2
\$(\d+(?:\.\d{2})?)\b | # case 3
\b(\d+(?:\.\d{2})?)\$ # case 4
}x
Then, match to your heart's content:
string_with_prices.scan(regex) do |match|
# If there was a match in the first two groups, it's for cents
cents = $1 || $2
# ...and the last two groups are dollars.
dollars = $3 || $4
if cents
puts "found price (cents): #{cents}"
elsif dollars
puts "found price (dollars): #{dollars}"
else
puts 'unknown match!'
end
end
Note: To test this code, I had to use 'c' instead of '¢' because Ruby was telling me invalid multibyte char (US-ASCII). To avoid this issue, declare the source file's encoding (for example with a # encoding: utf-8 magic comment on the first line), or else embed the character's code point directly in the regex, e.g. %r{\u00A2} instead of %r{¢}.
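For reference, here is a sketch of the same four-case regex with the cent sign written as the \u00A2 escape, which sidesteps the source-encoding problem. It assumes Ruby 1.9 or later. The constant PRICE_REGEX and the helper extract_prices are hypothetical names, and converting everything to dollars is an added assumption rather than something this answer does.

```ruby
# The four cases, with the cent sign written as \u00A2 so that the source
# file stays pure ASCII (assumes Ruby 1.9 or later).
PRICE_REGEX = %r{
  \u00A2(\d{1,2})\b        | # case 1: cent sign before the price
  \b(\d{1,2})\u00A2        | # case 2: cent sign after the price
  \$(\d+(?:\.\d{2})?)\b    | # case 3: dollar sign before the price
  \b(\d+(?:\.\d{2})?)\$      # case 4: dollar sign after the price
}x

# Hypothetical helper: returns every price found in text as a Float in dollars.
def extract_prices(text)
  # scan yields one array per match, with one slot per capture group.
  text.scan(PRICE_REGEX).map do |c1, c2, d1, d2|
    cents = c1 || c2
    cents ? cents.to_i / 100.0 : (d1 || d2).to_f
  end
end

p extract_prices("apples 79\u00A2 /lb, melon $3.50 each")  # prints [0.79, 3.5]
```

Taking the capture groups as block parameters avoids relying on the $1 to $4 globals.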

Maybe you don't need to do everything in your reg exp;
#price is the string that contains the price
if price =~ /\$|¢/
value = price.match(/\d+/)
end
Or something along those lines.

Related

Regex for series of four digits each up to 100

I'm trying to write a regex to validate a string and accepts only a series of four comma-separated digits, each up to 100. Something like this would be valid:
20,30,40,50
and these invalid:
120,0,20,0
20,30,40,ss
invalid_string
Any thoughts?
They're used for CMYK colours. We just need to store them here, not use them.
Number Range and Subroutine
In Ruby 2+, for a compact regex, use this:
^([0-9]|[1-9][0-9]|100)(?:,\g<1>){3}$
Explanation
The ^ anchor asserts that we are at the beginning of a line (in Ruby, use \A to anchor to the beginning of the whole string)
The parentheses around ([0-9]|[1-9][0-9]|100) match a number from 0 to 100 and define subroutine #1
(?:,\g<1>) matches one comma and the expression defined by subroutine # 1
The {3} quantifier repeats that three times
The $ anchor asserts that we are at the end of a line (in Ruby, use \z to anchor to the end of the whole string)
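A minimal sketch of how the subroutine regex might be used for validation. The names CMYK_REGEX and valid_cmyk? are hypothetical, and the \A and \z anchors are a substitution: in Ruby, ^ and $ match at line boundaries, so a multi-line string could otherwise slip through.

```ruby
# \A and \z anchor to the whole string; ^ and $ would anchor to each line.
CMYK_REGEX = /\A([0-9]|[1-9][0-9]|100)(?:,\g<1>){3}\z/

# Hypothetical helper: true when str is four comma-separated integers in 0..100.
def valid_cmyk?(str)
  !(str =~ CMYK_REGEX).nil?
end

%w[20,30,40,50 100,0,0,100 120,0,20,0 20,30,40,ss invalid_string].each do |s|
  puts "#{s}: #{valid_cmyk?(s)}"
end
```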
I'd save myself the headache of using regex for a number related problem. Also the validation message will look awkward so it's better to make your own:
validate :that_string_has_only_4_numbers_upto_100
def that_string_has_only_4_numbers_upto_100
  # Require exactly four parts, each made of digits only and within 0..100
  parts = str.split(',', -1)
  valid = parts.size == 4 && parts.all? { |n| n =~ /\A\d+\z/ && (0..100).cover?(n.to_i) }
  errors.add(:str, 'is not valid.') unless valid
end
Unless you are a regex jedi guru like @zx81 :p.
^(?:\d{1,2},){3}\d{1,2}$
Try this. Note that it only accepts one- or two-digit numbers, so 100 itself is rejected.

What is the best way to delimit a CSV file that contains commas and double quotes?

Let's say I have the following string and I want the output below without requiring csv.
this, "what I need", to, do, "i, want, this", to, work
this
what i need
to
do
i, want, this
to
work
This problem is a classic case of the technique explained in this question to "regex-match a pattern, excluding..."
We can solve it with a beautifully-simple regex:
"([^"]+)"|[^, ]+
The left side of the alternation | matches complete "quotes" and captures the contents to Group1. The right side matches characters that are neither commas nor spaces, and we know they are the right ones because they were not matched by the expression on the left.
Option 2: Allowing Multiple Words
In your input, all tokens are single words, but if you also want the regex to work for my cat scratches, "what I need", your dog barks, use this:
"([^"]+)"|[^, ]+(?:[ ]*[^, ]+)*
The only difference is the addition of (?:[ ]*[^, ]+)* which optionally adds spaces + characters, zero or more times.
This program shows how to use the regex (see the results at the bottom of the online demo):
subject = 'this, "what I need", to, do, "i, want, this", to, work'
regex = /"([^"]+)"|[^, ]+/
# put Group 1 captures in an array
mymatches = []
subject.scan(regex) {|m|
$1.nil? ? mymatches << $& : mymatches << $1
}
mymatches.each { |x| puts x }
Output
this
what I need
to
do
i, want, this
to
work
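For the multi-word variant (Option 2), a sketch along the same lines. The helper name split_fields is hypothetical, and the extra capture group around the right-hand side is an addition so that scan returns both alternatives as groups.

```ruby
# Option 2 regex, with the unquoted alternative wrapped in a second group.
FIELD_REGEX = /"([^"]+)"|([^, ]+(?:[ ]*[^, ]+)*)/

# Hypothetical helper: returns the fields of a line, with quotes removed.
def split_fields(line)
  # Each scan result is [quoted, bare]; exactly one of the two is non-nil.
  line.scan(FIELD_REGEX).map { |quoted, bare| quoted || bare }
end

p split_fields('my cat scratches, "what I need", your dog barks')
# prints ["my cat scratches", "what I need", "your dog barks"]
```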
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
Article about matching a pattern unless...

Ruby regex to capture any string with a number

I am looking for a regular expression in Ruby to capture a sentence that has any sort of number in it.
For instance, I need to capture all of the following:
"5 different ways to do it"
"2 x 2 is certainly 4"
"there are 15 different things"
"Try to get to 10"
I only want to capture sentences with a number within, but that has nothing else before or after the number. I don't want to include things like:
"$2 billion dollars"
"The 5x effect"
It has to be just a sequence of 1 or more digits at the beginning, middle, or end of a sentence.
Thanks.
You probably want something like:
/^.*(?<!\S)\d+(?!\S).*$/
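This matches a line containing a number that is neither immediately preceded nor immediately followed by a non-whitespace character (the look-arounds check both sides).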
This
(s =~ /(^|\s)\d+(\s|$)/) ? s : nil
will return the string s (and nil otherwise) if it contains at least one non-negative integer that:
is the entire string,
is at the beginning of the string and followed by a whitespace character,
is at the end of the string and preceded by a whitespace character, or
is both preceded and followed by a whitespace character.
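A short sketch applying the look-around idea to the sample sentences from the question. The constant STANDALONE_NUMBER and the helper name are hypothetical.

```ruby
# A run of digits with no non-whitespace character directly before or after it.
STANDALONE_NUMBER = /(?<!\S)\d+(?!\S)/

# Hypothetical helper: keeps only the sentences containing a standalone number.
def with_standalone_number(sentences)
  sentences.select { |s| s =~ STANDALONE_NUMBER }
end

samples = [
  "5 different ways to do it",
  "2 x 2 is certainly 4",
  "there are 15 different things",
  "Try to get to 10",
  "$2 billion dollars",
  "The 5x effect"
]
puts with_standalone_number(samples)
```

The last two samples are filtered out because the digit is attached to a $ or an x.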

What's wrong with this RegEx?

I'm trying to implement this in a small ruby script, and tested it on http://www.rubular.com/, where it worked perfectly. Not sure why it's not performing in the actual script.
The RegEx: /(motion|links|sound|button|symbol)|(0.\d{8})|(\s\d{1}\s)|(\d{10}\s)/
The Text it's Against:
Trial ID: 1 | Trial Type: motion | Trick? 1
Click Time: 0.87913100 1302969732
Trial ID: 7 | Trial Type: button | Trick? 0
Click Time: 0.19817800 1302987043
etc. etc.
What I am trying to grab: Only the numbers, and the single word after "Trial Type". So for the first line of the example, I would only want " 1 motion 1 0.87913100 1302969732" to be returned. I also want to keep the space before the first number in each trial.
My short ruby script:
File.open('log.txt', 'r') do |file|
contents = file.readlines.to_s
regex = Regexp.new(/(motion|links|sound|button|symbol)|(0\.\d{8})|(\s\d{1}\s)|(\d{10}\s)/)
matchdata = regex.match(contents).to_a
matchdata.each do |match|
if match != nil
puts match
end
end
end
It only outputs two "1"s though. Hmm... I know it's reading the file contents right, and when I tried an alternate, simpler regex it worked fine.
Thanks for any help I get here!! : )
You want to use String#scan
matchdata = contents.scan(regex)
Also, @Mike Penington is correct: you shouldn't need the if match != nil check if you do it right. You have to clean up your regex as well. The pipe character in a regex is a special character meaning "match the left side OR the right side", so the literal pipe characters in your text must be escaped.
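To make the difference concrete, here is a small sketch using the regex from the question on the first record of the sample log. The flatten, compact and strip steps are an assumption about how one might tidy the result. They are needed because scan returns one array per match with a nil slot for every alternative that did not take part.

```ruby
log = "Trial ID: 1 | Trial Type: motion | Trick? 1\n" \
      "Click Time: 0.87913100 1302969732\n"
regex = /(motion|links|sound|button|symbol)|(0\.\d{8})|(\s\d{1}\s)|(\d{10}\s)/

# match stops at the first hit...
first_only = regex.match(log)[0]

# ...while scan walks the whole string. Flatten the per-match arrays and
# drop the nil slots left by the alternatives that did not match.
tokens = log.scan(regex).flatten.compact.map(&:strip)
puts tokens.join(' ')  # prints 1 motion 1 0.87913100 1302969732
```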
You need to escape the literal pipes inside the regex, fill in other missing literals (like Trick, \?, Click\sTime:, remove some of the spaces, etc...), and insert regex spaces where appropriate... i.e.
regex = Regexp.new(/(motion|links|sound|button|symbol)\s\|\sTrick\?\s*\d\s*Click\s+Time:\s+(0\.\d{,8})\s(\d{10})/)
EDIT: fixed parenthesis nesting in the original
If you know that the data follows a particular pattern, you can just follow that pattern in the regex, and pick up the portions you want with ( ).
/Trial ID: (\d+) \| Trial Type: (\w+) \| Trick\? (\d+) Click Time: ([\.\d]+) ([\.\d]+)/
The more you know previously about the data, the more specifically you can make the regex.
If you see some variations in the data, and the regex fails to match, then just relax the pattern:
If the Trial ID may include a decimal point, use [\.\d]+ instead of \d+.
If there can be more than one space, replace the space with [ ]+
If the space can be a tab, or can be absent, use \s* or [ \t]*.
If the Trial ID: part may appear as a different phrase, replace it with .*?,
and so on.
If you are not sure how many spaces/tabs appear, use this:
/Trial\s*ID:\s*(\d+)\s*\|\s*Trial\s*Type:\s*(\w+)\s*\|\s*Trick\?\s*(\d+)\s*Click\s*Time:\s*([\.\d]+)\s+([\.\d]+)/
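A sketch of how the whitespace-tolerant pattern could be applied to the whole log with String#scan. The constant name TRIAL_REGEX and the inline sample data are assumptions. Because \s also matches a newline, each record can span two lines.

```ruby
TRIAL_REGEX = /Trial\s*ID:\s*(\d+)\s*\|\s*Trial\s*Type:\s*(\w+)\s*\|\s*Trick\?\s*(\d+)\s*Click\s*Time:\s*([\.\d]+)\s+([\.\d]+)/

log = [
  "Trial ID: 1 | Trial Type: motion | Trick? 1",
  "Click Time: 0.87913100 1302969732",
  "Trial ID: 7 | Trial Type: button | Trick? 0",
  "Click Time: 0.19817800 1302987043"
].join("\n")

# One array of five captured fields per trial.
records = log.scan(TRIAL_REGEX)
records.each { |fields| puts fields.join(' ') }
```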
This is one of those times that trying to do everything in a big regex makes you work too hard. Simplify things:
ary = [
'Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732',
'Trial ID: 7 | Trial Type: button | Trick? 0 Click Time: 0.19817800 1302987043'
]
ary.each do |li|
numbers = li.scan(/[\d.]+/)
trial_type = li[/Trial Type: (\w+)/, 1]
puts "%d %s %d %f %d\n" % [numbers.first, trial_type, *numbers[1 .. -1]]
end
# >> 1 motion 1 0.879131 1302969732
# >> 7 button 0 0.198178 1302987043
Regex patterns are powerful, but people think it's macho to do everything in one big line. You have to weigh doing that against the increased work necessary to put together the regex in the first place, plus maintain it if something changes in the text being parsed later.

Ruby Regex match unless escaped with \

Using Ruby I'm trying to split the following text with a Regex
~foo\~\=bar =cheese~monkey
Where ~ or = denotes the beginning of match unless it is escaped with \
So it should match
~foo\~\=bar
then
=cheese
then
~monkey
I thought the following would work, but it doesn't.
([~=]([^~=]|\\=|\\~)+)(.*)
What is a better regex expression to use?
edit To be more specific, the above regex matches all occurrences of = and ~
edit Working solution. Here is what I came up with to solve the issue. I found that Ruby 1.8 has look ahead, but doesn't have lookbehind functionality. So after looking around a bit, I came across this post in comp.lang.ruby and completed it with the following:
# Iterates through the answer clauses
def split_apart clauses
reg = Regexp.new('.*?(?:[~=])(?!\\\\)', Regexp::MULTILINE)
# need to use reverse since Ruby 1.8 has look ahead, but not look behind
matches = clauses.reverse.scan(reg).reverse.map {|clause| clause.strip.reverse}
matches.each do |match|
yield match
end
end
What does "remove the head" mean in this context?
If you want to remove everything before a certain char, this will do:
.*?(?<!\\)= // anything up to the first "=" that is not preceded by "\"
.*?(?<!\\)~ // same, but for the squiggly "~"
.*?(?<!\\)(?=~) // same, but excluding the separator itself (if you need that)
Replace by "", repeat, done.
If your string has exactly three elements ("1=2~3") and you want to match all of them at once, you can use:
^(.*?(?<!\\)(?:=))(.*?(?<!\\)(?:~))(.*)$
matches: \~foo\~\=bar =cheese~monkey
group 1: \~foo\~\=bar =
group 2: cheese~
group 3: monkey
Alternatively, you split the string using this regex:
(?<!\\)[=~]
returns: ['\~foo\~\=bar ', 'cheese', 'monkey'] for "\~foo\~\=bar =cheese~monkey"
returns: ['', 'foo\~\=bar ', 'cheese', 'monkey'] for "~foo\~\=bar =cheese~monkey"
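In Ruby 1.9 or later, where lookbehind is available, the split can be sketched as below. The second regex, which keeps the leading separator on each clause as the question asks, is an assumed alternative and does not come from any of the answers.

```ruby
input = '~foo\~\=bar =cheese~monkey'

# Splitting on an unescaped = or ~ drops the separators themselves.
parts = input.split(/(?<!\\)[=~]/)
p parts

# Assumed alternative: a separator, then any run of escaped characters
# or characters that are neither a separator nor a backslash.
clauses = input.scan(/[~=](?:\\.|[^~=\\])*/).map(&:strip)
puts clauses
```

The scan version treats a backslash followed by any character as one escaped unit, so \~ and \= never start a new clause.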

Resources