That's what I am doing:
c.scan(/[1-9]|1[0-2]/)
For some reason, it returns only numbers from 1 to 9, ignoring the second part. I experimented a little, and it seems that the method will search for 10-12 only if 1 is excluded from the [1-9] part; e.g., c.scan(/[2-9]|1[0-2]/) works. What is the reason?
P.S. I know that this method lacks lookbehinds and will search for numbers and "part of numbers" as well
Change the order of your patterns and add word boundaries if necessary.
c.scan(/\b(?:1[0-2]|[1-9])\b/)
At each position in the string, the regex engine tries the alternatives from left to right and takes the first one that matches. With [1-9] first, the 1 of 10, 11 or 12 is consumed as a single digit before 1[0-2] is ever tried. With 1[0-2] first, the numbers 10 to 12 are matched whole, and [1-9] then picks up the remaining numbers from 1 to 9. Note that without word boundaries this would also match the 9 in 59. So I suggest you put your pattern inside a capturing or non-capturing group and add a word boundary \b (which matches between a word character and a non-word character) before and after that group.
| matches left to right, and the first part of the right side (1) is always matched by the left side. Reverse them:
c.scan(/1[0-2]|[1-9]/)
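The ordering effect described in the answers above can be checked directly. This is only a small sketch; the sample string and variable names are made up for illustration.

```ruby
c = "3 10 12 59"

# Single-digit alternative first: the "1" of "10" and "12" is consumed by [1-9].
digits_first = c.scan(/[1-9]|1[0-2]/)

# Two-digit alternative first: "10" and "12" are matched whole.
tens_first = c.scan(/1[0-2]|[1-9]/)

# Word boundaries additionally reject the digits inside "59".
bounded = c.scan(/\b(?:1[0-2]|[1-9])\b/)

p digits_first  #=> ["3", "1", "1", "2", "5", "9"]
p tens_first    #=> ["3", "10", "12", "5", "9"]
p bounded       #=> ["3", "10", "12"]
```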
Here's another way you might consider extracting numbers between 1 and 12 (assuming that's what you want to do):
c = '14 0 11x 15 003 y12'
c.scan(/\d+/).map(&:to_i).select { |n| (1..12).cover?(n) }
#=> [11, 3, 12]
I've returned an array of integers, rather than strings, thinking that probably would be more useful, but if you want strings:
c.scan(/\d+/).map { |s| s.to_i.to_s }
.select { |s| ['10', '11', '12', *'1'..'9'].include?(s) }
#=> ["11", "3", "12"]
I see several advantages to this approach, versus using a single regex:
it's easy to understand;
the regex is simple;
it's easy to modify if the permissible values change; and
it can be broken into three pieces to facilitate testing.
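As a rough illustration of the last point, the chain could be split into three small methods that can each be tested alone. The method names here are hypothetical, picked just for this sketch.

```ruby
# Hypothetical helper names, chosen for this sketch only.

# Step 1: pull out every run of digits as a string.
def digit_runs(str)
  str.scan(/\d+/)
end

# Step 2: convert the digit strings to integers (drops leading zeros).
def to_integers(strings)
  strings.map(&:to_i)
end

# Step 3: keep only the values inside the permitted range.
def in_range(numbers, range = 1..12)
  numbers.select { |n| range.cover?(n) }
end

c = '14 0 11x 15 003 y12'
runs   = digit_runs(c)     #=> ["14", "0", "11", "15", "003", "12"]
ints   = to_integers(runs) #=> [14, 0, 11, 15, 3, 12]
result = in_range(ints)    #=> [11, 3, 12]
p result
```

If the permitted values change, only the range passed to the third step needs to change.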
I have street names and numbers in a file, like so:
Sokolov 19, 20, 23 ,25
Hertzl 80,82,84,86
Hertzl 80a,82b,84e,90
Aba Hillel Silver 2,3,5,6,
Weizman 8
Ahad Ha'am 9 13 29
I parse the lines one by one with regex. I want a regex that will find and match:
The name of the street,
The street numbers with its possible a,b,c,d attached.
I've come up with this in the meantime:
/(\D{2,})\s+(\d{1,3}[a-d|א-ד]?)(?:[,\s]{1,3})?/
It finds the street name and first number. I need to find all the numbers.
I don't want to use two separate regexes if possible, and I prefer not to use Ruby's scan but just have it all in one regex.
You can use regex to find all the numbers, with their separators:
re = /\A(.+?)\s+((?:\d+[a-z]*[,\s]+)*\d+[a-z]*)/
txt = "Sokolov 19, 20, 23 ,25
Hertzl 80,82,84,86
Hertzl 80a,82b,84e,90
Aba Hillel Silver 2,3,5,6,
Weizman 8
Ahad Ha'am 9 13 29"
matches = txt.lines.map{ |line| line.match(re).to_a[1..-1] }
p matches
#=> [["Sokolov", "19, 20, 23 ,25"],
#=> ["Hertzl", "80,82,84,86"],
#=> ["Hertzl", "80a,82b,84e,90"],
#=> ["Aba Hillel Silver", "2,3,5,6"],
#=> ["Weizman", "8"],
#=> ["Ahad Ha'am", "9 13 29"]]
The above regex says:
\A Starting at the front of the string
(…) Capture the result
.+? Find one or more characters, as few as possible that make the rest of this pattern match.
\s+ Followed by one or more whitespace characters (which we don't capture)
(…) Capture the result
(?:…)* Find zero or more of what's in here, but don't capture them
\d+ One or more digits (0–9)
[a-z]* Zero or more lowercase letters
[,\s]+ One or more commas and/or whitespace characters
\d+ Followed by one or more digits
[a-z]* And zero or more lowercase letters
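If it helps, the same breakdown can be kept next to the pattern itself by using Ruby's extended (x) mode, which ignores whitespace in the pattern and allows comments. This is just a sketch of the regex above rewritten that way; the sample line is taken from the question.

```ruby
re = /
  \A                       # start of the string
  (.+?)                    # capture 1: the street name, as few characters as possible
  \s+                      # one or more whitespace characters (not captured)
  (                        # capture 2: the whole list of numbers
    (?:\d+[a-z]*[,\s]+)*   # zero or more numbers, each followed by separators
    \d+[a-z]*              # the final number, with optional lowercase letters
  )
/x

m = re.match("Hertzl 80a,82b,84e,90")
p m[1]  #=> "Hertzl"
p m[2]  #=> "80a,82b,84e,90"
```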
However, if you want to break the number up into pieces you will need to use scan or split or the equivalent.
result = matches.map{ |name,numbers| [name,numbers.scan(/[^,\s]+/)] }
p result
#=> [["Sokolov", ["19", "20", "23", "25"]],
#=> ["Hertzl", ["80", "82", "84", "86"]],
#=> ["Hertzl", ["80a", "82b", "84e", "90"]],
#=> ["Aba Hillel Silver", ["2", "3", "5", "6"]],
#=> ["Weizman", ["8"]],
#=> ["Ahad Ha'am", ["9", "13", "29"]]]
This is because regex captures inside a repeating group do not capture each repetition. For example:
re = /((\d+) )+/
txt = "hello 11 2 3 44 5 6 77 world"
p txt.match(re)
#=> #<MatchData "11 2 3 44 5 6 77 " 1:"77 " 2:"77">
The whole regex matches the entire run of numbers, but each capture only saves the last-seen instance. In this case, the outer capture only gets "77 " and the inner capture only gets "77".
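For comparison, here is a minimal sketch that puts the repeated-group behaviour next to scan, which applies the single-instance pattern repeatedly and so returns every number. The variable names are illustrative only.

```ruby
txt = "hello 11 2 3 44 5 6 77 world"

# Captures inside a repeated group keep only the last repetition.
last_only = txt.match(/((\d+) )+/).captures

# scan re-applies the single-instance pattern and collects each capture.
every_one = txt.scan(/(\d+) /).flatten

p last_only  #=> ["77 ", "77"]
p every_one  #=> ["11", "2", "3", "44", "5", "6", "77"]
```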
Why do you prefer not to use scan? This is what it is made for.
If you want your 3rd example to work, you need to have the [a-d] change to include the e in the range. After changing that you can use (\D{2,})\s+(\d{1,3}[a-e]?(?:[,\s]{1,3})*)*. Using the examples you gave I did some testing using Rubular.
Using some more groupings, you can have the repetition apply to those last few conditions (which seem to be pretty tricky). This way the spacing and the comma at the end will get caught in the repetition after the initial space has been consumed.
The only way around the limitation that you can only capture the last instance of a repeated expression is to write your regex for a single instance and let the regex machine do the repeating for you, as occurs with the global substitute options, admittedly similar to scan. Unfortunately, in that case, you have to match for either the street name or the street number and then have no way to easily associate the captured numbers with the captured names.
Regex is great at what it does, but when you try to extend its application beyond its natural limitations, it's not pretty. ;-)
I want a regex that will find and match....
Do the street names also contain digits (0-9), other characters beside an apostrophe?
Are the street numbers based off arbitrary data? Is it always just an optional a, b, c, or d?
Are you needing a minimum and maximum limitation of string length?
Here are some possible options:
If you are unsure about what the street name contains, but know that your street number pattern will be numbers with an optional letter, separated by commas or spaces:
/^(.*?)\s+(\d+(?:[a-z]?[, ]+\d+)*)(?=,|$)/
If the street names contain only letters with optional apostrophes, and the street numbers contain numbers with an optional letter, separated by commas or spaces:
/^([a-zA-Z' ]+)\s+(\d+(?:[a-z]?[, ]+\d+)*)(?=,|$)/
If your street name and street number patterns are always consistent, you could simply do:
/^([a-zA-Z' ]+)\s+([0-9a-z, ]+)$/
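As a rough check of the first option against the sample lines from the question, something like the following could be used. The constant STREET_RE and the helper parse_street are hypothetical names for this sketch, and the split step is my own addition for separating the individual numbers.

```ruby
# First option from above: any street name, then numbers with optional letters.
STREET_RE = /^(.*?)\s+(\d+(?:[a-z]?[, ]+\d+)*)(?=,|$)/

# Hypothetical helper: returns [name, [numbers...]] or nil when the line does not match.
def parse_street(line)
  m = STREET_RE.match(line)
  return nil unless m
  [m[1], m[2].split(/[, ]+/)]
end

lines = [
  "Sokolov 19, 20, 23 ,25",
  "Hertzl 80a,82b,84e,90",
  "Aba Hillel Silver 2,3,5,6,",
  "Ahad Ha'am 9 13 29"
]
parsed = lines.map { |line| parse_street(line) }
parsed.each { |entry| p entry }
#=> ["Sokolov", ["19", "20", "23", "25"]]
#=> ["Hertzl", ["80a", "82b", "84e", "90"]]
#=> ["Aba Hillel Silver", ["2", "3", "5", "6"]]
#=> ["Ahad Ha'am", ["9", "13", "29"]]
```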
I have various strings that represent time left coming in from a data feed. The formats look like this:
13:35
01:36
00:34
I want to use regex to change the formats to:
13:35 --> 13:35 (ok as-is)
01:36 --> 1:36 (removing leading 0)
00:34 --> 0:34 (remove first leading 0)
Currently, I'm doing this:
time_left.gsub(/\A0+/, '')
This accomplishes the first two target formats, but not the third, which results in:
:34 (should be 0:34)
Your regex /\A0+/ removes all leading 0s, but it sounds like you just want to remove the first one. You just want /\A0/.
Regex are not the best choice for this. I'd go after this like, ... uh, this:
puts %w[
13:35
01:36
00:34
].map { |s|
# %d prints the hour without padding; %02d keeps minutes two digits wide (e.g. 13:05)
"%d:%02d" % s.split(':').map(&:to_i)
}
Which outputs:
13:35
1:36
0:34
Just take out the +, which means "one or more". With it, the regex removes both zeros when the string starts with two zeros.
time_left.gsub(/\A0/, '')
If each of the times are in a separate string and you are applying the regex individually to them, then, you should not use gsub:
time_left.sub(/\A0/, "")
If it is rather the case that all the times are in a single string, then you cannot use \A. Use a negative lookbehind instead:
time_left.gsub(/(?<!\d)0/, "")
The second one will also remove a zero that directly follows the colon (e.g. 13:05 becomes 13:5).
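A small sketch of both cases, using made-up sample data. The lookbehind form (?<!\d)0 removes any zero that is not preceded by a digit, which is why a zero right after the colon disappears too. The last pattern is my own variation, not something from the answer: it additionally refuses zeros preceded by a colon and requires a following digit.

```ruby
# Separate strings: anchor at the start and remove a single leading zero.
times = %w[13:35 01:36 00:34]
one_each = times.map { |t| t.sub(/\A0/, "") }

# All times in one string: a negative lookbehind replaces the \A anchor.
joined = "13:35 01:36 00:34 10:05"
lookbehind = joined.gsub(/(?<!\d)0/, "")

# Variation (an assumption of this sketch): leave zeros after ":" alone.
guarded = joined.gsub(/(?<![\d:])0(?=\d)/, "")

p one_each    #=> ["13:35", "1:36", "0:34"]
p lookbehind  #=> "13:35 1:36 0:34 10:5"
p guarded     #=> "13:35 1:36 0:34 10:05"
```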
How about using a negative look-ahead:
/\A0+(?!:)/
(Tested in Perl)
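For what it's worth, the same pattern appears to behave the same way in Ruby. A quick sketch with the question's samples plus one extra value (00:05) that I added: the greedy 0+ backs off by one character whenever the next character is a colon, so one zero always survives in front of the colon.

```ruby
pattern = /\A0+(?!:)/

results = %w[13:35 01:36 00:34 00:05].map { |t| t.sub(pattern, "") }

p results  #=> ["13:35", "1:36", "0:34", "0:05"]
```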
I have a bunch of strings with opening hours in this format:
Mon-Fri: AM7:00-PM8:00\nSat-Sun: AM8:00-PM6:00
I can deal with the "AM" part by just removing it, but I'd like to convert the PM by
Removing "PM"
Adding 12 to the number before the ":"
Taking care of the fact that PM is sometimes double-digits (e.g. PM11:00)
There can be zero or more PM times in the string.
I'm not sure how to manipulate the time as a number. I've gotten this far:
opening_hours.sub! /PM([\d]?[\d]):/, "***\1***"
Which outputs things like this:
AM7:15-***\u0001***00
The '\u0001' may be due to Japanese characters in the string.
You can take advantage of the fact that String#gsub accepts a block. Something like this will do for you?
s = "Mon-Fri: AM7:00-PM8:00\nSat-Sun: AM8:00-PM11:00"
s2 = s.gsub('AM', '').gsub(/PM(\d+)/) do |match|
(match.gsub('PM', '').to_i + 12).to_s
end
s2 # => "Mon-Fri: 7:00-20:00\nSat-Sun: 8:00-23:00"
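Two side notes, offered as a sketch rather than a definitive fix. First, the \u0001 in the question most likely comes from the double quotes: in a double-quoted Ruby string "\1" is an escape for the character with code 1, so the backreference has to be written as '\1' (or "\\1"). Second, adding 12 blindly turns PM12:00 into 24:00. The hypothetical to_24h helper below assumes that AM12 means midnight and PM12 means noon, which the question does not state.

```ruby
# Single quotes keep \1 as a backreference to the first capture group.
marked = "PM8:00".sub(/PM(\d\d?):/, '***\1***')

# Hypothetical helper. Assumption: AM12 is midnight (0) and PM12 is noon (12).
def to_24h(hours_text)
  hours_text.gsub(/(AM|PM)(\d{1,2}):/) do
    meridiem = Regexp.last_match(1)
    hour = Regexp.last_match(2).to_i % 12   # 12 wraps to 0 before the PM offset
    hour += 12 if meridiem == "PM"
    "#{hour}:"
  end
end

s = "Mon-Fri: AM7:00-PM8:00\nSat-Sun: AM8:00-PM11:00"
converted = to_24h(s)
edge = to_24h("AM12:30-PM12:15")

p marked     #=> "***8***00"
p converted  #=> "Mon-Fri: 7:00-20:00\nSat-Sun: 8:00-23:00"
p edge       #=> "0:30-12:15"
```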
Have a look at this question; Ruby has a class called DateTime.
Convert 12 hr time to 24 hr format in Ruby
I am having quite some difficulty using regex in Ruby to split a string along several delimiters. These delimiters are:
,
/
&
and
Each of these delimiters can have any amount of whitespace on either side, but each item can itself contain a valid space.
A good example that I've been testing against is the string 1, 2 /3 and 4 12
What I would like is something along the lines of "1, 2 /3 and 4 12".split(regex) =>["1", "2", "3", "4 12"]
The closest I've been able to get is /\s*,|\/|&|and \s*/ but this generates ["1", " 2 ", "3 ", "4 12"] instead of the desired results.
I realize this is very close and I could simply call trim on each item, but being so close and knowing it can be done is sort of driving me mad. Hopefully someone can help me keep the madness at bay.
/\s*,|\/|&|and \s*/
This parses as /(\s*,)|\/|&|(and \s*)/. I.e. the leading \s* only applies to the comma and the trailing \s* only applies to "and". You want:
/\s*(,|\/|&|and )\s*/
Or, to avoid capturing:
/\s*(?:,|\/|&|and )\s*/
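A quick sketch of why the non-capturing form is the one to use with split: when the pattern contains a capturing group, String#split includes the captured delimiters in the result. The variable names are just for illustration.

```ruby
str = "1, 2 /3 and 4 12"

# Original pattern: \s* binds only to the first and last alternatives.
original = str.split(/\s*,|\/|&|and \s*/)

# Capturing group: the delimiters themselves show up in the result.
capturing = str.split(/\s*(,|\/|&|and )\s*/)

# Non-capturing group: only the items are returned.
non_capturing = str.split(/\s*(?:,|\/|&|and )\s*/)

p original       #=> ["1", " 2 ", "3 ", "4 12"]
p capturing      #=> ["1", ",", "2", "/", "3", "and ", "4 12"]
p non_capturing  #=> ["1", "2", "3", "4 12"]
```

Note that the "and " alternative also matches inside a word that ends in "and" followed by a space, so this only works cleanly when the items cannot contain such words.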
Try .scan:
irb(main):030:0> "1, 2 /3 and 4 12".scan(/\d+(?:\s*\d+)*/)
=> ["1", "2", "3", "4 12"]
You can try:
(?:\s*)[,\/](?:\s*)|(?:\s*)and(?:\s*)
But as Nakilon suggested, you may have better luck with scan instead of split.