I am looking for a regular expression in Ruby to capture a sentence that has any sort of number in it.
For instance, I need to capture all of the following:
"5 different ways to do it"
"2 x 2 is certainly 4"
"there are 15 different things"
"Try to get to 10"
I only want to capture sentences that contain a number with nothing else attached before or after it. I don't want to include things like:
"$2 billion dollars"
"The 5x effect"
It has to be just a sequence of 1 or more digits at the beginning, middle, or end of a sentence.
Thanks.
You probably want something like:
/^.*(?<!\S)\d+(?!\S).*$/
This matches a run of digits and uses look-arounds to require that it is neither preceded nor followed by a non-whitespace character.
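As a rough sketch of how the look-around idea could be applied to the sample sentences from the question, the snippet below filters a list with only the core of the pattern. The constant and method names are made up for illustration, and it assumes numbers are delimited by whitespace only, so a sentence ending in "10." would not be kept.

```ruby
# Core of the pattern above: digits with no non-space character on either side.
# STANDALONE_NUMBER and sentences_with_standalone_number are illustrative names.
STANDALONE_NUMBER = /(?<!\S)\d+(?!\S)/

def sentences_with_standalone_number(sentences)
  sentences.select { |s| s.match?(STANDALONE_NUMBER) }
end

samples = [
  "5 different ways to do it",
  "2 x 2 is certainly 4",
  "there are 15 different things",
  "Try to get to 10",
  "$2 billion dollars",
  "The 5x effect"
]

# Keeps the first four sentences and drops the last two.
kept = sentences_with_standalone_number(samples)
puts kept
```

Regexp#match? needs Ruby 2.4 or later; on older versions `s =~ STANDALONE_NUMBER` should behave the same way for this purpose.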
This
(s =~ /(^|\s)\d+(\s|$)/) ? s : nil
will return the string s if it contains at least one non-negative integer that is either:
the entire string,
at the beginning of the string and followed by a whitespace character,
at the end of the string and preceded by a whitespace character, or
both preceded and followed by a whitespace character.
The following question was posted by @ruhroe about an hour ago. I was about to post an answer when it was taken down. That's unfortunate, as I thought it was rather interesting. I'm putting it back up in case the OP sees this and also to give others an opportunity to post solutions.
The original question (which I've edited):
The problem is to split a string on some spaces in the string, based on criteria which depend in part on a number given by the user. If that number were, say, 5, each substring would contain either:
one word having 5 or more characters or
as many consecutive words (separated by spaces) as possible, provided the resulting string has at most 5 characters.
For example, if the string were:
"abcdefg fg hijkl mno pqrs tuv wx yz"
the result would be:
["abcdefg", "fg", "hijkl", "mno", "pqrs", "tuv", "wx yz"]
"abcdefg" is on a separate line because it has at least five characters.
"fg" is on a separate line because "fg" contains 5 or fewer characters and when combined with the following word, with a space between them, the resulting string, "fg hijkl", contains more than 5 characters.
"hijkl" is on a separate line because it satisfies both criteria.
How can I do that?
I believe this does it:
str = "abcdefg fg hijkl e mn pqrs tuv wx yz"
str.scan(/\b(?:\w{5,}|\w[\w\s]{0,3}\w|\w)\b/)
#=> ["abcdefg", "fg", "hijkl", "e mn", "pqrs", "tuv", "wx yz"]
As you iterate through the words in your collection (splitting the original string up into words should be trivial), it seems like there are three possible scenarios:
It's a blank line, and we should insert the current word into the line
It's a non-blank line, and the word can fit
It's a non-blank line, and the word can't fit and it should go into a new line
Something like this should work (note: I haven't tested this much beyond your example, so you'll definitely want to do that):
lines = []
line = ""
words.each do |word|
  if line.empty?
    # this is a new line, so start it with the current word
    line << word
  elsif word_can_fit_line?(line, word, length)
    # the word fits, so append it to the current line
    line << " #{word}"
  else
    # the word doesn't fit, so keep this line and start a new one with
    # a copy of the current word (so later appends don't modify words)
    lines << line
    line = word.dup
  end
end
# add the last line and we're done
lines << line
lines
Note that the implementation of word_can_fit_line? should be trivial - you just want to see if the current line length, plus a space, plus the word length, is less than or equal to your desired line length.
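For completeness, here is one way the pieces described above might fit together as a self-contained script. The method name split_into_lines is invented for this sketch, and word_can_fit_line? is implemented exactly as described (current length, plus a space, plus the word length, compared against the limit); treat it as an illustration rather than a tested solution.

```ruby
# True when `word` still fits on `line` after adding a separating space.
def word_can_fit_line?(line, word, length)
  line.length + 1 + word.length <= length
end

# Hypothetical wrapper around the loop: greedily packs words into lines.
def split_into_lines(str, length)
  lines = []
  line = +""                       # unfrozen empty string we can append to
  str.split.each do |word|
    if line.empty?
      line << word                 # start a new line with this word
    elsif word_can_fit_line?(line, word, length)
      line << " " << word          # the word fits on the current line
    else
      lines << line                # keep the finished line
      line = word.dup              # start a new one without mutating the input
    end
  end
  lines << line unless line.empty? # add the last line, if any
  lines
end

result = split_into_lines("abcdefg fg hijkl mno pqrs tuv wx yz", 5)
p result
```

A word longer than the limit simply ends up on a line of its own, which is what the question asks for.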
Is there a better way to write the following regular expression in Ruby? The first regex matches a string that begins with a (lower case) consonant, the second with a vowel.
I'm trying to figure out if there's a way to write a regular expression that matches the negative of the second expression, versus writing the first expression with several ranges.
string =~ /\A[b-df-hj-np-tv-z]/
string =~ /\A[aeiou]/
The statement
string =~ /\A[^aeiou]/
will test whether the string starts with a non-vowel character, which includes digits, punctuation, whitespace and control characters. That is fine if you know beforehand that the string begins with a letter, but to check that it starts with a consonant you can use look-aheads to test that it starts with both a letter and a non-vowel, like this:
string =~ /\A(?=[^aeiou])(?=[a-z])/i
To match an arbitrary number of consonants, you can use the sub-expression (?i:(?![aeiou])[a-z]) to match a single consonant. It is a self-contained group, so you can put a repetition count like {3} right after it. For example, this program finds all the strings in a list that begin with three consonants in a row:
list = %w/ aab bybt xeix axei AAsE SAEE eAAs xxsa Xxsr /
puts list.select { |word| word =~ /\A(?i:(?![aeiou])[a-z]){3}/ }
output
bybt
xxsa
Xxsr
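To make the difference between the two anchored patterns from this answer concrete, the sketch below runs both over a small word list. The constant names and the sample words are made up for the example.

```ruby
# Pattern 1: first character is anything other than a lower-case vowel.
NON_VOWEL = /\A[^aeiou]/
# Pattern 2: first character is a letter AND not a vowel (case-insensitive).
CONSONANT = /\A(?=[^aeiou])(?=[a-z])/i

words = %w[ball Tree apple 1word _word]

non_vowel_hits = words.select { |w| w.match?(NON_VOWEL) }
consonant_hits = words.select { |w| w.match?(CONSONANT) }

p non_vowel_hits  # also accepts the digit and underscore starts
p consonant_hits  # only the words that start with a consonant
```

The first pattern lets "1word" and "_word" through, while the look-ahead version keeps only "ball" and "Tree".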
I modified the answer provided by @Alexander Cherednichenko in order to get rid of the if statements.
/^[^aeiou\W]/i.match(s) != nil
If you want to catch a string that doesn't start with vowels, but only starts with consonants you can use this code below. It returns true if a string starts with any letter other than A, E, I, O, U. s is any string we give to a function
if /^[^aeiou\W]/i.match(s) == nil
  return false
else
  return true
end
The i at the end makes the regular expression case-insensitive.
\W inside the negated class excludes any non-word character, for example when a string starts with punctuation or a space, like "!something".
[^aeiou] means any character except a, e, i, o, u.
The ^ at the beginning, before the [, anchors the class [^aeiou\W] to the first character.
Note that the ^[^aeiou\W] pattern is not correct because it also matches a line that starts with a digit or an underscore. Borodin's solution works well, but there is another possible solution without look-aheads, based on character class intersection (&&), which here subtracts the vowels from a-z, and using the more modern Regexp#match?:
/\A[a-z&&[^aeiou]]/i.match?(word)
See the Rubular demo.
Details
\A - start of a string (^ in Ruby is start of any line)
[a-z&&[^aeiou]] - an a-z character range matching any ASCII letter (/i flag makes it case insensitive) except for the aeiou chars.
See the Ruby demo:
test = %w/ 1word _word ball area programming /
puts test.select { |w| /\A[a-z&&[^aeiou]]/i.match?(w) }
# => ['ball', 'programming']
I'm trying to group a string by three (but could be any number) characters at a time. Using this code:
"this gets three at a time".scan(/\w\w\w/)
I get:
["thi","get","thr","tim"]
But what I'm trying to get is:
["thi","sge","tst","hre","eat","ati","me"]
\w matches letters, digits, and underscores (i.e. it's shorthand for [a-zA-Z0-9_]), not spaces. It does not magically skip spaces, though, as you seem to expect.
So you'll first have to remove the spaces:
"this gets three at a time".gsub(/\s+/, "").scan(/.../)
or non-word characters:
"this gets three at a time".gsub(/\W+/, "").scan(/.../)
before you match the three characters.
Although you should rather use
"this gets three at a time".gsub(/\W+/, "").scan(/.{1,3}/)
to also obtain the last 1 or 2, if the length is not divisible by 3.
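Since the question mentions that the group size "could be any number", the same idea can be wrapped in a small method that interpolates the size into the pattern. The method name group_chars is made up for this sketch, and it assumes non-word characters should be discarded as in the \W+ variant above.

```ruby
# Hypothetical helper: strip non-word characters, then cut into chunks of n.
# The {1,n} quantifier keeps a shorter final chunk instead of dropping it.
def group_chars(str, n)
  str.gsub(/\W+/, "").scan(/.{1,#{n}}/)
end

by_three = group_chars("this gets three at a time", 3)
by_four  = group_chars("this gets three at a time", 4)
p by_three
p by_four
```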
"this gets three at a time".tr(" \t\n\r", "").scan(/.{1,3}/)
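You can try these as well:
sentence = "this gets three at a time"
sentence = sentence.delete(" ") # remove every space; sentence[" "] = "" removes only the first one
sentence.scan(/\w\w\w/) # no change in regex, but a trailing group shorter than three is dropped
Or:
sentence = "this gets three at a time"
sentence = sentence.delete(" ")
sentence.scan(/.{1,3}/)
Or:
sentence = "this gets three at a time"
sentence = sentence.delete(" ")
sentence.scan(/[a-zA-Z]{1,3}/)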
My user input is a string I need to split into two parts, (1) a partial phone number [any sequence of digits - . space, parens so I assume that is represented by /[\d\. \-\(\)]/ ] and (2) whatever follows (if anything).
For example
"88 comment" -> "88" & "comment"
"415-915 second part" --> "415-915" & "second part"
"(415) 915 part 2" --> "(415) 915" & "part 2"
"a note" --> "" & "a note"
"part 2" --> "" & "part 2"
As a relative newbie to ruby and regex, I have no idea how to extract multiple parts, and how to define the second part as being whatever comes after the first part (which basically means whatever comes after anything that doesn't match the first part)
Here's the regex (I'll explain below):
/^([-\d. ()]*)(.*)$/
^ means "start at the beginning of the string"
In ([-\d. ()]*), the square brackets define a character class (a hyphen, a digit, a dot, a space, or a paren), the * means "match any number of the preceding item", and the outer parens create a match group (this is how you will get the value later). So this is the first sequence.
In (.*), . means "match any single character", so .* means "match any number of any characters", it's basically a catch-all. The parens create a second match group.
$ means "finish at the end of the string"
So in ruby:
string =~ /^([-\d. ()]*)(.*)$/
puts $1.strip # is the phone number (with excess whitespace removed)
puts $2.strip # is the rest (with excess whitespace removed)
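The same regex can also be wrapped in a method that returns both parts as a pair, which avoids relying on the $1 and $2 globals. This is only a sketch: the names PHONE_PREFIX and split_phone_and_rest are invented here, and the inputs are the examples from the question.

```ruby
# Same pattern as above: phone-like prefix in group 1, everything else in group 2.
PHONE_PREFIX = /^([-\d. ()]*)(.*)$/

# Hypothetical helper returning [phone_part, rest], both stripped of whitespace.
def split_phone_and_rest(input)
  match = PHONE_PREFIX.match(input)
  [match[1].strip, match[2].strip]
end

examples = ["88 comment", "415-915 second part", "(415) 915 part 2", "a note", "part 2"]
examples.each { |e| p split_phone_and_rest(e) }
```

Because both groups use *, the pattern matches any single-line input, so the first element is simply an empty string when there is no phone part.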
Try /([\d.\s()\/-]*)(.+)/. The first group will capture the number, the second one the "other" part. I don't know Ruby, so you will have to implement that pattern yourself.
I need to break a string into an array:
"2 + 3" should become "2", "+", "3"
and even "2+3" should become "2", "+", "3"
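As long as the format is consistent (always a space between numbers and signs), NSString's -componentsSeparatedByString: will work for you. If there's a possibility the string will appear like "2+3" or even "2 +3", you could try removing all whitespace characters first (note that -stringByTrimmingCharactersInSet: only trims the ends of the string, so interior spaces need something like -stringByReplacingOccurrencesOfString:withString:), then using -componentsSeparatedByCharactersInSet: with the sign characters you expect. Be aware that splitting on the sign characters drops the signs themselves from the result.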