I'm trying to match a regex with a string as long as possible. This is the string to look in:
"xxaxxbxxbxbxxbxxbxxbxxdxx"
The pattern to match is:
"bcda"
The pattern is to be interpreted as follows:
b: There are several of them in the string. The first one should match.
c: There isn't one in the string, so nothing is returned.
d: There is just one near the end of the string. It should be returned.
a: There is one at the beginning of the string. Since b, c, and d were sought first and results are returned, a will not be returned.
The expected return is:
"bd"
It may be that a regex match is not the correct way to accomplish this, but I'd like to ask for assistance with one. The basic question is this: can I use a regex to generically find a substring that represents as much of an ordered, but not necessarily consecutive, sequence of candidate characters as it possibly can? If so, how?
As @sawa explained, you cannot do this with a single regex. Here is a recursive solution.
def consecutive_matches(str, pattern)
  return '' if str.empty? || pattern.empty?
  ch, pat = pattern[0], pattern[1..-1]
  i = str.index(ch)
  if i
    ch + consecutive_matches(str[i+1..-1], pat)
  else
    consecutive_matches(str, pat)
  end
end
str = "xxaxxbxxbxbxxbxxbxxbxxdxx"
consecutive_matches(str, "bcda") #=> "bd"
consecutive_matches(str, "abcd") #=> "abd"
consecutive_matches(str, "dabc") #=> "d"
consecutive_matches(str, "cfgh") #=> ""
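The same greedy left-to-right scan can also be written without recursion. This is only a sketch, not part of the original answer: ordered_matches is a hypothetical name, and it relies on the two-argument form of String#index to resume the search just past the previous hit.

```ruby
# Iterative variant of the recursive scan (hypothetical helper name).
def ordered_matches(str, pattern)
  pos = 0 # index in str where the next search starts
  pattern.each_char.with_object(+'') do |ch, found|
    i = str.index(ch, pos) # nil if ch does not occur at or after pos
    next unless i          # skip pattern characters that cannot be found
    found << ch
    pos = i + 1            # later characters must come after this one
  end
end

str = "xxaxxbxxbxbxxbxxbxxbxxdxx"
ordered_matches(str, "bcda") #=> "bd"
ordered_matches(str, "abcd") #=> "abd"
```

Because no substrings are copied, this version may be preferable for long inputs, but the results should be identical to the recursive method.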
It is impossible to do that with a single regex match. A capture in a regex must be a substring of the original string. bd here is not, so there is no way to match that as a single capture.
I am trying to return all words which have more than four letters in the below exercise.
def timed_reading(max_length, text)
  var_b = text.split(" ")
  var_b.map do |i|
    if i.length >= max_length
      return i
    end
  end
end
print timed_reading(4,"The Fox asked the stork, 'How is the soup?'")
# >> asked
I seem to get only one word.
If you want to filter a list and select only certain kinds of entries, use the select method:
var_b.select do |i|
  i.length >= max_length
end
That's all you need.
The return i in the middle is what causes the problem: it breaks out of the loop and returns a single value from the method itself, on the first word that qualifies. Remember that in Ruby, unlike languages such as JavaScript, return is often implied and doesn't need to be spelled out explicitly.
Blocks don't normally have return in them for this reason unless they need to interrupt the flow and break out of the method itself.
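Putting that advice together, the method from the question might be rewritten as follows. This is a sketch that keeps the question's method name and arguments; the block's value is the implicit return value, so no return is needed.

```ruby
def timed_reading(max_length, text)
  # select keeps every word for which the block is truthy;
  # its result is the method's implicit return value.
  text.split.select { |word| word.length >= max_length }
end

p timed_reading(4, "The Fox asked the stork, 'How is the soup?'")
# prints ["asked", "stork,", "'How", "soup?'"]
```

Note that punctuation attached to a word still counts toward its length here, because the string is only split on whitespace.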
You don't need to first extract all words from the string and then select those having at least four letters. Instead you can just extract the desired words using String#scan with a regular expression.
str = "The Fox asked the stork, 'How is the soup?'? Très bon?"
str.scan /\p{Alpha}{4,}/
#=> ["asked", "stork", "soup", "Très"]
The regular expression reads, "match a run of 4 or more letters". I've used \p{Alpha} (for most purposes equivalent to \p{L} and [[:alpha:]]) to match Unicode letters. (These are documented in Regexp. Search for these expressions there.) You could replace \p{Alpha} with [a-zA-Z], but in that case "Très" would not be matched.
If you wish to also match digits, use \p{Alnum} or [[:alnum:]] instead. While \w also matches letters (English only) and digits, it also matches underscores, which you probably don't want in this situation.
Punctuation can be a problem when words are extracted from the string by splitting on whitespace.
arr = "That is a cow.".split
#=> ["That", "is", "a", "cow."]
arr.select { |word| word.size >= 4 }
#=> ["That", "cow."]
But "cow" has only three letters. If you instead use String#scan to extract words from the string, you obtain the desired result.
arr = "That is a cow?".scan /\p{Alpha}+/
#=> ["That", "is", "a", "cow"]
arr.select { |word| word.size >= 4 }
#=> ["That"]
However, if you use scan you may as well use a regular expression to retrieve only words having at least 4 characters, and skip the extra step.
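As a sketch of that single-step approach, the minimum length can be interpolated into the pattern. The method name words_at_least and its parameters are hypothetical, chosen only for illustration.

```ruby
# Returns every run of at least min_letters Unicode letters in text.
def words_at_least(text, min_letters)
  # The interpolation builds a bounded quantifier such as {4,}.
  text.scan(/\p{Alpha}{#{min_letters},}/)
end

words_at_least("That is a cow.", 4)
#=> ["That"]
words_at_least("The Fox asked the stork, 'How is the soup?'", 4)
#=> ["asked", "stork", "soup"]
```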
I am trying to do this test. There are a bunch of solutions online and here, but I first want to figure out why my solution is wrong, even though it seems to produce the right results when I enter certain strings.
Here is what they are asking :
Write a method that takes in a string. Return the longest word in the
string. You may assume that the string contains only letters and
spaces.
You may use the String split method to aid you in your quest.
Here is my solution. I thought I could turn the string into an array, sort it by length in descending order, and then just return the first element of that new array, like this:
def longest_word(sentence)
  sentence = sentence.split
  sentence.sort_by! { |longest| -longest.length }
  return sentence[0]
end
That doesn't seem to work, since their test gives me false for every case. Here is the test:
puts("\nTests for #longest_word")
puts("===============================================")
puts(
'longest_word("short longest") == "longest": ' +
(longest_word('short longest') == 'longest').to_s
)
puts(
'longest_word("one") == "one": ' +
(longest_word('one') == 'one').to_s
)
puts(
'longest_word("abc def abcde") == "abcde": ' +
(longest_word('abc def abcde') == 'abcde').to_s
)
puts("===============================================")
So the question is: why? Can I just fix my code, or is the idea all wrong, so that I need to do it completely differently?
str = "Which word in this string is longest?"
r = /[[:alpha:]]+/
str.scan(r).max_by(&:length)
#=> "longest"
This regular expression reads, "match one or more letters". The outer brackets constitute a character class, meaning one of the characters within the brackets must be matched.
To deal with words that are hyphenated or contain single quotes, the following is an imperfect modification [1]:
str = "Who said that chicken is finger-licken' good?"
r = /[[[:alpha:]]'-]+/
str.scan(r).max_by(&:length)
#=> "finger-licken'"
This regular expression reads, "match one or more characters that are a letter, apostrophe or hyphen". The outer brackets constitute a character class, meaning one of the characters within the brackets must be matched.
[1] I've successfully used "finger-licken'" in Scrabble.
I'd write it something like:
str = "Write a method that takes in a string"
str.split.sort_by(&:length).last # => "string"
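Sorting the whole array does more work than is needed when only the longest element is wanted. A minimal sketch using Enumerable#max_by, keeping the method name from the question's test harness, could look like this. Which word is returned when several share the maximum length is not something this sketch relies on.

```ruby
def longest_word(sentence)
  # max_by makes a single pass and returns an element with the largest block value.
  sentence.split.max_by(&:length)
end

longest_word("short longest") #=> "longest"
longest_word("abc def abcde") #=> "abcde"
```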
I'm receiving a string that contains two numbers in a handful of different formats:
"344, 345", "334,433", "345x532" and "432 345"
I need to split them into two separate numbers in an array using split, and then convert them using Integer(num).
What I've tried so far:
nums.split(/[\s+,x]/) # split on one or more spaces, a comma or x
However, it doesn't seem to match multiple spaces when testing. Also, it doesn't allow a space in the comma version shown above ("344, 345").
How can I match multiple delimiters?
You are using a character class in your pattern, and a character class matches only a single character. [\s+,x] matches one whitespace character, a literal +, a comma, or an x. You probably meant an alternation such as (?:\s+|,\s*|x).
However, perhaps, a mere \D+ (1 or more non-digit characters) should suffice:
"345, 456".split(/\D+/).map(&:to_i)
R1 = Regexp.union([", ", ",", "x", " "])
#=> /,\ |,|x|\ /
R2 = /\A\d+#{R1}\d+\z/
#=> /\A\d+(?-mix:,\ |,|x|\ )\d+\z/
def split_it(s)
  return nil unless s =~ R2
  s.split(R1).map(&:to_i)
end
split_it("344, 345") #=> [344, 345]
split_it("334,433") #=> [334, 433]
split_it("345x532") #=> [345, 532]
split_it("432 345") #=> [432, 345]
split_it("432&345") #=> nil
split_it("x32 345") #=> nil
Your original regex would work with a minor adjustment to move the '+' symbol outside the character class:
"344 ,x 345".split(/[\s,x]+/).map(&:to_i) #==> [344,345]
If the examples are actually the only formats that you'll encounter, this will work well. However, if you have to be more flexible and accommodate unknown separators between the numbers, you're better off with the answer given by Wiktor:
"344 ,x 345".split(/\D+/).map(&:to_i) #==> [344,345]
Both cases will return an array of Integers from the inputs given. However, the second example is both more robust and easier to understand at a glance.
it doesn't seem to match multiple spaces when testing
Yeah, a character class (square brackets) doesn't work like this. You apply quantifiers to the class itself, not to its characters. You could use the | operator instead. Something like this:
.split(%r[\s+|,\s*|x])
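To check that alternation against the four formats from the question, a small sketch follows. SEPARATOR and parse_pair are hypothetical names; Integer is called with an explicit base of 10 so that a leading zero is not read as octal.

```ruby
# One or more spaces, a comma with optional trailing spaces, or an x.
SEPARATOR = /\s+|,\s*|x/

def parse_pair(s)
  # Integer raises ArgumentError on malformed input instead of returning 0.
  s.split(SEPARATOR).map { |n| Integer(n, 10) }
end

["344, 345", "334,433", "345x532", "432   345"].map { |s| parse_pair(s) }
#=> [[344, 345], [334, 433], [345, 532], [432, 345]]
```

Unlike to_i, Integer rejects strings that are not valid numbers, which matches the conversion the question asks for.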
This is my expected result:
Input one string and get three strings back.
I have no idea how to do this with a regex in Ruby.
This is my rough idea:
match(/(.*?)(_)(.*?)(\d+)/)
Input and expected output
# "R224_OO2003" => R224, OO, 2003
# "R2241_OOP2003" => R2241, OOP, 2003
If the example description I gave in my comment on the question is correct, you need a very straightforward regex:
r = /(.+)_(.+)(\d{4})/
Then:
"R224_OO2003".scan(r).flatten #=> ["R224", "OO", "2003"]
"R2241_OOP2003".scan(r).flatten #=> ["R2241", "OOP", "2003"]
Assuming that your three parts consist of (R and one or more digits), then an underbar, then (one or more non-whitespace characters), before finally (a 4-digit numeric date), then your regex could be something like this:
^(R\d+)_(\S+)(\d{4})$
The ^ indicates start of string, and the $ indicates end of string. \d+ indicates one or more digits, while \S+ says one or more non-whitespace characters. The \d{4} says exactly four digits.
To recover data from the matches, you could either use the pre-defined globals that line up with your groups, or you could use named captures.
To use the match globals just use $1, $2, and $3. In general, you can figure out the number to use by counting the left parentheses of the specific group.
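As a brief illustration of the numbered globals, here is a sketch that reuses the pattern above; the variable name parts is arbitrary. The globals are set by the most recent successful match.

```ruby
parts =
  if "R2241_OOP2003" =~ /^(R\d+)_(\S+)(\d{4})$/
    # $1, $2, $3 follow the order of the opening parentheses.
    [$1, $2, $3]
  end

p parts # prints ["R2241", "OOP", "2003"]
```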
To use the named captures, include ?<name> right after the left paren of a particular group. For example:
x = "R2241_OOP2003"
match_data = /^(?<first>R\d+)_(?<second>\S+)(?<third>\d{4})$/.match(x)
puts match_data['first'], match_data['second'], match_data['third']
yields
R2241
OOP
2003
as expected.
As long as your pattern covers all possibilities, then you just need to use the match object to return the 3 strings:
my_match = "R224_OO2003".match(/(.*?)(_)(.*?)(\d+)/)
#=> #<MatchData "R224_OO2003" 1:"R224" 2:"_" 3:"OO" 4:"2003">
puts my_match[0] #=> "R224_OO2003"
puts my_match[1] #=> "R224"
puts my_match[2] #=> "_"
puts my_match[3] #=> "OO"
puts my_match[4] #=> "2003"
A MatchData object holds each capture group, starting at index [1]. As you can see, index [0] returns the entire match (here, the whole string). If you don't want to capture the "_", you can leave its parentheses out.
Also, I'm not sure you are getting what you want with the part:
(.*?)
The ? after .* makes the quantifier lazy (non-greedy): the group matches zero or more of any character, but as few as possible while still letting the rest of the pattern match.
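To see what .*? does in practice, it helps to compare it with the greedy .* on the same input. This is a small sketch; the variable names are arbitrary.

```ruby
s = "R224_OO2003"

# Lazy: the first group takes as little as possible before \d+ can match.
lazy = s.match(/(.*?)(\d+)/).captures
# Greedy: the first group takes as much as possible, leaving \d+ one digit.
greedy = s.match(/(.*)(\d+)/).captures

p lazy   # prints ["R", "224"]
p greedy # prints ["R224_OO200", "3"]
```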
This is my code:
['banana', 'anana', 'naan'].map do |food|
  food.reverse
end.select { |f| f.match /^a/ }
# => ["ananab", "anana"]
I wonder whether this regex is used to find a word that includes characters other than "a" (negation), or whether it matches a word that begins with "a" (ananab and anana, from my understanding).
Can someone help me a little more?
If you thought ^ means negation, that is so only when it is the first character inside [], which denotes a character class. Even if you have a negation [^a], that does not mean a string that does not have the character a. It means a string that has a character other than a.
Regex is a tool to try to match something. In its implementation, it will try to match a pattern in any way possible by changing the match position, backtracking, etc. If you want to see if a string does not match a pattern, the most straightforward way is to use negation on the predicate, not on the regex. The following will return true when string s does not include an a:
s !~ /a/
But in such a simple case, you can rather do:
!s.include?("a")
or instead of select, you can use reject:
reject{|s| s.include?("a")}
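The variants above can be compared side by side. In this sketch the foods array is a hypothetical input chosen so that the result is not empty; Enumerable#grep_v is an additional built-in that keeps the elements not matching a pattern.

```ruby
foods = ['banana', 'kiwi', 'plum'] # hypothetical sample data

without_a_1 = foods.select { |s| s !~ /a/ }         # negate the predicate
without_a_2 = foods.reject { |s| s.include?("a") }  # plain substring test
without_a_3 = foods.grep_v(/a/)                     # inverse of grep

p without_a_1 # prints ["kiwi", "plum"]
```

All three expressions should produce the same array here.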
The regex /^a/ means: "match any string where the first character is an a". The ^ character here indicates the start of a string.
I believe what you're looking for is /[^a]/, which means: "match any string that contains a character that is not an a". The [^...] syntax will match any character except those within the brackets.
To really get at what you want, add a * after the class and anchor both ends, /^[^a]*$/, to say: "match any string where all the characters from start (^) to finish ($) are not a".
Addendum: As per the comments, anywhere I've written "string", I really mean "string or line". The ^ and $ characters are anchors, pinning the regex to the start or end of the line. Or, in the case of a string without any newlines, it anchors to the start or end of the string.
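The line-versus-string distinction can be seen with a two-line input. In Ruby, \A and \z anchor to the whole string, while ^ and $ anchor to each line. The following is a small sketch with arbitrary variable names.

```ruby
multi = "b\na" # two lines: "b" and "a"

line_start   = multi.match?(/^a/)   # true: the second line starts with a
string_start = multi.match?(/\Aa/)  # false: the string starts with b

some_line_without_a = multi.match?(/^[^a]*$/)    # true: the line "b" qualifies
whole_string_no_a   = multi.match?(/\A[^a]*\z/)  # false: the string contains an a
```

When the input may contain newlines and the whole string should be checked, the \A and \z forms are usually the safer choice.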
It is not clear what your question is. Are you asking what the code you posted does, or how to exclude items from the array that contain the letter "a" ?
The code you posted:
['banana', 'anana', 'naan'].map do |food|
  food.reverse
end.select { |f| f.match /^a/ }
does the following:
First it creates a new array where each element is reversed (this assumes each element is a string)
so
['banana', 'anana', 'naan'].map do |food|
  food.reverse
end
should result in an array like this:
["ananab", "anana", "naan"]
This is because map takes each element in the array and passes it to the block
do |food|
  food.reverse
end
which reverses each element and creates a new array where each element has been "mapped" to the reversed string
then the
.select { |f| f.match /^a/ }
part will create another array from ["ananab", "anana", "naan"]
containing each element that begins with the letter a (/^a/ means begins with "a")
resulting in the array
["ananab", "anana"]
If your question is how to exclude words containing the letter a then
['banana', 'anana', 'naan'].reject { |s| s.include?("a") }
should do what you want (as sawa pointed out)
Try something like this instead:
['banana', 'anana', 'naan'].map(&:reverse).select { |f| !f.include?("a") }
The code does the following: it reverses the strings in the array and filters out all strings that contain the character "a".